
Client Extension Multi-Tenancy Overhaul Plan

Last updated: 2025-08-09

Status update (2025-11-21):

v2 extension system is live with out-of-process Runner + signed content-addressed bundles; legacy in-process/dynamic import path removed (see extension-system-v2-migration.md).
UI delivery now uses Runner ext-ui host with iframe sandbox; gateway proxies all API calls to Runner /v1/execute.
Remaining multi-tenant hardening work (install_id propagation, RBAC, manifest enforcement) is tracked in the alignment plan.

Context & Findings

Pre-v2 behavior: user-supplied extension code is uploaded into the running application environment and dynamically loaded. This violates multi-tenant isolation and increases operational risk (code execution in the app context, shared process memory, filesystem access, and unrestricted egress).
Repo state: Community Edition (CE) contains stubs; Enterprise Edition (EE) code is present under ee/server. The CE app dynamically imports EE initialization (ee/server/src/lib/extensions/initialize) when enterprise mode is enabled.
Risk summary:
- Cross-tenant impact via shared process or host resources.
- In-process arbitrary code execution elevates the blast radius to the entire cluster.
- Unbounded capabilities: filesystem, network, and secrets likely not capability-scoped.
- Weak provenance: uploaded files lack signed, reproducible artifacts and verified dependency graphs.

Goals

Strong tenant isolation for compute, storage, cache, and network.
No direct execution of tenant-supplied code in the application process.
Capability-based, least-privilege runtime with explicit allowlists.
Deterministic, reproducible, and signed extension artifacts.
Auditable execution with traceability, quotas, and rate limits per tenant.
Backwards-compatible migration path, with clear deprecation of unsafe paths.

Overarching Phases

Phase 1 — Static Rendering via Rust Host (MinIO proxy)

Scope: Serve prebuilt UI bundles (iframe apps) as immutable static assets via a Rust host that proxies reads from MinIO/S3, with strict path sanitization, tenant/contentHash validation, ETag/Cache-Control headers, and optional pod-local caching.
Purpose: Quickly replace any dynamic module loading in the app with safe, static delivery. No guest code execution. Focus on asset integrity and isolation.
Deliverables:
- Rust static asset service (MinIO/S3 proxy) with SPA fallback and CSP guidance for iframes
- URL model: /ext-ui/{extensionId}/{content_hash}/... mapped to object storage layout (sha256/<hash>/ui/...)
- Basic registry/install wiring to resolve content_hash per tenant (read-only for UI)
- Signing/hash verification for assets at fetch time (optional signature; hash required)
- Docs + Client SDK usage for iframe embedding

Phase 2 — Dynamic WASM Features

Scope: Out-of-process Runner (Rust + Wasmtime), Host API v1 (capability-based), Next.js API gateway to Runner, event-driven execution, quotas/limits, and per-tenant auditability.
Purpose: Safely execute extension logic outside the app process with strong isolation and provenance.
Deliverables:
- Runner service with Wasmtime limits, host imports, and signature verification
- Registry + bundle signing/publishing, versioning, and warmup/prefetch
- API gateway for /api/ext/... to invoke handlers in Runner
- Event subscriptions, logs/metrics, idempotency, and quota enforcement

Mapping to detailed sections

Phase 1 aligns with: "Client UI Delivery (iframe-only)", "Client Asset Serving via Gateway", and the parts of "Bundle Storage Integration" focused on static UI assets and integrity.
Phase 2 aligns with: "Runner Service Design", "HTTP Routing for Plugin Endpoints", "Next.js API Router/Proxy", "Runtime Decision: Wasmtime", and remaining bundle signing/execute paths.

Non-Goals (for this overhaul)

Supporting all languages. Start with JS/TS to WASM or isolate; consider additional languages later.
Full “bring-your-own container” marketplace. We will support a controlled out-of-process path, but not arbitrary images at first.

Upfront Decisions (Simplifications)

EE-only: Extensions ship only with Enterprise Edition; no feature flag toggle needed in CE. Remove extension initialization paths in non-EE builds.
Runtime: Standardize on Wasmtime-based wasm_runner only; no alternate runtimes.
Storage: Use S3-compatible storage via our existing S3StorageProvider against local MinIO only. No alternative providers. Canonical bucket and prefix are defined via env.
UI: Iframe-only Client SDK approach. React-based example and docs only for SDK; no descriptor renderer.
Fetch/serve model: Object storage is the source of truth. Pods fetch bundles/UI on demand into a pod-local cache and serve them directly via Next.js/Knative.
Framework: Use Axum 0.7 + tower-http for the unified Rust application server. Static asset routes (/ext-ui/...) and execute routes (/v1/execute) live in the same binary. This keeps Phase 1 minimal and allows Wasmtime to be bolted in for Phase 2 without changing frameworks. See ee/runner/src/http/server.rs and dependency updates in ee/runner/Cargo.toml.

Executive Summary

We are splitting the extension overhaul into two phases: Phase 1 focuses on safe, static UI delivery via a Rust host proxying MinIO/S3 (no dynamic module loading, no guest code execution), and Phase 2 delivers dynamic WASM execution with a Rust Runner (Wasmtime), a capability-based Host API, and a Next.js API gateway. This preserves security and isolation while enabling a clear migration path.

Server Actions-First Contract

Principle: Business logic lives in server actions under server/src/lib/actions (EE overlays may live under ee/server/src/lib/actions). HTTP API routes exist only as thin wrappers that call these actions to support external/infra consumers (Runner, automation).
Actions (conceptual names) and wrappers:
- extensions.publishVersion(bundle) → verifies, computes content_hash, writes to sha256/<hash>/bundle.tar.zst, records extension_bundle. Wrapper: POST /api/extensions/:id/versions.
- installs.createOrEnable(tenant, extension, version) → persists install, computes runner_domain, sets runner_status='pending', enqueues provisioning workflow. Wrapper: POST /api/installs or server-initiated only.
- installs.lookupByHost(host) → returns { tenant_id, extension_id, content_hash }. Wrapper: GET /api/installs/lookup-by-host (used by Runner).
- installs.validate(tenant, extension, hash) → returns { valid: boolean }. Wrapper: GET /api/installs/validate (used by Runner ext-ui gate).
- installs.reprovision(installId) → retries provisioning (Temporal). Wrapper: POST /api/installs/:id/reprovision.
Testing guidance: unit/integration tests target server actions; API tests cover parameter parsing and delegation only.
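A minimal TypeScript sketch of the actions-first split, using the validate flow as the example. `validateInstall`, `handleValidateRequest`, and the injected `InstallLookup` are hypothetical names, not the repository's exports. A plain `URL` stands in for the Next.js request object so the sketch runs on its own.

```typescript
// Hypothetical record shape returned by the install store.
export interface InstallRecord {
  tenantId: string;
  extensionId: string;
  contentHash: string;
}

export type InstallLookup = (tenantId: string, extensionId: string) => Promise<InstallRecord | undefined>;

// Server action: owns the business rule (is this hash the installed bundle?).
export async function validateInstall(
  lookup: InstallLookup,
  tenant: string,
  extension: string,
  hash: string,
): Promise<{ valid: boolean }> {
  const install = await lookup(tenant, extension);
  return { valid: install !== undefined && install.contentHash === hash };
}

// Thin HTTP wrapper for GET /api/installs/validate: parse parameters, delegate, serialize.
export async function handleValidateRequest(
  url: URL,
  lookup: InstallLookup,
): Promise<{ status: number; body: unknown }> {
  const tenant = url.searchParams.get("tenant");
  const extension = url.searchParams.get("extension");
  const hash = url.searchParams.get("hash");
  if (!tenant || !extension || !hash) {
    return { status: 400, body: { error: "tenant, extension and hash are required" } };
  }
  return { status: 200, body: await validateInstall(lookup, tenant, extension, hash) };
}
```

Because the rule lives in the action, unit tests can call `validateInstall` directly with a fake lookup. API tests then only need to cover the 400/200 parsing paths.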

Proposed Document Map

Unified service approach

We will deploy a single Rust application server that serves both static assets (/ext-ui/...) and the execute API (/v1/execute). CDN fronts /ext-ui with immutable caching by contentHash. Route-level isolation and config separation keep static and execute concerns safe within one binary.
Phase 1 — Static Rendering via Rust Host (MinIO proxy)
- See: Phase 1 section below. Consolidates: "Client UI Delivery (iframe-only)", "Client Asset Serving via Gateway", and the UI-asset portions of "Distributed Bundles, Assets, and Caching".
Phase 2 — Dynamic WASM Features
- See: Phase 2 section below. Consolidates: "Runner Service Design (Rust + Wasmtime)", "HTTP Routing for Plugin Endpoints", "Next.js API Router/Proxy", "Runtime Decision: Wasmtime", and WASM/precompiled portions of caching.
Shared Foundations
- See: Data Model and Registry section. Consolidates: "Data Model (initial)" and "Public APIs (EE)".

Phase 1 — Static Rendering via Rust Host (MinIO proxy)

Scope & Objectives

Serve prebuilt iframe UI bundles as immutable static assets from MinIO/S3 via a Rust host. Validate tenant/contentHash; sanitize paths; set strong caching and security headers. No dynamic JS import into host app.

Architecture

Implementation: Served by the unified Rust application server within a dedicated route group (/ext-ui/...)
URL model: /ext-ui/{extensionId}/{contentHash}/[...path]
Object storage layout: sha256/<hash>/ui/**/* (extracted from the bundle), or the tar subtree extracted on first touch; integrity via contentHash
Caching: CDN as primary (immutable by contentHash); pod-local cache optional/minimal for origin efficiency; SPA fallback to index.html
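The URL model and the sanitization rules can be sketched as a pure parser plus an SPA-fallback resolver. This is a TypeScript illustration; the production route lives in the Rust host. Two details are assumptions: the 64-character hex hash format and the segment character allowlist. A lexical check like this does not replace the final guard that the resolved file path stays under the cache root.

```typescript
export interface ExtUiTarget {
  extensionId: string;
  contentHash: string;
  relPath: string; // sanitized path relative to the bundle's ui/ root
}

// Assumptions: contentHash is hex SHA-256; ids and path segments use a conservative character set.
const HASH_RE = /^[0-9a-f]{64}$/;
const SEGMENT_RE = /^[A-Za-z0-9._-]+$/;

export function parseExtUiPath(pathname: string): ExtUiTarget | null {
  const [empty, prefix, extensionId, contentHash, ...rest] = pathname.split("/");
  if (empty !== "" || prefix !== "ext-ui" || !extensionId || !contentHash) return null;
  if (!SEGMENT_RE.test(extensionId) || !HASH_RE.test(contentHash)) return null;

  const segments = rest.filter((s) => s.length > 0); // tolerate a trailing slash
  for (const seg of segments) {
    // Rejects ".", "..", hidden files, and anything outside the allowed set (including "%" escapes and "\").
    if (seg.startsWith(".") || !SEGMENT_RE.test(seg)) return null;
  }
  return {
    extensionId,
    contentHash,
    relPath: segments.length > 0 ? segments.join("/") : "index.html",
  };
}

// SPA fallback: serve the file if it exists as a regular file in the cached ui/ tree, else index.html.
export function resolveWithSpaFallback(relPath: string, exists: (p: string) => boolean): string {
  return exists(relPath) ? relPath : "index.html";
}
```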

Security

Tenant/contentHash validation with registry lookups
Path sanitization, file size caps, immutable caching, ETag/If-None-Match
CSP for iframes (summary; full guidance in Appendix A)
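The following sketch covers the response-header rules for a cached asset:
- content type by extension;
- an immutable Cache-Control header;
- a strong SHA-256 ETag with exact If-None-Match matching;
- a per-file size cap.

The MIME table covers the extension allowlist suggested in 1.d. Hex encoding of the digest and the name `buildAssetResponse` are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Covers the extension allowlist suggested in 1.d; anything else falls back to octet-stream.
const MIME_TYPES: Record<string, string> = {
  html: "text/html; charset=utf-8",
  js: "text/javascript; charset=utf-8",
  css: "text/css; charset=utf-8",
  json: "application/json",
  map: "application/json",
  svg: "image/svg+xml",
  png: "image/png",
  jpg: "image/jpeg",
  webp: "image/webp",
  woff: "font/woff",
  woff2: "font/woff2",
};

export function contentTypeFor(path: string): string {
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  const ext = dot > 0 ? name.slice(dot + 1).toLowerCase() : "";
  return MIME_TYPES[ext] ?? "application/octet-stream";
}

// Strong ETag over the served bytes; hex encoding of the digest is an assumption.
export function strongEtag(body: Uint8Array): string {
  return `"sha256-${createHash("sha256").update(body).digest("hex")}"`;
}

export interface AssetResponse {
  status: 200 | 304 | 413;
  headers: Record<string, string>;
  body?: Uint8Array;
}

export function buildAssetResponse(
  path: string,
  body: Uint8Array,
  ifNoneMatch: string | undefined,
  maxFileBytes: number, // e.g. parsed from EXT_STATIC_MAX_FILE_BYTES
): AssetResponse {
  if (body.byteLength > maxFileBytes) return { status: 413, headers: {} };
  const etag = strongEtag(body);
  const headers = {
    "Content-Type": contentTypeFor(path),
    // Safe because every URL embeds the content hash, so a given URL never changes.
    "Cache-Control": "public, max-age=31536000, immutable",
    ETag: etag,
  };
  if (ifNoneMatch === etag) return { status: 304, headers }; // exact match, no body
  return { status: 200, headers, body };
}
```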

Deployment & Operations

Env: EXT_BUNDLE_STORE_URL, STORAGE_S3_*, EXT_CACHE_*, EXT_STATIC_STRICT_VALIDATION; health checks; metrics; autoscaling profile
CDN: front /ext-ui with long-lived immutable caching keyed by full path; origin shielding to reduce S3 reads

Test Plan

Unit/integration for sanitization, 404/304/200 paths, cache eviction, large file handling; load tests for warm/cold cache; S3 failure modes

References to detailed content in this doc

Client UI Delivery (iframe-only with SDK)
Client Asset Serving via Gateway (pod-local cache)
Distributed Bundles, Assets, and Caching (UI aspects)

Phase 1 — TODOs (Status)

1.a Client Asset Fetch-and-Serve (Pod-Local Cache)

Route: server/src/app/ext-ui/[extensionId]/[contentHash]/[...path]/route.ts (GET).
Cache manager: server/src/lib/extensions/assets/cache.ts (ensure and basic index write).
Static serve: server/src/lib/extensions/assets/serve.ts (SPA fallback; sanitize; caching headers).
Mime map: server/src/lib/extensions/assets/mime.ts.
Details
- Tar/zip extraction for ui/**/*.
- LRU index file structure recorded; [x] eviction policy and GC.
- ETag generation and conditional GET support.
- Locking/concurrency control for first-touch extraction.
- Enforce tenant/contentHash match (404 on mismatch) in route handler.
- CSP guidance for iframe pages.
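The first-touch locking item and the atomic-write rule can be combined in a small TypeScript sketch of the cache manager. Concurrent first requests for one content hash share a single extraction. Each file is written to a temporary name and renamed into place.

`ensureExtracted` and its `extract` callback are hypothetical. The callback is assumed to already yield sanitized ui/** entries, so tar/zstd decoding is out of scope. The in-memory map only coordinates requests within one process; multiple processes sharing a cache directory would also need a file lock.

```typescript
import { randomBytes } from "node:crypto";
import { mkdir, rename, writeFile } from "node:fs/promises";
import { dirname, join } from "node:path";

// Write to a temp file next to the target, then rename, so readers never observe a partial file.
export async function writeFileAtomic(target: string, data: Uint8Array): Promise<void> {
  await mkdir(dirname(target), { recursive: true });
  const tmp = `${target}.${randomBytes(6).toString("hex")}.tmp`;
  await writeFile(tmp, data);
  await rename(tmp, target); // rename within one filesystem is atomic on POSIX
}

const completed = new Set<string>(); // hashes fully extracted by this process
const inFlight = new Map<string, Promise<void>>(); // extractions currently running

// Hypothetical cache-manager entry point: extracts ui/** for a content hash at most once per process.
export function ensureExtracted(
  cacheRoot: string,
  contentHash: string,
  extract: () => Promise<Map<string, Uint8Array>>, // assumed to yield sanitized ui-relative paths
): Promise<void> {
  if (completed.has(contentHash)) return Promise.resolve();
  const running = inFlight.get(contentHash);
  if (running) return running; // concurrent first-touch requests share one extraction

  const job = (async () => {
    const files = await extract();
    for (const [relPath, bytes] of files) {
      await writeFileAtomic(join(cacheRoot, contentHash, "ui", relPath), bytes);
    }
    completed.add(contentHash);
  })().finally(() => {
    inFlight.delete(contentHash); // on failure the next request retries
  });
  inFlight.set(contentHash, job);
  return job;
}
```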

1.b Client SDK (Iframe)

Packages created: ee/server/packages/extension-iframe-sdk/, ee/server/packages/ui-kit/.
SDK files
- src/index.ts, [x] src/bridge.ts, [x] src/auth.ts, [x] src/navigation.ts, [x] src/theme.ts, [x] src/types.ts, [x] React hooks (src/hooks.ts), [x] README with React example and security guidance.
UI Kit
- src/index.ts, [x] theme tokens CSS and theming entry, [x] MVP components, [x] hooks, [x] README (tokens + usage updated).
Example app
- Vite + TS example (under ee/server/packages/extension-iframe-sdk/examples/vite-react/) with README and static build output.
Host bridge bootstrap
- ee/server/src/lib/extensions/ui/iframeBridge.ts to inject theme tokens and session.
Protocol & security
- Origin validation and sandbox attributes; author docs.
- Message types include version.
Ergonomics
- React hooks: useBridge, useTheme, useAuthToken, useResize.
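A sketch of host-side message filtering for the iframe bridge: it accepts only messages from the expected origin that carry a supported protocol version. The envelope shape (the `alga` marker field and `BRIDGE_PROTOCOL_VERSION`) is an assumption, not the SDK's actual wire format.

A sandboxed iframe without allow-same-origin reports its origin as the string "null". Hosts therefore typically also compare `event.source` with the iframe's `contentWindow`.

```typescript
export const BRIDGE_PROTOCOL_VERSION = 1; // assumed current protocol version

// Hypothetical envelope; the `alga` marker lets the host ignore unrelated postMessage traffic.
export interface BridgeMessage {
  alga: true;
  version: number;
  type: string;
  payload?: unknown;
}

export function isBridgeMessage(data: unknown): data is BridgeMessage {
  if (typeof data !== "object" || data === null) return false;
  const m = data as Record<string, unknown>;
  return m["alga"] === true && typeof m["version"] === "number" && typeof m["type"] === "string";
}

// Accept a message only from the expected origin and only for a supported protocol version.
export function acceptMessage(
  event: { origin: string; data: unknown },
  expectedOrigin: string,
): BridgeMessage | null {
  if (event.origin !== expectedOrigin) return null;
  const data = event.data;
  if (!isBridgeMessage(data)) return null;
  if (data.version !== BRIDGE_PROTOCOL_VERSION) return null;
  return data;
}
```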

1.c Bundle Storage Integration (UI integrity)

Details
- Hash verification on fetch and before use.
  - Archive integrity: the archive's SHA-256 is verified against the URL content address (sha256/<hash>/bundle.tar.zst) during download. On mismatch, the request returns 502 (code: archive_hash_mismatch) and nothing is cached.
  - Per-file integrity: on every GET, a strong ETag is computed from the served file bytes using SHA-256 and returned as a quoted value prefixed with "sha256-" and followed by the digest. If the client supplies If-None-Match with this exact value, the server returns 304.
  - Operational note: URLs include the contentHash making CDN caching safe and immutable; origin fails closed on integrity mismatches and never serves partially extracted assets.
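The archive-level check in 1.c amounts to hashing the downloaded bytes and comparing them with the hex digest in the object key before anything is extracted. The TypeScript sketch below assumes keys of the form sha256/<hex>/bundle.tar.zst, the layout from 1.e. `IntegrityError` and the 400 status for malformed keys are illustrative choices.

```typescript
import { createHash } from "node:crypto";

// Carries the HTTP status and error code to return; names are illustrative.
export class IntegrityError extends Error {
  readonly status: number;
  readonly code: string;
  constructor(status: number, code: string) {
    super(code);
    this.status = status;
    this.code = code;
  }
}

// Extracts the expected digest from a content-addressed key, e.g. "sha256/<hex>/bundle.tar.zst".
export function expectedHashFromKey(key: string): string {
  const hex = /^sha256\/([0-9a-f]{64})\/bundle\.tar\.zst$/.exec(key)?.[1];
  if (!hex) throw new IntegrityError(400, "invalid_bundle_key");
  return hex;
}

// Verifies the whole archive before extraction; on mismatch nothing may be cached.
export function verifyArchive(key: string, archive: Uint8Array): Uint8Array {
  const actual = createHash("sha256").update(archive).digest("hex");
  if (actual !== expectedHashFromKey(key)) {
    throw new IntegrityError(502, "archive_hash_mismatch");
  }
  return archive;
}
```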

1.d Unified Rust Static Asset Host (MinIO/S3 proxy)

Routing
- Add GET route group in ee/runner/src/http/server.rs: /ext-ui/{extensionId}/{contentHash}/*path
- Implement SPA fallback: serve index.html when file missing or path is a directory; honor ?path=/... for client router hydration
- Strict path sanitization: reject .., absolute paths, and illegal characters; normalize and ensure access remains within the cache root
Framework and dependencies
- Framework: continue with Axum 0.7; add tower-http layers/services to simplify static hosting
- Use tower_http::services::ServeDir for on-disk cache under ${EXT_CACHE_ROOT}/{hash}/ui/; wrap with a custom handler for tenant/contentHash validation and SPA fallback
- Add mime_guess for content-type mapping
- Keep reqwest S3-compatible HTTP via BUNDLE_STORE_BASE; optionally switch to aws-sdk-s3 if Range/HEAD origin features are required
- Update ee/runner/Cargo.toml with:
  - tower-http = { version = "0.5", features = ["fs", "compression-full", "set-header", "trace"] }
  - mime_guess = "2"
  - tar = "0.4" and zstd = "0.13" (or async-compression with zstd feature)
  - optional aws-sdk-s3 = { version = "1", features = ["rustls"] }
Registry/contentHash validation
- Add lightweight registry validation client (HTTP or DB per deployment) to confirm tenant install → version → content_hash before serving
- On mismatch or missing install/version, return 404 and never serve from cache
- Short TTL (30–60s) cache for registry lookups keyed by {tenant_id, extension_id, content_hash}
Object storage integration
- Extend ee/runner/src/engine/loader.rs with fetch_object_range() and fetch_to_file() helpers for large reads
- Fetch bundle archive and extract only ui/**/* into cache on first touch
- Enforce layout sha256/<hash>/ui/**/* and verify sha256 during extract (per-file or archive-level validation)
Pod-local cache
- Introduce ee/runner/src/cache/fs.rs with helpers to:
  - compute cache paths under ${EXT_CACHE_ROOT}/<hash>/ui/...
  - write files atomically (temp + rename)
  - set read-only permissions after write
- [-] Implement capacity-based LRU eviction (bytes and/or file-count) reusing ee/runner/src/cache/lru.rs -- DELAY
- [-] Background GC task and on-demand eviction on put; record cache index with last-access timestamps -- DELAY
Headers and correctness
- Content-Type mapping by extension (fallback application/octet-stream)
- Cache-Control: public, max-age=31536000, immutable (URLs are content-hash addressed)
- ETag generation from file content; support If-None-Match → 304
- Optional range requests: Accept-Ranges, 206 Content-Range for large assets - DELAY
- File size caps and response size caps; return 413/416 as appropriate
Security
- Enforce tenant/contentHash validation before any serve; never trust URL alone
- Disallow directory traversal and hidden files; consider allowlist of extensions (html, js, css, json, map, svg, png, jpg, webp, woff, woff2)
- CSP guidance for iframe pages; document default CSP and sandbox attributes
Configuration and ops
- Env: BUNDLE_STORE_BASE, STORAGE_S3_*, EXT_CACHE_ROOT, EXT_CACHE_MAX_BYTES, EXT_STATIC_STRICT_VALIDATION, EXT_STATIC_MAX_FILE_BYTES
- Enhance /healthz in ee/runner/src/http/server.rs to check cache dir writable and object store reachable (HEAD on bucket/prefix)
- /warmup supports prefetch of {contentHash} UI subtree into cache
- Structured tracing fields on serve: request_id, tenant, extension, content_hash, file_path, status, duration_ms, cache_status (hit/miss)
Tests
- Unit: path sanitizer; content-type mapper; ETag calc; cache LRU; extract-only-UI correctness
- Integration: cold fetch → extract → 200; repeat with If-None-Match → 304; tenant/contentHash mismatch → 404; large file → 413; traversal attempts → 400/404
Docs
- Update Client SDK README to reference iframe src="/ext-ui/{extensionId}/{content_hash}/index.html?path=/..." and CSP/sandbox guidance
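The short-TTL registry cache from 1.d can be sketched as a map keyed by tenant, extension, and content hash. Three parts are assumptions: the `validate` callback (an HTTP or DB call in practice), the 30 s default, and the injectable clock.

```typescript
// Hypothetical registry check; in deployment an HTTP call or DB query (install -> version -> content_hash).
export type Validate = (tenantId: string, extensionId: string, contentHash: string) => Promise<boolean>;

export class ValidationCache {
  private readonly entries = new Map<string, { valid: boolean; expiresAt: number }>();
  private readonly validate: Validate;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(validate: Validate, ttlMs: number = 30_000, now: () => number = Date.now) {
    this.validate = validate;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async isValid(tenantId: string, extensionId: string, contentHash: string): Promise<boolean> {
    // NUL separators keep distinct triples from colliding when concatenated.
    const key = `${tenantId}\u0000${extensionId}\u0000${contentHash}`;
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.valid;
    const valid = await this.validate(tenantId, extensionId, contentHash);
    this.entries.set(key, { valid, expiresAt: this.now() + this.ttlMs });
    return valid;
  }
}
```

This sketch also caches negative answers. That reduces registry load, but a freshly installed version can be rejected for up to one TTL; caching only positive results avoids the delay.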

1.e Bundle Format Alignment (zstd)

Rationale
- Uploader/finalizer and authoring tooling standardize on bundle.tar.zst (zstd-compressed tar).
- Runner must align on the same artifact name and compression to avoid format mismatches.
Tasks
- Runner: change bundle URL to sha256/<hex>/bundle.tar.zst in ee/runner/src/engine/loader.rs::bundle_url() and any hard-coded paths.
- Runner: replace gzip decoding with zstd decoding in ee/runner/src/http/ext_ui.rs (use zstd::stream::read::Decoder or async-compression zstd reader) for UI extraction.
- Runner: update temporary file naming in verify_archive_sha256() to .tar.zst for clarity (no functional change required).
- Tests: update ee/runner/tests/ext_ui_integration.rs to generate .tar.zst bundles and serve /sha256/:hex/bundle.tar.zst in the in-memory server.
- Cargo: add zstd = "^0.13" (or enable zstd in async-compression) and remove the flate2 dependency if no longer needed.
- Docs: ensure all references in this plan and related docs use bundle.tar.zst consistently.
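A sketch of the aligned object key and URL construction from 1.e. It is written in TypeScript for consistency with the other examples; the runner's `bundle_url()` is Rust. Base URLs in the test are made up.

The trailing-slash normalization matters. Without it, `new URL(key, base)` replaces the last path segment of the base instead of appending to it.

```typescript
const HEX_SHA256 = /^[0-9a-f]{64}$/;

// Content-addressed key for a bundle archive, matching the sha256/<hex>/bundle.tar.zst layout.
export function bundleKey(hex: string): string {
  if (!HEX_SHA256.test(hex)) throw new Error(`not a lowercase hex SHA-256 digest: ${hex}`);
  return `sha256/${hex}/bundle.tar.zst`;
}

// Joins the key onto a store base such as the BUNDLE_STORE_BASE value.
export function bundleUrl(base: string, hex: string): string {
  // Without a trailing slash, new URL() would replace the base's last path segment instead of appending.
  const dirBase = base.endsWith("/") ? base : `${base}/`;
  return new URL(bundleKey(hex), dirBase).toString();
}
```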

1.f Per-Extension App Domains (Knative)

Rationale
- Assign a dedicated app domain per tenant’s extension install so Knative can autoscale the Runner on host hits and we have clean, predictable URLs.
- Keep a single Runner KService; provision a DomainMapping per extension install that targets that KService.
Data model
- Add columns to tenant_extension_install:
  - runner_domain (text, unique, indexed)
  - runner_status (jsonb; { state: 'pending'|'provisioning'|'ready'|'error', message?, last_updated? })
  - runner_ref (jsonb; optional: KService/DomainMapping identifiers for troubleshooting)
- Config: EXT_DOMAIN_ROOT (e.g., ext.example.com) and domain pattern <t8>--<e8>.<EXT_DOMAIN_ROOT> where:
  - t8 = first 8 hex chars if tenantId is UUID-like, else first 12 slug chars
  - e8 = first 8 hex chars if extensionId is UUID-like, else first 12 slug chars
  - Rationale: ensures DomainMapping metadata.name stays within 63-char limit.
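The domain pattern can be sketched as the `computeDomain` activity. The UUID test and the length rules follow the plan. The slugification is an assumption: lowercase the id, collapse non-alphanumerics to "-", and trim dashes so the label never starts or ends with a hyphen.

```typescript
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// t8/e8 rule: 8 hex chars for UUID-like ids, otherwise up to 12 slug chars (slug rules are assumptions).
function shortId(id: string): string {
  if (UUID_RE.test(id)) return id.replace(/-/g, "").slice(0, 8).toLowerCase();
  const slug = id
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics
    .replace(/^-+|-+$/g, ""); // DNS labels cannot start or end with "-"
  return slug.slice(0, 12).replace(/-+$/, ""); // truncation may expose a trailing "-"
}

export function computeDomain(tenantId: string, extensionId: string, domainRoot: string): string {
  const t = shortId(tenantId);
  const e = shortId(extensionId);
  if (!t || !e) throw new Error("cannot derive a DNS-safe label from tenant/extension id");
  return `${t}--${e}.${domainRoot}`;
}
```

Truncation can collide for long, similar slugs. The unique index on runner_domain would surface such a collision at insert time.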
Provisioning (Option B: Temporal worker)
- Create provisioning workflow in Temporal (ee/temporal-workflows/src/worker.ts task queue):
  - Activity: computeDomain(tenantId, extensionId, EXT_DOMAIN_ROOT) returns domain string.
  - Activity: ensureDomainMapping({ domain, kservice, namespace }) uses Kubernetes API to create DomainMapping:
    - apiVersion: serving.knative.dev/v1beta1, kind: DomainMapping, metadata.name: <domain>
    - spec.ref: { apiVersion: 'serving.knative.dev/v1', kind: 'Service', name: <runner-kservice> }
  - Update DB status: set runner_status.state to ready, or to error with a message.
- Trigger workflow on install.
- Trigger workflow on enable.
- Expose a “reprovision domain” action to retry.
- RBAC/secret: ServiceAccount with permission to manage DomainMappings in the Runner namespace.
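The `ensureDomainMapping` activity can be sketched as building the manifest described above and treating an existing object as success, so that Temporal retries stay idempotent. The `create` callback stands in for a Kubernetes API client and is hypothetical. A fuller version would also verify that an existing mapping points at the expected KService.

```typescript
export interface DomainMappingParams {
  domain: string;
  kservice: string;
  namespace: string;
}

// Manifest fields as listed in the plan.
export function buildDomainMapping({ domain, kservice, namespace }: DomainMappingParams) {
  return {
    apiVersion: "serving.knative.dev/v1beta1",
    kind: "DomainMapping",
    metadata: { name: domain, namespace },
    spec: {
      ref: { apiVersion: "serving.knative.dev/v1", kind: "Service", name: kservice },
    },
  };
}

// Hypothetical client call: submits the manifest and resolves to the Kubernetes API's HTTP status.
export type CreateFn = (manifest: ReturnType<typeof buildDomainMapping>) => Promise<number>;

// 409 AlreadyExists counts as success so that retried activities remain idempotent.
export async function ensureDomainMapping(params: DomainMappingParams, create: CreateFn): Promise<"ready"> {
  const status = await create(buildDomainMapping(params));
  if (status === 200 || status === 201 || status === 409) return "ready";
  throw new Error(`DomainMapping create failed with HTTP ${status}`);
}
```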
Server (Next.js)
- Server actions-first:
  - installs.createOrEnable(...) computes runner_domain, persists runner_status='pending', enqueues Temporal provisioning.
  - installs.lookupByHost(host) → { tenant_id, extension_id, content_hash } (resolves latest bundle by domain).
  - installs.validate(tenant, extension, hash) → { valid: boolean } (strict ext-ui gating).
- Expose thin API wrappers that delegate to actions:
  - GET /api/installs/lookup-by-host?host=...
  - GET /api/installs/validate?tenant=...&extension=...&hash=...
  - POST /api/installs/:id/reprovision (calls installs.reprovision).
Runner changes
- GET / host entry: read Host header, call REGISTRY_BASE_URL/api/installs/lookup-by-host?host=... (with short TTL cache), 302 → /ext-ui/{extensionId}/{content_hash}/index.html.
- Keep ext-ui strict validation as-is (host lookup is just a dispatcher).
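The Runner's host entry reduces to three steps: normalize the Host header, look up the install, and redirect into the content-addressed ext-ui path. The TypeScript sketch below assumes a `lookup` function equivalent to the lookup-by-host call, with its TTL cache omitted. It drops any port suffix before the lookup.

```typescript
// Shape returned by installs.lookupByHost per the plan.
export interface HostLookupResult {
  tenant_id: string;
  extension_id: string;
  content_hash: string;
}

export type LookupByHost = (host: string) => Promise<HostLookupResult | null>;

export async function hostEntry(
  hostHeader: string | undefined,
  lookup: LookupByHost,
): Promise<{ status: 302 | 400 | 404; location?: string }> {
  if (!hostHeader) return { status: 400 };
  const host = hostHeader.replace(/:\d+$/, "").toLowerCase(); // drop ":port"
  const install = await lookup(host);
  if (!install) return { status: 404 }; // lookup miss
  return {
    status: 302,
    location: `/ext-ui/${encodeURIComponent(install.extension_id)}/${install.content_hash}/index.html`,
  };
}
```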
UI updates
- Extensions list/details: display runner_domain, status (pending/provisioning/ready/error), and copy/open links.
- Add action to reprovision if status=error.
Ops
- Wildcard DNS *.${EXT_DOMAIN_ROOT} → Knative ingress (or automate DNS records per domain).
- KService env/secrets documented: BUNDLE_STORE_BASE, REGISTRY_BASE_URL, EXT_CACHE_MAX_BYTES, EXT_STATIC_STRICT_VALIDATION, EXT_EGRESS_ALLOWLIST, S3 creds. See ee/docs/extension-system/knative-app-domains.md.
Failure modes & handling
- On provisioning failure: persist error in runner_status, surface in UI, provide retry.
- On lookup miss: Runner returns 404.
- Audit install-to-domain mapping (log/metrics on lookup miss).

Install Provisioning — State Diagram

stateDiagram-v2
    [*] --> Pending: Install created/enabled
    Pending --> Provisioning: Enqueue Temporal workflow\nensureDomainMapping
    Provisioning --> Ready: DomainMapping applied\nupdate runner_status=ready
    Provisioning --> Error: Provisioning failure\nupdate runner_status=error
    Error --> Provisioning: Reprovision action\nretry workflow
    Ready --> Ready: New version published\ncontent_hash updates via lookup
    Ready --> Provisioning: Reprovision action
    note right of Ready: Host traffic → Runner\nGET / → lookup-by-host → 302 /ext-ui/.../index.html
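The state diagram can be enforced with a transition table, so that workflow activities cannot write an illegal runner_status. The TypeScript types mirror the runner_status JSON shape from 1.f. `transition` and `canTransition` are hypothetical helpers.

```typescript
export type RunnerState = "pending" | "provisioning" | "ready" | "error";

export interface RunnerStatus {
  state: RunnerState;
  message?: string;
  last_updated?: string;
}

// Edges transcribed from the provisioning state diagram.
const TRANSITIONS: Record<RunnerState, readonly RunnerState[]> = {
  pending: ["provisioning"],
  provisioning: ["ready", "error"],
  ready: ["ready", "provisioning"], // new version published, or reprovision
  error: ["provisioning"], // reprovision action
};

export function canTransition(from: RunnerState, to: RunnerState): boolean {
  return TRANSITIONS[from].includes(to);
}

// Returns the next runner_status value; throws on an edge the diagram does not allow.
export function transition(
  current: RunnerStatus,
  to: RunnerState,
  message?: string,
  now: Date = new Date(),
): RunnerStatus {
  if (!canTransition(current.state, to)) {
    throw new Error(`illegal runner_status transition: ${current.state} -> ${to}`);
  }
  return { state: to, last_updated: now.toISOString(), ...(message !== undefined ? { message } : {}) };
}
```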

Phase 2 — Dynamic WASM Features

Implementation note

Phase 2 routes (/v1/execute) are served by the same unified Rust application server. The Wasmtime engine, egress allowlists, and secrets are only wired into the execute route group; static routes remain read-only and do not mount runner secrets.

Scope & Objectives

Out-of-process execution with Rust Runner (Wasmtime), capability-based Host API, Next.js API gateway, events, quotas, provenance (signed bundles).

Architecture

Runner Service Design (Rust + Wasmtime)
HTTP Routing for Plugin Endpoints and API gateway
Runtime Decision: Wasmtime (WASM-only)
Distributed Bundles and Caching (WASM/precompiled aspects)

Security & Isolation

Resource limits, egress allowlists, secrets brokering, audit logs, idempotency

Deployment & Operations

Knative Serving profile, autoscaling, warmup/precompile

Test Plan

Execute API behavior, policy enforcement, quotas, error codes, telemetry

References to detailed content in this doc

Runner Service Design (Rust + Wasmtime)
HTTP Routing for Plugin Endpoints
Next.js API Router/Proxy (design)

Phase 2 — TODOs (Status)

2.a Database Schema and Registry Services

Migrations (EE): create base tables
- extension_registry
- extension_version
- extension_bundle (includes precompiled map)
- tenant_extension_install
- extension_event_subscription
- extension_execution_log
- extension_quota_usage
- RLS plan and enforcement for tenant-scoped tables
Registry service scaffold (ee/server/src/lib/extensions/registry-v2.ts).
Tenant install service scaffold (ee/server/src/lib/extensions/install-v2.ts).
Signature verification util (stub) in server/src/lib/extensions/signing.ts.
Admin CLI for publish/deprecate/install flows.
Details
- PK/FK relationships and cascade deletes confirmed in migrations.
- Indexes: execution_log (tenant_id, created_at), event_subscription (tenant_id, topic), tenant_install (tenant_id).
- Consider extension_id normalization vs. registry_id lookups.

2.b Bundle Storage Integration (signing and precompiled)

EE S3 provider implemented against MinIO (scaffold).
CE bundle helpers added in server/src/lib/extensions/bundles.ts (placeholders for EE wiring).
Precompiled cwasm support in schema (DB) and manifest; [ ] runtime selection logic in loader.
Details
- Canonical content-address layout documented.
- Signature format decision and trust bundle format.
- Signature verification: runner mandatory; gateway optional.

2.c Runner Service (Rust + Wasmtime)

Runner crate scaffolding: Cargo.toml, src/main.rs, src/http/server.rs (POST /v1/execute), src/models.rs.
Engine/loader/cache modules created (placeholders).
Wasmtime configuration
- Engine/Config: async enabled, epoch_interruption on
- PoolingAllocationConfig with conservative caps
- Static/dynamic guard sizes; static max size set
- Store limits: custom ResourceLimiter and Store.limiter installed
- Timeouts: epoch-based deadline mapped from timeout_ms with background engine.increment_epoch
- Fuel: optional fuel metering toggle and budgeting (currently disabled)
Host imports (alga.*)
- Logging
  - alga.log_info(ptr,len)
  - alga.log_error(ptr,len)
- HTTP
  - alga.http.fetch(req_ptr,req_len,out_ptr) async via reqwest
  - EXT_EGRESS_ALLOWLIST enforcement (exact/subdomain host match)
  - Limits/policy: size/time caps; header allowlist; method/body policy
- Storage (KV/doc)
  - alga.storage.* (API design + stubs)
- Secrets
  - alga.secrets.get (API design + stubs)
- Metrics/observability
  - alga.metrics.* (counters/timers) or host-collected hooks
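Exact-or-subdomain matching for EXT_EGRESS_ALLOWLIST can be sketched as below. Two rules are assumptions: the comma-separated format of the variable and the https-only scheme. The leading dot in the suffix check is what stops `evilexample.com` from matching an allowlisted `example.com`.

```typescript
// Assumption: EXT_EGRESS_ALLOWLIST is comma-separated, e.g. "api.example.com,example.org".
export function parseAllowlist(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((h) => h.trim().toLowerCase().replace(/\.$/, ""))
    .filter((h) => h.length > 0);
}

function hostOf(url: string): string | null {
  try {
    const u = new URL(url);
    if (u.protocol !== "https:") return null; // assumption: https-only egress
    return u.hostname.toLowerCase().replace(/\.$/, "");
  } catch {
    return null; // unparseable URL
  }
}

// Allows the exact host or any subdomain of it; the "." prefix enforces a label boundary.
export function isEgressAllowed(url: string, allowlist: readonly string[]): boolean {
  const host = hostOf(url);
  if (host === null) return false;
  return allowlist.some((allowed) => host === allowed || host.endsWith(`.${allowed}`));
}
```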
Module fetch/cache from S3
- Source
  - Fetch via BUNDLE_STORE_BASE + content-addressed key
- Caching
  - In-memory per-process cache (HashMap)
  - Pod-local LRU with capacity limits (disk/mem)
- Integrity
  - SHA-256 verification against key path (sha256/<hash>/…)
  - Signature verification using SIGNING_TRUST_BUNDLE (deferred)
- Precompiled
  - Precompiled module fetch/use (optional), keyed by hash+target
Execute flow
- Input handling
  - Normalize ExecuteRequest → guest input JSON (context + http)
  - Idempotency cache (in-memory) based on x-idempotency-key
  - Additional validation of method/path/header/body limits
- Instantiate
  - Engine/Store with limits + linker imports
- ABI call
  - Require guest exports: memory, alloc, handler(req_ptr, req_len, out_ptr)
  - Optional dealloc support
  - Read resp tuple (ptr,len) → bytes
- Response
  - Parse as normalized response JSON {status, headers, body_b64}
  - Fallback: if not JSON, base64 opaque bytes
- Logging/metrics
  - Start/end logging with request_id, tenant, extension, status
  - duration_ms, resp_b64_len, configured timeout/mem
  - Counters/histograms (egress bytes, status code buckets), per-tenant metrics
  - Structured error codes mapping
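The response step (normalized JSON with a base64 fallback) can be sketched as a parser over the guest's output bytes. Two details are assumptions about what "fallback" means here: the validation rules (integer status in 100-599, string header values) and wrapping opaque output as a 200.

```typescript
export interface NormalizedResponse {
  status: number;
  headers: Record<string, string>;
  body_b64: string;
}

function isNormalized(v: unknown): v is NormalizedResponse {
  if (typeof v !== "object" || v === null) return false;
  const r = v as Record<string, unknown>;
  const status = r["status"];
  const headers = r["headers"];
  return (
    typeof status === "number" && Number.isInteger(status) && status >= 100 && status <= 599 &&
    typeof r["body_b64"] === "string" &&
    typeof headers === "object" && headers !== null && !Array.isArray(headers) &&
    Object.values(headers).every((h) => typeof h === "string")
  );
}

// Assumption: output that is not a well-formed normalized response is returned as an opaque 200.
export function normalizeGuestOutput(bytes: Uint8Array): NormalizedResponse {
  try {
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    const parsed: unknown = JSON.parse(text);
    if (isNormalized(parsed)) return parsed;
  } catch {
    // invalid UTF-8 or invalid JSON: use the opaque path below
  }
  return { status: 200, headers: {}, body_b64: Buffer.from(bytes).toString("base64") };
}
```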
Errors/tests: standardized error codes + unit/integration tests.
Containerization: ee/runner/Dockerfile and KService YAML with /healthz and /warmup.
Details
- Observability: tracing fields and metrics; persist execution logs.
- Idempotency handling with gateway-provided key.
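A sketch of the in-memory idempotency cache keyed by x-idempotency-key. Several choices are assumptions:
- scoping the key by tenant and extension;
- replaying the same promise for concurrent duplicates;
- forgetting failed executions;
- the five-minute TTL.

A production cache would also bound its size and sweep expired entries.

```typescript
// Hypothetical idempotency cache; keys are scoped per tenant and extension so tenants cannot collide.
export class IdempotencyCache<T> {
  private readonly entries = new Map<string, { result: Promise<T>; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = 5 * 60_000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  run(tenantId: string, extensionId: string, key: string | undefined, execute: () => Promise<T>): Promise<T> {
    if (!key) return execute(); // no x-idempotency-key: always execute
    const scoped = `${tenantId}\u0000${extensionId}\u0000${key}`;
    const hit = this.entries.get(scoped);
    if (hit && hit.expiresAt > this.now()) return hit.result; // replay, or join an in-flight call

    const result = execute();
    this.entries.set(scoped, { result, expiresAt: this.now() + this.ttlMs });
    // Forget failures so a retry with the same key executes again; skip if a newer entry replaced this one.
    result.catch(() => {
      if (this.entries.get(scoped)?.result === result) this.entries.delete(scoped);
    });
    return result;
  }
}
```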

2.d Next.js API Gateway for Server-Side Handlers

Route added: server/src/app/api/ext/[extensionId]/[...path]/route.ts (GET/POST/PUT/PATCH/DELETE).
Helpers: auth.ts, registry.ts, endpoints.ts, headers.ts (scaffolds).
Request policy
- Header allowlist (strip authorization).
- Body size caps.
- Timeout via EXT_GATEWAY_TIMEOUT_MS.
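The request-policy items can be sketched as three small helpers. Only the removal of Authorization comes from the plan. The rest of the header allowlist and the 5000 ms fallback timeout are assumptions.

```typescript
// Only stripping Authorization comes from the plan; the remaining allowlist entries are assumptions.
const FORWARDED_HEADERS = new Set([
  "accept",
  "accept-language",
  "content-type",
  "x-request-id",
  "x-idempotency-key",
]);

// Anything not allowlisted (authorization, cookie, ...) is dropped before proxying to the Runner.
export function filterRequestHeaders(input: Record<string, string | undefined>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(input)) {
    const lower = name.toLowerCase();
    if (value !== undefined && FORWARDED_HEADERS.has(lower)) out[lower] = value;
  }
  return out;
}

export function exceedsBodyCap(body: Uint8Array | null, maxBytes: number): boolean {
  return body !== null && body.byteLength > maxBytes;
}

// Reads EXT_GATEWAY_TIMEOUT_MS; falls back when unset or not a positive number.
export function gatewayTimeoutMs(env: Record<string, string | undefined>, fallbackMs = 5000): number {
  const ms = Number(env["EXT_GATEWAY_TIMEOUT_MS"]);
  return Number.isFinite(ms) && ms > 0 ? ms : fallbackMs;
}
```

The resulting timeout can be applied by passing `AbortSignal.timeout(ms)` as the `signal` of the `fetch` call to the Runner.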
Proxy and telemetry
- Proxy to Runner /v1/execute with normalized payload.
- Map response back to client.
- Emit telemetry (tracing/metrics).
Details
- AuthN/Z: derive tenant from session/API key; enforce RBAC. (Scaffolding present in server/src/lib/extensions/gateway/auth.ts; production wiring pending.)
- Idempotency key for non-GET; [ ] retry policy (502/503/504 with jitter).
- Propagate x-request-id; record correlation IDs.
- Normalize user-agent.
- Resolve version_id → content_hash via extension_bundle join in gateway helpers (registry.ts).
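For the pending retry item, a common choice is exponential backoff with full jitter, retrying only on 502/503/504. The sketch below assumes the wrapped call is safe to repeat: a GET, or a non-GET that carries the idempotency key. The base delay, cap, and attempt count are illustrative values.

```typescript
const RETRYABLE = new Set([502, 503, 504]);

// Full jitter: a random delay in [0, min(capMs, baseMs * 2^attempt)).
export function backoffMs(attempt: number, baseMs = 100, capMs = 2000, random: () => number = Math.random): number {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));
}

// Retries gateway -> Runner calls on 502/503/504 only; the caller must ensure the request is safe to repeat.
export async function withRetry<R extends { status: number }>(
  call: () => Promise<R>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(() => resolve(), ms)),
  random: () => number = Math.random,
): Promise<R> {
  let res = await call();
  for (let attempt = 1; attempt < maxAttempts && RETRYABLE.has(res.status); attempt++) {
    await sleep(backoffMs(attempt - 1, 100, 2000, random));
    res = await call();
  }
  return res;
}
```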

2.e Knative Serving (Runner)

KService manifest with autoscaling annotations.
/healthz and /warmup endpoints implemented.
CI/CD step to build/publish runner and smoke-test /v1/execute.
Details
- Autoscale tuning; resource requests/limits aligned to memory caps.
- Warmup prefetch strategy for hot bundles.
- Rollout notes for revision updates.
Runtime Decision: Wasmtime (WASM-only)

Data Model and Registry (Shared Foundations)

Consolidates: Data Model (initial) and Public APIs (EE)
Used by Phase 1 for read-only UI delivery (install → version → content_hash)
Used by Phase 2 for full execution, logging, and quotas

Proposed Architecture

WASM-only runner model:

Out-of-Process Runner (single runtime path)

Execute all extensions in an external Runner Service using a WASM runtime with a strict, capability-based Host API.
No direct filesystem access; no raw network access. All I/O occurs through brokered host functions that enforce tenant- and capability-scoped policies.
Deterministic execution with configurable timeouts, memory limits, and concurrency controls per tenant/extension.

Signed, Reproducible Bundles

Extensions are packaged as immutable bundles (content-addressed by SHA256) with a manifest and lockfile.
Build pipeline compiles/transpiles and freezes dependencies; no dynamic require/import at runtime.
Bundles stored in object storage (e.g., S3/GCS) and verified by signature on install and on load.

Capability-Based Host API (stable, versioned)

Minimal surface: events, HTTP fetch via broker, key-value/doc store, scheduled tasks, secrets, and logging/metrics.
Explicit grants recorded per tenant install (manifest + admin approvals). All calls carry tenant_id and extension_id.
Timeouts, memory/cpu quotas, and concurrency limits enforced by the runner.

Event-Driven Execution

Core app publishes events (domain, data changes, schedules) to an event bus.
Registry maps tenant subscriptions to installed extension entrypoints.
Runner pulls events, resolves bundle, executes handler in isolated sandbox, and reports result/metrics.

UI Extension Sandboxing

UI integrates exclusively via sandboxed iframes powered by the Alga Extension Client SDK.
Enforce strict CSP, postMessage bridge, and explicit allowlists for APIs and assets.
UI assets are served from signed bundles or CDN; no runtime code injection into the host app.

Components

Extension Registry: catalogs extensions, versions, capabilities, and maintainers.
Tenant Install Store: per-tenant install with granted capabilities, secrets, and config.
Bundle Storage: object storage for signed, content-addressed bundles.
Build Service: validates, compiles, and signs bundles (CI-integrated and/or hosted).
Runner Service: isolated execution engine with quotas, metrics, and audit logs (implemented with Wasmtime).
Host API Broker: mediates storage, network egress, secrets, and queues; enforces policy.
Event Bus: routes events and schedules executions.
UI Host: renders UI extensions using sandbox constraints.

Distributed Bundles, Assets, and Caching (multi-pod safe)

Object storage as source of truth: All extension bundles and UI assets live in object storage using content-addressed paths (sha256/<hash>). No persistent host volumes across pods.
Pod-local caches: Runner and API pods maintain small ephemeral LRU caches on local disk/memory. On first request for a given content_hash, the pod pulls only the needed artifacts (WASM and/or ui/**/*) into its local cache.
Optional prefetch: On pod startup or install/upgrade events, selectively prefetch hot bundles/UI to reduce first-request latency.
No app-managed CDN or signed URLs: Assets are served directly from the pod over Knative Serving once cached locally.
Precompiled module cache: Store optional precompiled Wasmtime artifacts in object storage; pods fetch on demand and keep an ephemeral cache per target triple. Validate hash on use.
GC policy: Capacity-based eviction (e.g., max N GB or file count) with background GC to remove least-recently-used artifacts.
Consistency & integrity: Content-hash directory layout ensures deterministic assets. Verify signatures for bundles before use; verify file hashes when extracting.
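The capacity-based LRU eviction can be sketched with a byte budget. This assumes an in-memory store keyed by content-addressed paths; ByteLruCache is hypothetical, and a real runner would track files on local disk rather than buffers in memory. It relies on Map preserving insertion order, so re-inserting a key on access moves it to the most-recently-used end and eviction walks from the front.

```typescript
// Byte-budgeted LRU keyed by content-addressed paths (e.g. "sha256/<hash>/main.cwasm").
class ByteLruCache {
  private readonly entries = new Map<string, Uint8Array>();
  private used = 0;

  constructor(private readonly maxBytes: number) {}

  get(key: string): Uint8Array | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Refresh recency: delete and re-insert so the key moves to the end of the Map.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: Uint8Array): void {
    if (value.byteLength > this.maxBytes) return; // never cache items larger than the whole budget
    const existing = this.entries.get(key);
    if (existing !== undefined) {
      this.used -= existing.byteLength;
      this.entries.delete(key);
    }
    this.entries.set(key, value);
    this.used += value.byteLength;
    // Evict least-recently-used entries (front of the Map) until back within budget.
    for (const [oldKey, oldValue] of this.entries) {
      if (this.used <= this.maxBytes) break;
      this.entries.delete(oldKey);
      this.used -= oldValue.byteLength;
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }

  get bytesUsed(): number {
    return this.used;
  }
}

// Usage: with a 10-byte budget, touching "a" makes "b" the eviction candidate.
const cache = new ByteLruCache(10);
cache.set('sha256/a', new Uint8Array(4));
cache.set('sha256/b', new Uint8Array(4));
cache.get('sha256/a');
cache.set('sha256/c', new Uint8Array(4)); // 12 bytes > 10, so "sha256/b" is evicted
```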

Runner Service Design (Rust + Wasmtime)

Embedding: Rust service embedding Wasmtime with the pooling instance allocator (PoolingAllocationConfig); per-Store limits (StoreLimits) configured for memory and tables.
Invocation API: Internal gRPC/HTTP accepting tenant_id, extension_id, version_id, content_hash, entry, input, and idempotency key. Runner fetches module artifacts, verifies signature, instantiates, and executes.
Host imports (capabilities): Namespaced imports alga.* for storage, http, secrets, events, logging. All calls scope to tenant/extension and enforce quotas and egress policy. No preopened FS; no ambient WASI.
Resource controls: Per-invocation memory caps, epoch timeouts, optional fuel metering; concurrency throttles per tenant/extension. Hard stop on policy violations with structured errors.
Event integration: Pull from event bus/queue with per-tenant partitions; support push-based execution for admin test-runs.
Observability: Structured logs with correlation IDs, metrics (duration, mem, fuel, egress), and tracing.
Failure handling: Retries made safe by idempotency keys; quarantine misbehaving extensions; circuit breakers for upstream/broker failures.
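The per-tenant/extension concurrency throttle can be sketched as a keyed, fail-fast limiter. This assumes a single process and chooses to reject rather than queue when the limit is reached, which fits the "hard stop with structured errors" rule; KeyedLimiter, ConcurrencyLimitError, and the "tenant:extension" key format are hypothetical.

```typescript
class ConcurrencyLimitError extends Error {
  constructor(readonly key: string, readonly limit: number) {
    super(`concurrency limit ${limit} reached for ${key}`);
    this.name = 'ConcurrencyLimitError';
  }
}

// Keyed, fail-fast limiter: at most `limit` in-flight executions per key.
class KeyedLimiter {
  private readonly inFlight = new Map<string, number>();

  constructor(private readonly limit: number) {}

  async run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const current = this.inFlight.get(key) ?? 0;
    if (current >= this.limit) throw new ConcurrencyLimitError(key, this.limit);
    this.inFlight.set(key, current + 1);
    try {
      return await task();
    } finally {
      // Release the slot even if the task throws.
      const remaining = (this.inFlight.get(key) ?? 1) - 1;
      if (remaining === 0) this.inFlight.delete(key);
      else this.inFlight.set(key, remaining);
    }
  }

  active(key: string): number {
    return this.inFlight.get(key) ?? 0;
  }
}
```

Queuing with a bounded wait is the obvious alternative; fail-fast keeps a noisy tenant from building unbounded backlogs inside a Runner pod.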

Client UI Delivery (iframe-only with SDK)

Iframe-only UI: Extensions ship prebuilt static apps (e.g., React/Vite build). On first request, the API pod pulls the ui/**/* subtree for the installed content_hash into a pod-local cache and serves assets directly.
Client SDK: Provide @alga/ui-kit and @alga/extension-iframe-sdk for consistent components, theming, a11y, and a postMessage bridge (auth, navigation, theme tokens, telemetry, viewport sizing).
Theming: Host propagates design tokens to the iframe via the bridge; UI Kit consumes CSS variables for live theme updates.
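Security: Sandbox iframes (allow-scripts by default; add allow-same-origin only if the SDK needs it, because allow-scripts plus allow-same-origin on a same-origin frame lets the framed page remove its own sandbox). All API calls go through the /api/ext/... gateway. Prevent directory traversal in asset serving.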
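On the host side, the postMessage bridge should ignore anything that does not come from the expected iframe origin or does not match the bridge envelope. The envelope below (alga: 1, type, payload) and the allowed message types are hypothetical; the actual @alga/extension-iframe-sdk protocol may differ. Keeping the check as a pure function makes it testable outside a browser. Note that a sandboxed iframe without allow-same-origin has an opaque origin, which browsers report as the string "null", so the expected origin must be configured accordingly.

```typescript
// Hypothetical message envelope for the host <-> iframe bridge.
type BridgeMessageType = 'ready' | 'navigate' | 'resize' | 'telemetry';

interface BridgeMessage {
  alga: 1;
  type: BridgeMessageType;
  payload?: unknown;
}

// The subset of MessageEvent the check needs, so it can run outside a browser.
interface IncomingMessage {
  origin: string;
  data: unknown;
}

const ALLOWED_TYPES: ReadonlySet<string> = new Set(['ready', 'navigate', 'resize', 'telemetry']);

// Returns a typed message, or null if the sender or shape is not trusted.
function parseBridgeMessage(event: IncomingMessage, expectedOrigin: string): BridgeMessage | null {
  if (event.origin !== expectedOrigin) return null; // wrong sender
  if (typeof event.data !== 'object' || event.data === null) return null;
  const candidate = event.data as Record<string, unknown>;
  if (candidate.alga !== 1) return null;
  if (typeof candidate.type !== 'string' || !ALLOWED_TYPES.has(candidate.type)) return null;
  return { alga: 1, type: candidate.type as BridgeMessageType, payload: candidate.payload };
}
```

In the browser this would run inside a window "message" listener, which should also check that event.source is the extension iframe's contentWindow before acting on the message.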

Client Asset Serving via Gateway (pod-local cache)

Entry route: server/src/app/ext-ui/[extensionId]/[contentHash]/[...path]/route.ts (GET)
- Resolves tenant install → content_hash (the URL’s [contentHash] must match; otherwise 404) to avoid serving stale assets.
- Ensures ui/**/* for [contentHash] exists in the pod-local cache directory, otherwise pulls and extracts just the ui subtree from the bundle archive.
- Serves files from <CACHE_ROOT>/<contentHash>/ui/, falling back to index.html (SPA fallback) when the path is empty or the file is not found.
- Sets headers: Cache-Control "public, max-age=31536000, immutable" (safe because the contentHash makes every asset URL immutable), an ETag based on the file hash, and Content-Type by file extension.
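The header rules can be sketched as a small helper, assuming the file bytes are already in the pod-local cache. The MIME table is a deliberately small, hypothetical subset, and deriving a strong ETag from the SHA-256 of the file body is one reasonable reading of "ETag based on file hash".

```typescript
import { createHash } from 'node:crypto';
import { extname } from 'node:path';

// Small illustrative subset; a real server would use a fuller MIME map.
const MIME_TYPES: Record<string, string> = {
  '.html': 'text/html; charset=utf-8',
  '.js': 'text/javascript; charset=utf-8',
  '.css': 'text/css; charset=utf-8',
  '.json': 'application/json',
  '.svg': 'image/svg+xml',
  '.png': 'image/png',
  '.woff2': 'font/woff2',
};

function assetHeaders(filePath: string, body: Uint8Array): Record<string, string> {
  const digest = createHash('sha256').update(body).digest('hex');
  return {
    // The contentHash in the URL never changes for given bytes, so a one-year immutable cache is safe.
    'cache-control': 'public, max-age=31536000, immutable',
    // Strong ETag derived from the file's own bytes.
    etag: `"${digest}"`,
    'content-type': MIME_TYPES[extname(filePath).toLowerCase()] ?? 'application/octet-stream',
  };
}
```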
Iframe src: Host pages set iframe src="/ext-ui/{extensionId}/{content_hash}/index.html?path=/desired/route".
Safety: Sanitize path, disallow .. segments, and restrict to the cached directory. Limit individual file size and total cache size.
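Path sanitization and the SPA fallback can be combined in one resolver. This sketch assumes the route receives the already-split [...path] segments, that contentHash has already been checked against the tenant install (so it is a trusted value), and that cacheRoot is the pod-local cache directory; resolveUiAsset and its injected exists callback are hypothetical. Comparing the resolved absolute path against the ui/ root, in addition to rejecting ".." segments, also rejects absolute segments and anything else that resolves outside the root.

```typescript
import { isAbsolute, join, relative, resolve, sep } from 'node:path';

/**
 * Map request path segments to a file under <cacheRoot>/<contentHash>/ui/.
 * Returns null when the request would escape the ui/ directory.
 * `exists` should return true only for regular files in the local cache.
 */
function resolveUiAsset(
  cacheRoot: string,
  contentHash: string,
  segments: string[],
  exists: (absPath: string) => boolean,
): string | null {
  const uiRoot = resolve(cacheRoot, contentHash, 'ui');
  // Reject explicit parent segments and NUL bytes outright.
  if (segments.some((s) => s === '..' || s.includes('\0'))) return null;
  const candidate = resolve(uiRoot, ...segments);
  const rel = relative(uiRoot, candidate);
  // Anything that resolves outside uiRoot is refused.
  if (rel === '..' || rel.startsWith('..' + sep) || isAbsolute(rel)) return null;
  // SPA fallback: an empty path or a missing file serves index.html.
  if (rel === '' || !exists(candidate)) return join(uiRoot, 'index.html');
  return candidate;
}
```

Injecting exists keeps the function independent of whether the cache is checked with fs.statSync or an in-memory index.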

Knative Serving Profile (initial)

Serving only (no Eventing initially). The unified Rust application server ships as a Knative Service (KService) to leverage revisioning and concurrency-based autoscaling. It exposes both /ext-ui (static) and /v1/execute (execute) routes.
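Autoscaling metric: concurrency. Configure containerConcurrency (e.g., 4–16 depending on per-invocation memory) and use the Knative Pod Autoscaler (KPA) with a simple target concurrency (e.g., 10) as a starting point. When both a soft target and containerConcurrency are set, Knative uses the smaller of the two, so keep the target at or below containerConcurrency. Final SLOs/policies to be tuned later.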
Scale policy: keep minScale configurable (0 for non-critical, 1+ for production to reduce cold starts). Set maxScale to cap cost. Revisions roll out code safely; extension versions are handled at the bundle layer, not via Knative revisions. Prefer CDN to absorb /ext-ui traffic so autoscaling is driven by execute workloads.
Probes and warmup: add a warmup endpoint to prefetch common bundles and initialize Wasmtime; use readiness probes that succeed only after caches are primed if needed.
Security: run under a restricted ServiceAccount with egress policies; use Kubernetes secrets for broker credentials and object store credentials. Static routes do not require runner secrets; ensure secret mounts are scoped to execute path usage.

Example KService (abridged):

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: alga-ext-runner
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: "10"
        # Optional, tune later
        autoscaling.knative.dev/min-scale: "1"
        autoscaling.knative.dev/max-scale: "50"
    spec:
      containerConcurrency: 8
      containers:
        - image: ghcr.io/alga/runner:sha-<image>
          env:
            - name: BUNDLE_STORE_BASE
              value: https://s3.example.com/alga-ext/
            - name: SIGNING_TRUST_BUNDLE
              valueFrom:
                secretKeyRef: { name: runner-secrets, key: trust.pem }
            - name: RUNTIME_LIMITS
              value: '{"memory_mb":512,"timeout_ms":5000,"fuel":null}'
          ports:
            - containerPort: 8080

On-Demand Loading, Versioning, and Hot Swap

Lazy load: Resolve the tenant’s installed extension version on each request; fetch the bundle by content_hash from object storage if not cached; verify signature; instantiate per-invocation.
Caching: Maintain in-pod LRU caches for raw WASM and precompiled artifacts keyed by content_hash+target. Validate hashes on every use. Optionally cache resolved handler maps per extension version.
Version updates: Updating a tenant install changes which version_id (and therefore which content_hash) the registry resolves for that tenant. Subsequent requests pick up the new content_hash automatically (cache miss → fetch new). In-flight requests continue on the old version; no pod restarts required.
Warmup: On install/upgrade, optionally push a warmup signal to prefetch and precompile hot bundles on a subset of Runner pods.
Consistency: Use strong consistency on registry lookups or include content_hash in the gateway’s dispatch token so the Runner executes the intended version even amid concurrent upgrades.
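One way to include content_hash in the gateway's dispatch token is an HMAC-signed token that the Runner verifies before executing. This is a sketch under stated assumptions: a shared secret between gateway and Runner, a compact "payload.signature" format, and an expiry claim; none of these specifics come from the design above, and an asymmetric signature or an existing token format would work equally well.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

interface DispatchClaims {
  tenant_id: string;
  extension_id: string;
  content_hash: string; // the exact bundle the gateway resolved
  exp: number;          // expiry, seconds since epoch
}

function signDispatchToken(claims: DispatchClaims, secret: string): string {
  const payload = Buffer.from(JSON.stringify(claims)).toString('base64url');
  const sig = createHmac('sha256', secret).update(payload).digest('base64url');
  return `${payload}.${sig}`;
}

// Returns the claims only if the signature matches and the token has not expired.
function verifyDispatchToken(token: string, secret: string, nowSec: number): DispatchClaims | null {
  const [payload, sig] = token.split('.');
  if (!payload || !sig) return null;
  const expected = createHmac('sha256', secret).update(payload).digest();
  const given = Buffer.from(sig, 'base64url');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const claims = JSON.parse(Buffer.from(payload, 'base64url').toString('utf8')) as DispatchClaims;
  return claims.exp > nowSec ? claims : null;
}
```

The Runner would then execute exactly claims.content_hash instead of re-resolving the install, so a concurrent upgrade cannot change which version handles an already-dispatched request.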

HTTP Routing for Plugin Endpoints

Gateway pattern: The core app exposes stable API paths and forwards plugin requests to the Runner. Proposed pattern: /api/ext/{extensionId}/{...path} with tenant context inferred from auth/session.
Manifest mapping: Manifest v2 defines API endpoints (method, path template, handler). The gateway resolves {extensionId, method, path} to a handler name within the bundle and calls Runner Execute with the request payload and headers.
AuthZ and quotas: The gateway enforces user authN/RBAC and per-tenant rate limits before invoking Runner. The Runner still enforces capability-level checks and per-tenant execution quotas.
Contract: Runner HTTP execute endpoint accepts method, path, query, headers, and body plus context (tenant_id, extension_id, content_hash), returning status, headers, and body. Inside WASM, the handler receives a normalized request object and returns a normalized response.
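Resolving {method, path} against api.endpoints needs path-template matching. This sketch assumes templates mark parameters with {name} segments (the manifest example only shows literal paths, so the syntax is an assumption) and that path segments arrive already URL-decoded from the router; matchEndpoint and the get_agreement handler are hypothetical.

```typescript
interface EndpointDef {
  method: string;
  path: string;    // template such as "/agreements/{id}" (parameter syntax assumed)
  handler: string;
}

interface EndpointMatch {
  endpoint: EndpointDef;
  params: Record<string, string>;
}

// First matching endpoint wins, so list literal paths before parameterized ones.
function matchEndpoint(endpoints: EndpointDef[], method: string, pathname: string): EndpointMatch | null {
  const reqSegments = pathname.split('/').filter(Boolean);
  for (const endpoint of endpoints) {
    if (endpoint.method.toUpperCase() !== method.toUpperCase()) continue;
    const tplSegments = endpoint.path.split('/').filter(Boolean);
    if (tplSegments.length !== reqSegments.length) continue;
    const params: Record<string, string> = {};
    let matched = true;
    for (let i = 0; i < tplSegments.length; i++) {
      const tpl = tplSegments[i];
      if (tpl.startsWith('{') && tpl.endsWith('}')) {
        params[tpl.slice(1, -1)] = reqSegments[i]; // capture a path parameter
      } else if (tpl !== reqSegments[i]) {
        matched = false;
        break;
      }
    }
    if (matched) return { endpoint, params };
  }
  return null; // the gateway maps this to 404
}

const endpoints: EndpointDef[] = [
  { method: 'GET', path: '/agreements', handler: 'dist/handlers/http/list_agreements' },
  { method: 'POST', path: '/agreements/sync', handler: 'dist/handlers/http/sync' },
  { method: 'GET', path: '/agreements/{id}', handler: 'dist/handlers/http/get_agreement' },
];
```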

Next.js API Router/Proxy (design)

Route structure: server/src/app/api/ext/[extensionId]/[...path]/route.ts
Methods: Support GET, POST, PUT, PATCH, DELETE. All methods follow the same pipeline.
Env/config: RUNNER_BASE_URL, BUNDLE_STORE_BASE, SIGNING_TRUST_BUNDLE, EXT_GATEWAY_TIMEOUT_MS.

Request pipeline (per request):

Resolve tenant: derive tenant_id from session/auth; attach to context and rate-limit bucket.
Resolve install/version: query registry for tenant’s install of extensionId; get version_id and content_hash.
Resolve endpoint: load manifest for that version (from registry/bundle manifest cache) and match {method, path} against api.endpoints (support path params). If not found, return 404.
Build Execute call: construct a request for Runner with context and normalized HTTP payload. Generate an idempotency key for non-GET from request_id || hash(method+url+body).
Forward to Runner: call POST {RUNNER_BASE_URL}/v1/execute with a short-lived service token. Propagate an allowlist of headers (e.g., x-request-id, accept, content-type) and strip end-user authorization.
Timeout & retries: apply EXT_GATEWAY_TIMEOUT_MS (default 5s). Retry only on 502/503/504, with jittered backoff, and only for safe methods or for requests carrying an idempotency key.
Return response: map Runner’s {status, headers, body} to NextResponse. Enforce response header allowlist and size limits.
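The idempotency-key and retry steps of the pipeline can be sketched together. Assumptions: the key is a SHA-256 over method, URL, and body when no request id is supplied; retries use exponential backoff with full jitter; and the send callback stands in for the actual fetch to the Runner. The function names and backoff constants are hypothetical.

```typescript
import { createHash } from 'node:crypto';

// request_id if present, otherwise hash(method + url + body).
function idempotencyKey(method: string, url: string, body?: Uint8Array, requestId?: string): string {
  if (requestId) return requestId;
  const h = createHash('sha256').update(method).update('\n').update(url).update('\n');
  if (body) h.update(body);
  return h.digest('hex');
}

const RETRYABLE_STATUS = new Set([502, 503, 504]);
const SAFE_METHODS = new Set(['GET', 'HEAD', 'OPTIONS']);

async function sendWithRetry<T extends { status: number }>(
  method: string,
  hasIdempotencyKey: boolean,
  send: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  // Only retry when repeating the call cannot apply a side effect twice.
  const canRetry = SAFE_METHODS.has(method.toUpperCase()) || hasIdempotencyKey;
  for (let attempt = 1; ; attempt++) {
    const res = await send();
    if (!canRetry || !RETRYABLE_STATUS.has(res.status) || attempt >= maxAttempts) return res;
    // Full jitter: wait a random time in [0, base * 2^(attempt - 1)).
    const delayMs = Math.random() * baseDelayMs * 2 ** (attempt - 1);
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
}
```

In the gateway, the total time spent across attempts should still fit within EXT_GATEWAY_TIMEOUT_MS.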

Execute API (Runner)

Request JSON (abridged):

{
  "context": {
    "request_id": "uuid",
    "tenant_id": "t_123",
    "extension_id": "com.alga.softwareone",
    "content_hash": "sha256:...",
    "version_id": "ver_abc"
  },
  "http": {
    "method": "POST",
    "path": "/agreements/sync",
    "query": { "force": "true" },
    "headers": { "content-type": "application/json" },
    "body_b64": "eyJwYXlsb2FkIjoiLi4uIn0="
  },
  "limits": { "timeout_ms": 5000, "memory_mb": 256 }
}

Response JSON (abridged):

{
  "status": 200,
  "headers": { "content-type": "application/json" },
  "body_b64": "eyJyZXN1bHQiOiJPSyJ9"
}

Header policy (allowlist / strip):

Forward: x-request-id, accept, content-type, accept-encoding, user-agent (normalized), x-alga-tenant (added by gateway), x-alga-extension (added), x-idempotency-key (generated for non-GET).
Strip: authorization from end-user; gateway authenticates user and injects a service credential to Runner.
Response: allow content-type, cache-control (if safe), custom x- headers under x-ext-*. Disallow set-cookie and hop-by-hop headers.
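The filterHeaders and filterResponseHeaders helpers used in the example handler below could be implemented along these lines. The allowlists mirror the policy above; treating "normalized" user-agent as trimmed and length-capped is an assumption, the "if safe" condition on cache-control is not modeled, and the gateway-added headers (x-alga-tenant, x-alga-extension, x-idempotency-key) are expected to be set from trusted context rather than copied from the client.

```typescript
// Minimal structural type satisfied by the Fetch API Headers object (and by Map).
interface HeaderSource {
  forEach(callback: (value: string, key: string) => void): void;
}

const FORWARD_REQUEST_HEADERS = new Set(['x-request-id', 'accept', 'content-type', 'accept-encoding', 'user-agent']);
const ALLOWED_RESPONSE_HEADERS = new Set(['content-type', 'cache-control']);
const HOP_BY_HOP = new Set([
  'connection', 'keep-alive', 'proxy-authenticate', 'proxy-authorization',
  'te', 'trailer', 'transfer-encoding', 'upgrade',
]);

// Request side: forward only allowlisted headers; authorization and cookies are dropped.
function filterHeaders(source: HeaderSource): Record<string, string> {
  const out: Record<string, string> = {};
  source.forEach((value, key) => {
    const name = key.toLowerCase();
    if (!FORWARD_REQUEST_HEADERS.has(name)) return;
    out[name] = name === 'user-agent' ? value.trim().slice(0, 256) : value;
  });
  return out;
}

// Response side: allow content-type, cache-control, and x-ext-*; never set-cookie or hop-by-hop.
function filterResponseHeaders(headers: Record<string, string> | undefined): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(headers ?? {})) {
    const name = key.toLowerCase();
    if (name === 'set-cookie' || HOP_BY_HOP.has(name)) continue;
    if (ALLOWED_RESPONSE_HEADERS.has(name) || name.startsWith('x-ext-')) out[name] = value;
  }
  return out;
}
```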

Security and limits:

RBAC: verify user can access the extension/endpoint before proxying.
Quotas: apply per-tenant rate limit and concurrency caps at the gateway; Runner enforces execution quotas.
Size: cap request and response bodies (e.g., 5–10 MB); return 413 for oversized requests and 502 for oversized Runner responses.
Timeouts: default 5s; allow per-endpoint overrides with safe maximums (e.g., 30s).

Example Next.js handler (abridged):

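// server/src/app/api/ext/[extensionId]/[...path]/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { createHash, randomUUID } from 'node:crypto';

// getTenantFromAuth, assertAccess, getTenantInstall, resolveVersion, resolveEndpoint,
// filterHeaders, filterResponseHeaders, and getRunnerServiceToken are gateway helpers
// assumed to be defined elsewhere in the server; they are not shown here.

const TIMEOUT_MS = Number(process.env.EXT_GATEWAY_TIMEOUT_MS) || 5000;

// Not exported by name: Next.js route files may only export HTTP method handlers and route config.
async function handler(req: NextRequest, ctx: { params: { extensionId: string; path: string[] } }) {
  const requestId = req.headers.get('x-request-id') || randomUUID();
  const method = req.method;
  const { extensionId, path } = ctx.params;
  const pathname = '/' + (path || []).join('/');
  const url = new URL(req.url);

  // AuthN/RBAC happen before anything is forwarded to the Runner.
  const tenantId = await getTenantFromAuth(req);
  await assertAccess(tenantId, extensionId, method, pathname);

  const install = await getTenantInstall(tenantId, extensionId);
  if (!install) return NextResponse.json({ error: 'Not installed' }, { status: 404 });
  const { version_id, content_hash } = await resolveVersion(install);

  const endpoint = await resolveEndpoint(version_id, method, pathname);
  if (!endpoint) return NextResponse.json({ error: 'Not found' }, { status: 404 });

  const bodyBuf = method === 'GET' || method === 'HEAD' ? undefined : Buffer.from(await req.arrayBuffer());
  // Non-GET requests carry an idempotency key: the caller's request id, else hash(method + url + body).
  const idempotencyKey =
    method === 'GET'
      ? undefined
      : req.headers.get('x-request-id') ||
        createHash('sha256').update(`${method} ${req.url}\n`).update(bodyBuf ?? Buffer.alloc(0)).digest('hex');

  const execReq = {
    context: { request_id: requestId, tenant_id: tenantId, extension_id: extensionId, content_hash, version_id },
    http: {
      method,
      path: pathname,
      query: Object.fromEntries(url.searchParams.entries()),
      headers: filterHeaders(req.headers),
      body_b64: bodyBuf ? bodyBuf.toString('base64') : undefined
    },
    limits: { timeout_ms: TIMEOUT_MS }
  };

  let runnerResp: Response;
  try {
    runnerResp = await fetch(`${process.env.RUNNER_BASE_URL}/v1/execute`, {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-request-id': requestId,
        ...(idempotencyKey ? { 'x-idempotency-key': idempotencyKey } : {}),
        // Service credential for the Runner; the end user's authorization header is never forwarded.
        authorization: await getRunnerServiceToken()
      },
      body: JSON.stringify(execReq),
      signal: AbortSignal.timeout(TIMEOUT_MS)
    });
  } catch (err) {
    // AbortSignal.timeout() rejects the fetch with a DOMException named "TimeoutError".
    const timedOut = (err as { name?: string } | null)?.name === 'TimeoutError';
    return NextResponse.json(
      { error: timedOut ? 'Runner timeout' : 'Runner unreachable' },
      { status: timedOut ? 504 : 502 }
    );
  }

  if (!runnerResp.ok) {
    return NextResponse.json({ error: 'Runner error' }, { status: 502 });
  }
  const { status, headers, body_b64 } = await runnerResp.json();
  const resHeaders = filterResponseHeaders(headers);
  const body = body_b64 ? Buffer.from(body_b64, 'base64') : undefined;
  return new NextResponse(body, { status, headers: resHeaders });
}

export { handler as GET, handler as POST, handler as PUT, handler as PATCH, handler as DELETE };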

Runtime Decision: Wasmtime (WASM-only)

Choice: Use Wasmtime as the sole runtime for executing extensions as WebAssembly modules.
Rationale (enterprise maturity):
- Backed by the Bytecode Alliance with a strong track record, multiple independent security audits, and responsive CVE handling.
- Production adoption across vendors; frequent releases; stable WASI Preview 1 support and growing Preview 2/component-model support.
- Rich security controls: memory limits, epoch-based interruption/timeouts, fuel metering, pooling allocator for predictable resource usage.
- Precompilation/caching: supports ahead-of-time compilation and serialized modules to reduce cold starts.
- Well-documented embedding API (Rust first-class, C API for other languages). We will implement the Runner as a Rust service embedding Wasmtime.

Implementation notes:

Language targets: prioritize AssemblyScript and Rust for authoring extensions that compile to WASI-compatible WASM; consider TinyGo where appropriate. Provide a TypeScript SDK for descriptor-driven UIs and for authoring AssemblyScript-based handlers.
Host API binding: expose capability-scoped functions as WASI-like imports via Wasmtime’s Linker (e.g., alga.storage.get/set, alga.http.fetch, alga.secrets.get, alga.log.info). No filesystem preopens; no ambient authority.
Resource controls: enforce per-invocation memory limits, timeouts via epoch interruption, and optional fuel metering for CPU budgeting. Configure pooling allocator to cap concurrent memory usage.
Provenance: require signed bundles; verify content hash and signature before loading modules. Cache precompiled modules by hash.
Isolation: one module instance per invocation (or per short-lived execution window). No shared mutable state beyond brokered APIs.
Multi-pod safety: Raw and precompiled artifacts stored in object storage keyed by content hash + target. Runners use only ephemeral local caches; no node-local persistent volumes required.

Execution Lifecycle

Authoring: Devs build against SDK + Host API types; alga-ext CLI validates locally.
Package: CLI produces a bundle (manifest, lockfile, compiled WASM) and signs it; optional AOT precompile for target architectures.
Publish: Push to registry; bundle stored in object storage by content hash.
Install: Tenant admin approves capabilities; per-tenant install record created with RLS.
Run: Event triggers runner → verify signature → load the raw or precompiled module → instantiate with restricted Store/Linker → execute handler with brokered I/O only.
Observe: Logs, metrics, and traces recorded with per-tenant attribution; failures are quarantined.

Security Controls

Code provenance: signature verification, content-addressed storage, SBOM capture.
Sandboxing: Wasmtime isolates; no in-process eval/import of tenant JS; no preopened FS; no raw sockets; capability-scoped host imports only.
Resource limits: Wasmtime memory limits, epoch-based timeouts, optional fuel metering, and concurrency guards via worker pools.
Egress policy: deny by default; allowlist per tenant/extension with optional TLS pinning.
Secrets: mounted via broker with fine-grained tokens; never exposed wholesale.
Audit: structured logs, event->execution correlation IDs, immutable execution logs with retention.
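Deny-by-default egress can be sketched as a per-install host allowlist that the broker checks before performing http.fetch. Assumptions: entries are exact hostnames or "*.suffix" wildcards, only HTTPS is permitted, and the EgressPolicy shape is hypothetical; TLS pinning is not modeled.

```typescript
import { URL } from 'node:url';

interface EgressPolicy {
  allowHosts: string[]; // e.g. ["api.softwareone.com", "*.example.com"]
}

function isEgressAllowed(policy: EgressPolicy, rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable URLs are denied
  }
  if (url.protocol !== 'https:') return false; // plain HTTP and other schemes are denied
  const host = url.hostname.toLowerCase();
  return policy.allowHosts.some((entry) => {
    const rule = entry.toLowerCase();
    if (rule.startsWith('*.')) {
      const suffix = rule.slice(1); // ".example.com": subdomains only, not the apex
      return host.endsWith(suffix) && host.length > suffix.length;
    }
    return host === rule;
  });
}
```

A hostname check alone does not stop an allowed name from resolving to an internal address, so the broker would likely also need to block private address ranges after DNS resolution.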

Data Model (initial)

extension_registry(id, name, publisher, latest_version, deprecation, created_at)
extension_version(id, registry_id, semver, content_hash, signature, sbom_ref, created_at)
extension_bundle(id, content_hash, storage_url, size, runtime, sdk_version)
tenant_extension_install(id, tenant_id, registry_id, version_id, status, granted_caps, config, created_at)
extension_secret(id, tenant_install_id, key, created_at) (values in secret manager; reference only)
extension_event_subscription(id, tenant_install_id, event, filter, created_at)
extension_kv_store(tenant_id, extension_id, namespace, key, value, updated_at) with RLS
extension_execution_log(id, tenant_id, extension_id, event_id, started_at, finished_at, status, metrics, error)
extension_quota_usage(tenant_id, extension_id, window_start, cpu_ms, mem_mb_ms, invocations, egress_bytes)

Public APIs (EE)

Registry: list/get/publish/deprecate versions (publisher-scoped, admin-only operations).
Installation: install/uninstall/update; grant/revoke capabilities; manage secrets; validate config.
Execution Admin: test-run, health, metrics, and logs (scoped to tenant).
Event Subscriptions: list/update per tenant install.

Current Implementation

Initialization: No filesystem scanning. Extensions are managed via the v2 registry and per‑tenant installs.
Registry: Stores v2 manifest JSON and versioned bundle metadata. Tenant installs select a version and granted capabilities.
UI delivery: Iframe‑only via the Runner at ${RUNNER_PUBLIC_BASE}/ext-ui/{extensionId}/{content_hash}/[...], bootstrapped with the iframe bridge.
Gateway: All server calls go through /api/ext/[extensionId]/[...] (Gateway → Runner /v1/execute).
Storage/security: Tenant‑scoped storage services with capability‑scoped Host APIs. Bundles are signed and content‑addressed.

Bundle & Manifest v2 (draft)

Manifest keys: name, publisher, version, runtime (e.g., wasm-js@1), capabilities (explicit list), ui (iframe app definition), events (subscriptions), entry (runner entrypoint), assets (UI/static files), sbom.
Artifact: tarball with deterministic layout; top-level manifest.json, entry.wasm or isolated JS, descriptors/, and SIGNATURE.
Signing: compute SHA256 over canonical bundle; sign with developer certificate; store signature and public cert in registry.
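A sketch of "SHA256 over canonical bundle" plus signing, under two assumptions the design does not specify: canonicalization means hashing each file's path and contents in sorted path order (excluding SIGNATURE itself), and an Ed25519 key pair stands in for the developer certificate (certificate-chain validation is out of scope). Both choices are illustrative.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from 'node:crypto';

// Deterministic digest: files sorted by path, each framed as path NUL length NUL bytes
// so boundaries between files are unambiguous.
function canonicalBundleDigest(files: Map<string, Uint8Array>): string {
  const h = createHash('sha256');
  for (const path of Array.from(files.keys()).sort()) {
    if (path === 'SIGNATURE') continue; // the signature cannot cover itself
    const bytes = files.get(path)!;
    h.update(path).update('\0').update(String(bytes.byteLength)).update('\0').update(bytes);
  }
  return `sha256:${h.digest('hex')}`;
}

// Node requires the algorithm argument to be null for Ed25519 keys.
function signDigest(digest: string, privateKey: KeyObject): string {
  return sign(null, Buffer.from(digest), privateKey).toString('base64');
}

function verifyDigest(digest: string, signatureB64: string, publicKey: KeyObject): boolean {
  return verify(null, Buffer.from(digest), publicKey, Buffer.from(signatureB64, 'base64'));
}

// Usage: sign at publish time; recompute the digest and verify before loading.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');
const bundle = new Map<string, Uint8Array>([
  ['manifest.json', Buffer.from('{"name":"com.alga.softwareone"}')],
  ['dist/main.wasm', Buffer.from([0x00, 0x61, 0x73, 0x6d])],
]);
const digest = canonicalBundleDigest(bundle);
const signature = signDigest(digest, privateKey);
const trusted = verifyDigest(canonicalBundleDigest(bundle), signature, publicKey);
```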

Example (abridged):

{
  "name": "com.alga.softwareone",
  "publisher": "SoftwareOne",
  "version": "1.2.3",
  "runtime": "wasm-js@1",
  "capabilities": ["http.fetch", "storage.kv", "secrets.get"],
  "ui": {
    "type": "iframe",
    "entry": "ui/index.html",
    "routes": [
      { "path": "/agreements", "iframePath": "ui/agreements.html" },
      { "path": "/statements", "iframePath": "ui/statements.html" }
    ]
  },
  "events": [{ "topic": "billing.statement.created", "handler": "dist/handlers/statement.js" }],
  "entry": "dist/main.wasm",
  "precompiled": {
    "x86_64-linux-gnu": "artifacts/cwasm/x86_64-linux-gnu/main.cwasm",
    "aarch64-linux-gnu": "artifacts/cwasm/aarch64-linux-gnu/main.cwasm"
  },
  "api": {
    "endpoints": [
      { "method": "GET", "path": "/agreements", "handler": "dist/handlers/http/list_agreements" },
      { "method": "POST", "path": "/agreements/sync", "handler": "dist/handlers/http/sync" }
    ]
  },
  "assets": ["ui/**/*"],
  "sbom": "sbom.spdx.json"
}

Host API v1 (draft surface)

Core: context.extension(), context.tenant(), context.user()
Storage: storage.get/set/delete/list, namespaces; per-tenant/per-extension isolation
HTTP: http.fetch(url, opts) via egress broker with allowlists
Secrets: secrets.get(key) returning scoped secret handles
Events: events.emit(topic, payload), events.subscribe(topic) via manifest
Schedules: schedules.register(id, cron, handler) (phase 2/3)
Logging/Metrics: log.info/warn/error, metrics.counter/gauge/histogram

Milestones & Acceptance

M1: Registry + Bundle Store + Signing
- Publish/Install flows working; schema migrations in place; signatures verified on install
M2: Runner Service + Host API v1
- Execute a hello-world WASM extension via Wasmtime with quotas/timeouts and audit logs
M3: Client SDK (iframe)
- Render UI via iframe apps using the Alga Client SDK; CSP enforced; no raw dynamic import of tenant JS
M4: E2E for first partner
- One extension fully migrated; per-tenant install/config on prod-like env

Phase 1 – Foundations

Ship SDK v1, Host API v1 (capabilities: events, storage.kv, http.fetch via broker, secrets.get, log/metrics).
Implement Registry, Bundle Storage, and Build validation path; enable signed bundle install.

Phase 2 – Runner Service

Add WASM/isolate runner with quotas, timeouts, and signature verification.
Integrate Event Bus; implement execution logs and basic metrics.

Phase 3 – UI Extensions

Iframe-based UI host with CSP sandbox and postMessage bridge; asset signing pipeline.

Phase 4 – Migration & Deprecation

Provide migration guides; wrap legacy extensions via out-of-process adapters where feasible.
Hard deprecate in-process uploads/imports; remove code paths.

Backwards Compatibility

Legacy extensions can be proxied through the runner as external HTTP endpoints temporarily.
Provide an adapter library to help repackage common patterns into bundles.

Operational Considerations

Horizontal scale runner workers; shard by tenant to localize impact.
Warm cache frequently used bundles; prefetch on event bursts.
Circuit breakers and quarantine for crash loops or policy violations.
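A minimal circuit breaker sketch, assuming a consecutive-failure threshold and a fixed cool-down after which a single trial call is allowed (half-open). The CircuitBreaker class and its thresholds are hypothetical; quarantining a misbehaving extension could reuse the same state machine keyed by extension.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold: number,
    private readonly coolDownMs: number,
    private readonly now: () => number = Date.now, // injectable clock for tests
  ) {}

  // Whether a call may proceed right now.
  allow(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.coolDownMs) {
      this.state = 'half-open'; // let exactly one trial call through
      return true;
    }
    return this.state === 'closed';
  }

  recordSuccess(): void {
    this.state = 'closed';
    this.failures = 0;
  }

  recordFailure(): void {
    this.failures++;
    // A failed trial call, or too many consecutive failures, (re)opens the circuit.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  get current(): BreakerState {
    return this.state;
  }
}
```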

Success Metrics

Zero in-process executions of tenant code in the app.
P99 execution latency under target with sandboxing enabled.
No cross-tenant data access in penetration tests.
All bundles signed and verified; 100% execution logs correlated to events.

Open Questions

Which sandbox runtime to standardize on first: WASM (Wasmtime/WASI) vs V8 isolates? Preference: WASM for stronger capability discipline; allow a container tier for heavy/legacy cases.
Initial capability set scope: finalize MVP host APIs.
Pricing/billing alignment with quotas and egress costs.

Near-term Implementation Tasks (Progress Tracker)

The following concrete tasks align the current codebase with this plan and track progress.

Replace browser→S3 direct upload with server-proxied streaming
- Add server action extUploadProxy(FormData) to stream file to S3 staging (write-once)
- Convert Web ReadableStream → Node Readable before S3 PutObject
- Pass ContentLength to S3 to satisfy chunked signing
- Update InstallerPanel.tsx to use server action, then call extFinalizeUpload
- Remove presigned initiate flow and delete initiate-upload API route
Logging and diagnostics
- Structured logs + request IDs for upload path
- Admin-only DB registry introspection endpoint (/api/extensions/registry-db-check)
- Add request IDs and structured logs to finalize and abort paths
Registry v2 repository wiring
- Implement Knex-backed RegistryV2Repository (extensions + versions)
- Register via setRegistryV2Repository(...) at server startup (lazy init before finalize)
- Verify finalize writes registry/version/bundle rows end-to-end
Extensions UI uses Registry v2
- List tenant installs via v2 actions (joins on tenant_extension_install)
- Toggle/uninstall operate on tenant_extension_install
- After finalize, auto-create tenant install for current tenant
Align UI with “Install from Registry” flow [FUTURE -- DELAY]
- Restrict or hide direct upload UI for general users (admin/publisher only if retained)
- Replace “upload bundle” with “select version” from registry listing
- Update docs to emphasize CI publish + install-from-registry
Cleanup and tests
- Remove unused upload API route and legacy code paths once fully migrated
- Add targeted tests for upload server action and finalize happy-path
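For the upload-proxy tasks (convert Web ReadableStream → Node Readable, pass ContentLength), Node's Readable.fromWeb performs the conversion, and the length can come from the uploaded File's size instead of buffering the stream. The sketch below shows only the conversion and how the resulting stream is consumed; the S3 PutObject call is left out, and toNodeReadable and countBytes are hypothetical helper names.

```typescript
import { Readable } from 'node:stream';
import { ReadableStream } from 'node:stream/web';

// Adapt a WHATWG stream (e.g. from File.stream()) for SDKs that expect a Node Readable.
function toNodeReadable(web: ReadableStream<Uint8Array>): Readable {
  return Readable.fromWeb(web);
}

// Drain a Node Readable and count its bytes; shows the converted stream yields the original data.
async function countBytes(stream: Readable): Promise<number> {
  let total = 0;
  for await (const chunk of stream) total += (chunk as Uint8Array).byteLength;
  return total;
}
```

In the server action, the converted stream would be passed as the PutObject body together with ContentLength set to the File's size, so the SDK does not need to buffer the upload to learn its length.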

Retirement of Legacy Paths (Brand New System)

Legacy tables and services to avoid for EE extensions:
- extensions, extension_permissions, file-based component serving, and dynamic module import mechanisms.
- ExtensionRegistry (legacy) and actions that operate on the extensions table in management UI.
Canonical tables for EE extensions (Registry v2):
- extension_registry, extension_version, extension_bundle, tenant_extension_install.
UI and actions must exclusively use Registry v2:
- Listing, enable/disable, and uninstall operate on tenant_extension_install.
- Version metadata read from extension_version; registry identity from extension_registry.
- Bundle metadata resolved from object storage keyed by content hash.
Operational note: This system is brand new; no data migration is required. Do not write or read from legacy tables as part of EE extensions.
