Extension Runner Pluggable Deployment Plan (Docker & Knative Backends)
Overview
- Allow the extension gateway to target multiple runner backends (Knative in production, Docker in local/dev) without code changes in extension bundles.
- Keep the developer ergonomics of a single exposed port (`localhost:3000`) by proxying Runner endpoints/UI through Next.js when running locally.
- Preserve the existing Knative deployment model while introducing a first-class Docker Compose workflow for iterative testing.
Goals
- Introduce a `RunnerBackend` abstraction that encapsulates execute/UI/health operations for the gateway. (Implemented in `server/src/lib/extensions/runner/backend.ts` with Knative vs Docker backends.)
- Provide configuration to select `knative` or `docker` backends via environment variables with sane defaults. (Env `RUNNER_BACKEND` with defaults; see `package.json` script `dev:runner`.)
- Add a gateway proxy route so extension UI assets can be served through the same origin as the main application. (ext-ui gate exists but still returns 404/redirect in rust mode; proxy parity for dev remains to be finished.)
- Package a Docker Compose setup and helper scripts that run the Runner container locally alongside the Next.js gateway. (See `scripts/dev-runner.sh` and `docker-compose.runner-dev.yml`; `npm run dev:runner` wires env.)
- Document the new workflow and add smoke tests covering both backends. (Docs and automated smoke tests still TODO.)
Status update (2025-11-21): core backend selection and local Docker workflow are in place; ext-ui same-origin proxying and validation tests remain outstanding.
Non-Goals
- Replacing the existing Knative deployment or Temporal domain provisioning flows in production.
- Modifying Runner execution safety limits (memory, CPU, timeout) or capability provider contracts.
- Introducing a new public load balancer component solely for local development.
- Refactoring bundle storage/S3 access patterns.
Current State (Nov 2025)
- Gateway fetches `POST ${RUNNER_BASE_URL}/v1/execute` directly; static UI references `${RUNNER_PUBLIC_BASE}/ext-ui/...`.
- `RUNNER_BASE_URL` is typically a Knative service URI inside the cluster; local testing requires hand-running the Rust binary and updating env vars manually.
- No formal abstraction exists for the runner; only one backend (Knative) is assumed throughout the TypeScript code.
- UI assets are not proxied; developers must align iframe origins manually when overriding `RUNNER_PUBLIC_BASE`.
- Docker assets exist for Runner, but there is no supported compose scenario that ties Runner + gateway together on a single port.
Requirements & Constraints
- Single origin: Locally, developers hit `http://localhost:3000` for both app and extension UI; no additional LB container should be required.
- Pluggable interface: Gateway must select backends through DI/config without branching logic sprinkled across routes.
- Configuration parity: Environment variable surface must clearly separate shared settings (timeouts, headers) from backend-specific values.
- Security parity: Docker backend should respect the same auth headers, service tokens, and logging redaction rules as Knative.
- Observability: Health checks and structured logging should include backend identity for troubleshooting.
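The security-parity and observability requirements could meet in one small logging helper. The sketch below is illustrative only: `buildRunnerLogEntry`, `redactHeaders`, the `SENSITIVE_HEADERS` list, and the `x-service-token` header name are hypothetical and not taken from the codebase.

```typescript
type BackendKind = "knative" | "docker";

// Hypothetical redaction list; the real rules should be shared by both backends.
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "x-service-token"]);

interface RunnerLogEntry {
  event: string;
  backend: BackendKind; // backend identity, included for troubleshooting
  headers: Record<string, string>;
}

// Lower-cases header names and masks the values of sensitive headers.
function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    const key = name.toLowerCase();
    out[key] = SENSITIVE_HEADERS.has(key) ? "[REDACTED]" : value;
  }
  return out;
}

// Builds a structured entry that any logger can serialize as JSON.
function buildRunnerLogEntry(
  event: string,
  backend: BackendKind,
  headers: Record<string, string>,
): RunnerLogEntry {
  return { event, backend, headers: redactHeaders(headers) };
}
```

Because redaction runs before the backend label is attached, Docker and Knative log lines would differ only in the `backend` field.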
Proposed Architecture
1. Runner Backend Abstraction
- Create `RunnerBackend` interface (TypeScript) with methods such as `execute(req)`, `resolveUiUrl(extId, hash, path)`, and `health() / metadata()`.
- Implement `KnativeRunnerBackend` (current behaviour) and `DockerRunnerBackend` (connects to Docker container host/port).
- Provide a factory that selects backend based on `RUNNER_BACKEND` env var (`knative` default).
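A minimal sketch of the interface and factory, not the implementation in `backend.ts`. The request/result shapes, the injected `PostJson` transport, the `makeHttpBackend` helper, and the `ext-ui/<extId>/<hash>/<path>` layout are all assumptions made for illustration; only the method names and environment variables come from this plan.

```typescript
type BackendKind = "knative" | "docker";
type Env = Record<string, string | undefined>;

// Hypothetical request/result shapes; the gateway's real types may differ.
interface ExecuteRequest { extensionId: string; payload: unknown }
interface ExecuteResult { status: number; body: unknown }

// Injected transport so the sketch does not depend on a global fetch.
type PostJson = (url: string, body: unknown) => Promise<ExecuteResult>;

interface RunnerBackend {
  readonly kind: BackendKind;
  execute(req: ExecuteRequest): Promise<ExecuteResult>;
  resolveUiUrl(extId: string, hash: string, path: string): string;
  health(): Promise<{ ok: boolean; backend: BackendKind }>;
}

function makeHttpBackend(
  kind: BackendKind,
  internalBase: string, // where the gateway sends execute calls
  publicBase: string,   // prefix used when building iframe URLs
  postJson: PostJson,
): RunnerBackend {
  const trim = (s: string) => s.replace(/\/+$/, "");
  return {
    kind,
    execute: (req) => postJson(`${trim(internalBase)}/v1/execute`, req),
    // Assumes UI assets are laid out as ext-ui/<extId>/<hash>/<path>.
    resolveUiUrl: (extId, hash, path) =>
      `${trim(publicBase)}/ext-ui/${extId}/${hash}/${path.replace(/^\/+/, "")}`,
    // Placeholder: a real backend would probe the Runner instead.
    health: async () => ({ ok: true, backend: kind }),
  };
}

function createRunnerBackend(env: Env, postJson: PostJson): RunnerBackend {
  const kind = env["RUNNER_BACKEND"] ?? "knative"; // knative stays the default
  switch (kind) {
    case "knative":
      return makeHttpBackend("knative", env["RUNNER_BASE_URL"] ?? "",
        env["RUNNER_PUBLIC_BASE"] ?? "", postJson);
    case "docker":
      return makeHttpBackend("docker",
        env["RUNNER_DOCKER_HOST"] ?? "http://extension-runner:8080",
        env["RUNNER_PUBLIC_BASE"] ?? "http://localhost:3000/runner", postJson);
    default:
      throw new Error(`Unsupported RUNNER_BACKEND: ${kind}`);
  }
}
```

Routes would call `createRunnerBackend(...)` once and depend only on `RunnerBackend`, which keeps backend branching out of route code.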
2. Gateway Proxy Layer
- Replace direct `fetch(${RUNNER_BASE_URL}/v1/execute)` with backend calls that return typed results and centralize error handling.
- Add Next.js route (e.g., `/runner/[...path]`) that proxies static UI assets via the backend, so iframe URLs use the primary origin.
- Update `buildExtUiSrc()` to rely on backend helper for consistent URL construction.
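The core of the proxy route is mapping the catch-all path segments onto the Runner's internal base URL. The framework-free sketch below shows one way to do that; `buildUpstreamUrl` is a hypothetical helper, and the segment checks are an assumed hardening step rather than something this plan specifies.

```typescript
// Maps the segments captured by a catch-all route such as /runner/[...path]
// onto the Runner's internal base URL.
function buildUpstreamUrl(
  internalBase: string,
  segments: string[],
  search: string = "",
): string {
  if (segments.length === 0) {
    throw new Error("empty proxy path");
  }
  for (const segment of segments) {
    // Reject traversal and embedded separators before touching the Runner.
    if (segment === "" || segment === "." || segment === ".." ||
        segment.includes("/") || segment.includes("\\")) {
      throw new Error(`invalid path segment: ${JSON.stringify(segment)}`);
    }
  }
  const base = internalBase.replace(/\/+$/, "");
  const path = segments.map((s) => encodeURIComponent(s)).join("/");
  return `${base}/${path}${search}`;
}
```

A route handler would pass its captured segments and query string to this helper, fetch the result, and stream the response back with production-equivalent headers.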
3. Docker Backend Runtime Package
- Author `docker-compose.runner-dev.yml` defining `extension-runner` service (build from existing Dockerfile, expose 8080 internally).
- Create helper commands (`npm run dev:runner`, `./scripts/dev-runner.sh`) to spin up Runner + Next dev with proper env defaults (`RUNNER_BACKEND=docker`, `RUNNER_DOCKER_HOST=http://extension-runner:8080`, `RUNNER_PUBLIC_BASE=http://localhost:3000/runner`).
- Ensure Docker backend rewrites public UI URLs to `/runner/...` while targeting the container internally.
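The env defaults listed above can be expressed as a small merge step. In this sketch the three values come from this plan, while `withDockerDevDefaults` and the rule that explicitly set values win are assumptions about how the helper scripts might behave.

```typescript
type Env = Record<string, string | undefined>;

// Defaults named in this plan for the local Docker workflow.
const DOCKER_DEV_DEFAULTS: Readonly<Record<string, string>> = {
  RUNNER_BACKEND: "docker",
  RUNNER_DOCKER_HOST: "http://extension-runner:8080",
  RUNNER_PUBLIC_BASE: "http://localhost:3000/runner",
};

// Returns a copy of env with defaults filled in; values already set are kept.
function withDockerDevDefaults(env: Env): Env {
  const merged: Env = { ...env };
  for (const [key, value] of Object.entries(DOCKER_DEV_DEFAULTS)) {
    if (merged[key] === undefined || merged[key] === "") {
      merged[key] = value;
    }
  }
  return merged;
}
```

Keeping explicit values intact lets a developer point `RUNNER_PUBLIC_BASE` at a different port without editing the script.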
4. Tooling & Testing
- Extend SDK/CLI dev commands to detect Docker backend and optionally build/push bundles into mounted volumes.
- Add smoke tests that run with `RUNNER_BACKEND=docker` (mock Runner responses) to validate routing.
- Update E2E suite to cover both backends where feasible or stub Docker backend via test doubles.
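A routing smoke test with mocked Runner responses might look like the sketch below. It uses no test framework; `createMockRunner`, `executeViaBackend` (a simplified stand-in for the gateway code under test), and `smokeTestDockerRouting` are hypothetical names.

```typescript
type Env = Record<string, string | undefined>;
interface CannedResponse { status: number; body: unknown }
interface RecordedCall { method: string; url: string }
type RunnerRequest = (method: string, url: string) => Promise<CannedResponse>;

// Test double for the Runner: records calls and serves canned responses.
function createMockRunner(routes: Record<string, CannedResponse>) {
  const calls: RecordedCall[] = [];
  const request: RunnerRequest = async (method, url) => {
    calls.push({ method, url });
    return routes[`${method} ${url}`] ?? { status: 404, body: { error: "not mocked" } };
  };
  return { calls, request };
}

// Simplified stand-in for the gateway's execute path (assumption).
function executeViaBackend(env: Env, request: RunnerRequest): Promise<CannedResponse> {
  const base = env["RUNNER_BACKEND"] === "docker"
    ? env["RUNNER_DOCKER_HOST"]
    : env["RUNNER_BASE_URL"];
  return request("POST", `${base ?? ""}/v1/execute`);
}

// Smoke test: in docker mode, execute must reach the container host.
async function smokeTestDockerRouting(): Promise<void> {
  const host = "http://extension-runner:8080";
  const mock = createMockRunner({
    [`POST ${host}/v1/execute`]: { status: 200, body: { ok: true } },
  });
  const res = await executeViaBackend(
    { RUNNER_BACKEND: "docker", RUNNER_DOCKER_HOST: host },
    mock.request,
  );
  if (res.status !== 200 || mock.calls.length !== 1) {
    throw new Error("docker backend did not route to the container host");
  }
}
```

The same mock could be reused for the Knative path by swapping the env and the expected URL.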
5. Documentation & Developer Workflow
- Document env matrix, start/stop commands, and troubleshooting tips in `docs/extension-system/development_guide.md`.
- Provide guidance for switching between backends without restarting (e.g., env var change + server reload).
- Highlight parity expectations (timeouts, auth tokens) and backend-specific caveats (e.g., no auto domain mapping in Docker mode).
Implementation Phases
Phase 0 — Design & Config Audit
- Finalize backend interface shape and config naming.
- Inventory env variables (`RUNNER_BASE_URL`, `RUNNER_PUBLIC_BASE`, timeouts) and plan migration/aliases.
- Decide on logging/telemetry structure for backend selection.
Phase 1 — Abstraction & Knative Parity
- Implement `RunnerBackend` interface + factory with Knative backend using existing logic.
- Refactor gateway execute/UI code paths to use the abstraction without changing behaviour.
- Add feature flag / env validation ensuring fallback remains backwards compatible.
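Env validation with a backwards-compatible fallback could be sketched as follows. `validateRunnerEnv` and `resolveBackend` are hypothetical names, and which variables count as required per backend is an assumption: the plan only states that `knative` remains the default.

```typescript
type Env = Record<string, string | undefined>;
type BackendKind = "knative" | "docker";

interface RunnerEnvCheck {
  backend: BackendKind | null; // null when RUNNER_BACKEND is unrecognized
  errors: string[];
}

// Unset or empty falls back to knative so existing deployments keep working.
function resolveBackend(raw: string | undefined): BackendKind | null {
  if (raw === undefined || raw === "" || raw === "knative") return "knative";
  if (raw === "docker") return "docker";
  return null;
}

function validateRunnerEnv(env: Env): RunnerEnvCheck {
  const backend = resolveBackend(env["RUNNER_BACKEND"]);
  if (backend === null) {
    return {
      backend,
      errors: [`RUNNER_BACKEND must be "knative" or "docker", got "${env["RUNNER_BACKEND"]}"`],
    };
  }
  // Assumed requirement: each backend needs its own internal base URL.
  const required = backend === "knative" ? ["RUNNER_BASE_URL"] : ["RUNNER_DOCKER_HOST"];
  const errors: string[] = [];
  for (const name of required) {
    if (!env[name]) {
      errors.push(`${name} is required when RUNNER_BACKEND=${backend}`);
    }
  }
  return { backend, errors };
}
```

Returning errors instead of throwing lets startup code decide whether to fail fast or log a warning and continue.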
Phase 2 — Docker Backend & Proxy Routing
- Implement Docker backend (internal base URL, optional health check endpoint).
- Add `/runner/[...path]` proxy route and update UI helpers to leverage backend URLs.
- Ensure headers (auth, caching) and error propagation match production behaviour.
Phase 3 — Local Dev Tooling
- Ship Docker Compose file + scripts to run Runner + gateway with shared `.env`.
- Update CLI/SDK docs to reference new workflow; add convenience commands for bundling & install loops.
- Add smoke tests (unit/integration) covering Docker backend selection.
Phase 4 — Rollout & Docs
- Update developer docs, onboarding guides, and `.env.example`.
- Gather feedback from internal extension teams; iterate on ergonomics (auto restart, log streaming).
- Monitor for issues when switching between backends; add troubleshooting section.
Dependencies & Coordination
- DevOps: Compose file review, Runner image tags, local secrets management.
- Runner team: Validate Docker runtime behaviour (env parity, secrets mount paths).
- Gateway team: Assist with proxy route, auth enforcement, and caching headers.
- DX/Docs: Document workflow & update SDK tutorials.
Open Questions
- Should we support hot swapping backends without restarting Next.js? (Env reload vs. app restart.)
- How do we handle TLS/HTTPS locally if required for some browser APIs? (Proxy + mkcert?)
- Do we need watch mode for Runner container rebuilds, or are manual rebuilds sufficient?
- Should the Docker backend support optional port forwarding for direct UI asset access (bypassing proxy)?
Next Steps
- Draft `RunnerBackend` interface and share with gateway + runner stakeholders for feedback.
- Prototype proxy route + Docker backend to validate single-origin behaviour.
- Prepare Compose stack and developer script for local testing.
- Schedule verification sessions (DX + extension teams) before rolling out docs.