# Extension Runner Pluggable Deployment Plan (Docker & Knative Backends)

## Overview

- Allow the extension gateway to target multiple runner backends (Knative in production, Docker in local/dev) without code changes in extension bundles.
- Keep the developer ergonomics of a single exposed port (`localhost:3000`) by proxying Runner endpoints/UI through Next.js when running locally.
- Preserve the existing Knative deployment model while introducing a first-class Docker Compose workflow for iterative testing.

## Goals

- [x] Introduce a `RunnerBackend` abstraction that encapsulates execute/UI/health operations for the gateway. *(Implemented in `server/src/lib/extensions/runner/backend.ts` with Knative vs Docker backends.)*
- [x] Provide configuration to select `knative` or `docker` backends via environment variables with sane defaults. *(Env `RUNNER_BACKEND` with defaults; see `package.json` script `dev:runner`.)*
- [ ] Add a gateway proxy route so extension UI assets can be served through the same origin as the main application. *(ext-ui gate exists but still returns 404/redirect in rust mode; proxy parity for dev remains to be finished.)*
- [x] Package a Docker Compose setup and helper scripts that run the Runner container locally alongside the Next.js gateway. *(See `scripts/dev-runner.sh` and `docker-compose.runner-dev.yml`; `npm run dev:runner` wires env.)*
- [ ] Document the new workflow and add smoke tests covering both backends. *(Docs and automated smoke tests still TODO.)*

Status update (2025-11-21): core backend selection and local Docker workflow are in place; ext-ui same-origin proxying and validation tests remain outstanding.

## Non-Goals

- Replacing the existing Knative deployment or Temporal domain provisioning flows in production.
- Modifying Runner execution safety limits (memory, CPU, timeout) or capability provider contracts.
- Introducing a new public load balancer component solely for local development.
- Refactoring bundle storage/S3 access patterns.

## Current State (Nov 2025)

- Gateway fetches `POST ${RUNNER_BASE_URL}/v1/execute` directly; static UI references `${RUNNER_PUBLIC_BASE}/ext-ui/...`.
- `RUNNER_BASE_URL` is typically a Knative service URI inside the cluster; local testing requires hand-running the Rust binary and updating env vars manually.
- No formal abstraction exists for the runner; only one backend (Knative) is assumed throughout the TypeScript code.
- UI assets are not proxied, so developers must align iframe origins manually when overriding `RUNNER_PUBLIC_BASE`.
- Docker assets exist for Runner, but there is no supported compose scenario that ties Runner + gateway together on a single port.

## Requirements & Constraints

- **Single origin**: Locally, developers hit `http://localhost:3000` for both app and extension UI; no additional LB container should be required.
- **Pluggable interface**: Gateway must select backends through DI/config without branching logic sprinkled across routes.
- **Configuration parity**: Environment variable surface must clearly separate shared settings (timeouts, headers) from backend-specific values.
- **Security parity**: Docker backend should respect the same auth headers, service tokens, and logging redaction rules as Knative.
- **Observability**: Health checks and structured logging should include backend identity for troubleshooting.

## Proposed Architecture

### 1. Runner Backend Abstraction

- Create `RunnerBackend` interface (TypeScript) with methods such as `execute(req)`, `resolveUiUrl(extId, hash, path)`, and `health()` / `metadata()`.
- Implement `KnativeRunnerBackend` (current behaviour) and `DockerRunnerBackend` (connects to Docker container host/port).
- Provide a factory that selects backend based on `RUNNER_BACKEND` env var (`knative` default).

### 2. Gateway Proxy Layer

- Replace direct `fetch(${RUNNER_BASE_URL}/v1/execute)` with backend calls that return typed results and centralize error handling.
- Add Next.js route (e.g., `/runner/[...path]`) that proxies static UI assets via the backend, so iframe URLs use the primary origin.
- Update `buildExtUiSrc()` to rely on backend helper for consistent URL construction.

### 3. Docker Backend Runtime Package

- Author `docker-compose.runner-dev.yml` defining `extension-runner` service (build from existing Dockerfile, expose 8080 internally).
- Create helper commands (`npm run dev:runner`, `./scripts/dev-runner.sh`) to spin up Runner + Next dev with proper env defaults (`RUNNER_BACKEND=docker`, `RUNNER_DOCKER_HOST=http://extension-runner:8080`, `RUNNER_PUBLIC_BASE=http://localhost:3000/runner`).
- Ensure Docker backend rewrites public UI URLs to `/runner/...` while targeting the container internally.

### 4. Tooling & Testing

- Extend SDK/CLI dev commands to detect Docker backend and optionally build/push bundles into mounted volumes.
- Add smoke tests that run with `RUNNER_BACKEND=docker` (mock Runner responses) to validate routing.
- Update E2E suite to cover both backends where feasible or stub Docker backend via test doubles.

### 5. Documentation & Developer Workflow

- Document env matrix, start/stop commands, and troubleshooting tips in `docs/extension-system/development_guide.md`.
- Provide guidance for switching between backends without restarting (e.g., env var change + server reload).
- Highlight parity expectations (timeouts, auth tokens) and backend-specific caveats (e.g., no auto domain mapping in Docker mode).

## Implementation Phases

### Phase 0: Design & Config Audit

- [ ] Finalize backend interface shape and config naming.
- [ ] Inventory env variables (`RUNNER_BASE_URL`, `RUNNER_PUBLIC_BASE`, timeouts) and plan migration/aliases.
- [ ] Decide on logging/telemetry structure for backend selection.

### Phase 1: Abstraction & Knative Parity

- [x] Implement `RunnerBackend` interface + factory with Knative backend using existing logic.
- [x] Refactor gateway execute/UI code paths to use the abstraction without changing behaviour.
- [x] Add feature flag / env validation ensuring fallback remains backwards compatible.

### Phase 2: Docker Backend & Proxy Routing

- [x] Implement Docker backend (internal base URL, optional health check endpoint).
- [x] Add `/runner/[...path]` proxy route and update UI helpers to leverage backend URLs.
- [x] Ensure headers (auth, caching) and error propagation match production behaviour.

### Phase 3: Local Dev Tooling

- [x] Ship Docker Compose file + scripts to run Runner + gateway with shared `.env`.
- [x] Update CLI/SDK docs to reference new workflow; add convenience commands for bundling & install loops.
- [ ] Add smoke tests (unit/integration) covering Docker backend selection.

### Phase 4: Rollout & Docs

- [x] Update developer docs, onboarding guides, and `.env.example`.
- [ ] Gather feedback from internal extension teams; iterate on ergonomics (auto restart, log streaming).
- [ ] Monitor for issues when switching between backends; add troubleshooting section.

## Dependencies & Coordination

- DevOps: Compose file review, Runner image tags, local secrets management.
- Runner team: Validate Docker runtime behaviour (env parity, secrets mount paths).
- Gateway team: Assist with proxy route, auth enforcement, and caching headers.
- DX/Docs: Document workflow & update SDK tutorials.

## Open Questions

- Should we support hot swapping backends without restarting Next.js? (Env reload vs. app restart.)
- How do we handle TLS/HTTPS locally if required for some browser APIs? (Proxy + mkcert?)
- Do we need watch mode for Runner container rebuilds, or are manual rebuilds sufficient?
- Should the Docker backend support optional port forwarding for direct UI asset access (bypassing proxy)?

## Next Steps

1. Draft `RunnerBackend` interface and share with gateway + runner stakeholders for feedback.
2. Prototype proxy route + Docker backend to validate single-origin behaviour.
3. Prepare Compose stack and developer script for local testing.
4. Schedule verification sessions (DX + extension teams) before rolling out docs.
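
As a discussion aid for step 1, here is a minimal TypeScript sketch of what a `RunnerBackend` interface and an env-driven factory could look like. It is not the code in `server/src/lib/extensions/runner/backend.ts`. The names `ExecuteRequest`, `ExecuteResult`, `PostJson`, `makeHttpBackend`, and `createRunnerBackend` are hypothetical, the `/ext-ui/<extId>/<hash>/<path>` layout is an assumed URL shape, and the Docker fallback values simply mirror the defaults listed for `npm run dev:runner`.

```typescript
// Illustrative sketch only: every type and function name below is an
// assumption made for this example, not the gateway's actual API.
type BackendKind = "knative" | "docker";

interface ExecuteRequest {
  extensionId: string;
  contentHash: string;
  payload: unknown;
}

interface ExecuteResult {
  status: number;
  body: unknown;
}

// Injected transport so the sketch runs without a network; a real gateway
// would wrap fetch() and add auth headers, timeouts, and error mapping here.
type PostJson = (url: string, body: unknown) => Promise<ExecuteResult>;

interface RunnerBackend {
  readonly kind: BackendKind;
  execute(req: ExecuteRequest): Promise<ExecuteResult>;
  resolveUiUrl(extId: string, hash: string, path: string): string;
}

type Env = Record<string, string | undefined>;

const trimTrailingSlash = (s: string): string => s.replace(/\/+$/, "");

// Both backends speak HTTP; they differ only in which base URLs they use.
function makeHttpBackend(
  kind: BackendKind,
  internalBase: string, // where the gateway sends execute calls
  publicBase: string, // what the browser uses for UI assets
  post: PostJson,
): RunnerBackend {
  return {
    kind,
    execute: (req) => post(`${trimTrailingSlash(internalBase)}/v1/execute`, req),
    resolveUiUrl: (extId, hash, path) => {
      // Assumed layout: <publicBase>/ext-ui/<extId>/<hash>/<path>
      const clean = path.replace(/^\/+/, "");
      return `${trimTrailingSlash(publicBase)}/ext-ui/${extId}/${hash}/${clean}`;
    },
  };
}

// Selects a backend from RUNNER_BACKEND, defaulting to knative.
function createRunnerBackend(env: Env, post: PostJson): RunnerBackend {
  const kind = env.RUNNER_BACKEND ?? "knative";
  if (kind === "knative") {
    const base = env.RUNNER_BASE_URL;
    if (!base) {
      throw new Error("RUNNER_BASE_URL is required for the knative backend");
    }
    return makeHttpBackend("knative", base, env.RUNNER_PUBLIC_BASE ?? base, post);
  }
  if (kind === "docker") {
    // Fallbacks mirror the dev defaults named in the plan (assumed here).
    const host = env.RUNNER_DOCKER_HOST ?? "http://extension-runner:8080";
    const pub = env.RUNNER_PUBLIC_BASE ?? "http://localhost:3000/runner";
    return makeHttpBackend("docker", host, pub, post);
  }
  throw new Error(`Unsupported RUNNER_BACKEND: ${kind}`);
}

// Usage with a fake transport standing in for the Runner.
const fakePost: PostJson = async (url, body) => ({
  status: 200,
  body: { url, echoed: body },
});
const dockerBackend = createRunnerBackend({ RUNNER_BACKEND: "docker" }, fakePost);
console.log(dockerBackend.resolveUiUrl("ext-1", "abc123", "/index.html"));
// prints "http://localhost:3000/runner/ext-ui/ext-1/abc123/index.html"
```

Keeping the internal and public base URLs as separate inputs is one way to let the Docker backend target the container directly while still emitting same-origin `/runner/...` URLs for iframes; routes then depend only on `RunnerBackend` and never branch on the backend kind.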