PSA/ee/docs/plans/2025-11-03-extension-runner-pluggable-plan.md
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

# Extension Runner Pluggable Deployment Plan (Docker & Knative Backends)
## Overview
- Allow the extension gateway to target multiple runner backends (Knative in production, Docker in local/dev) without code changes in extension bundles.
- Keep the developer ergonomics of a single exposed port (`localhost:3000`) by proxying Runner endpoints/UI through Next.js when running locally.
- Preserve the existing Knative deployment model while introducing a first-class Docker Compose workflow for iterative testing.
## Goals
- [x] Introduce a `RunnerBackend` abstraction that encapsulates execute/UI/health operations for the gateway. *(Implemented in `server/src/lib/extensions/runner/backend.ts` with Knative vs Docker backends.)*
- [x] Provide configuration to select `knative` or `docker` backends via environment variables with sane defaults. *(Env `RUNNER_BACKEND` with defaults; see `package.json` script `dev:runner`.)*
- [ ] Add a gateway proxy route so extension UI assets can be served through the same origin as the main application. *(ext-ui gate exists but still returns 404/redirect in rust mode; proxy parity for dev remains to be finished.)*
- [x] Package a Docker Compose setup and helper scripts that run the Runner container locally alongside the Next.js gateway. *(See `scripts/dev-runner.sh` and `docker-compose.runner-dev.yml`; `npm run dev:runner` wires env.)*
- [ ] Document the new workflow and add smoke tests covering both backends. *(Docs and automated smoke tests still TODO.)*

Status update (2025-11-21): core backend selection and local Docker workflow are in place; ext-ui same-origin proxying and validation tests remain outstanding.
## Non-Goals
- Replacing the existing Knative deployment or Temporal domain provisioning flows in production.
- Modifying Runner execution safety limits (memory, CPU, timeout) or capability provider contracts.
- Introducing a new public load balancer component solely for local development.
- Refactoring bundle storage/S3 access patterns.
## Current State (Nov 2025)
- The gateway calls `POST ${RUNNER_BASE_URL}/v1/execute` directly; static UI references point at `${RUNNER_PUBLIC_BASE}/ext-ui/...`.
- `RUNNER_BASE_URL` is typically a Knative service URI inside the cluster; local testing requires running the Rust binary by hand and updating env vars manually.
- No formal abstraction exists for the runner; only one backend (Knative) is assumed throughout the TypeScript code.
- UI assets are not proxied; developers must align iframe origins manually when overriding `RUNNER_PUBLIC_BASE`.
- Docker assets exist for Runner, but there is no supported compose scenario that ties Runner + gateway together on a single port.
## Requirements & Constraints
- **Single origin**: Locally, developers hit `http://localhost:3000` for both app and extension UI; no additional LB container should be required.
- **Pluggable interface**: Gateway must select backends through DI/config without branching logic sprinkled across routes.
- **Configuration parity**: Environment variable surface must clearly separate shared settings (timeouts, headers) from backend-specific values.
- **Security parity**: Docker backend should respect the same auth headers, service tokens, and logging redaction rules as Knative.
- **Observability**: Health checks and structured logging should include backend identity for troubleshooting.
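
The security and observability requirements above can be illustrated with a small sketch: a structured log record that always carries the backend identity and redacts sensitive headers in the same way for both backends. The function names, the record shape, and the list of sensitive header names (`authorization`, `x-service-token`) are assumptions made for illustration, not the gateway's actual logging contract.

```typescript
type BackendName = "knative" | "docker";

// Assumed list of sensitive header names; the real redaction rules may differ.
const REDACTED_HEADERS = ["authorization", "x-service-token"];

interface RunnerLogRecord {
  event: string;
  runnerBackend: BackendName; // backend identity, included on every record
  headers: Record<string, string>;
}

// Returns a copy of the headers with sensitive values masked.
function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    const value = headers[name];
    if (value === undefined) continue;
    const sensitive = REDACTED_HEADERS.indexOf(name.toLowerCase()) !== -1;
    out[name] = sensitive ? "[REDACTED]" : value;
  }
  return out;
}

// Builds the same record shape regardless of which backend handled the call.
function buildRunnerLogRecord(
  backend: BackendName,
  event: string,
  headers: Record<string, string>,
): RunnerLogRecord {
  return { event, runnerBackend: backend, headers: redactHeaders(headers) };
}
```

Because the redaction step is shared rather than implemented per backend, the Docker path cannot drift from the Knative path without a visible code change.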
## Proposed Architecture
### 1. Runner Backend Abstraction
- Create a `RunnerBackend` interface (TypeScript) with methods such as `execute(req)`, `resolveUiUrl(extId, hash, path)`, and `health()` / `metadata()`.
- Implement `KnativeRunnerBackend` (current behaviour) and `DockerRunnerBackend` (connects to Docker container host/port).
- Provide a factory that selects backend based on `RUNNER_BACKEND` env var (`knative` default).
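
A minimal sketch of the abstraction and factory described above follows. It is not the code in `server/src/lib/extensions/runner/backend.ts`: the request/result shapes, the injected `PostJson` transport, the `makeHttpBackend` helper, and the `/ext-ui/{extId}/{hash}/{path}` URL layout are all assumptions for illustration, and `health()` is omitted for brevity.

```typescript
type Env = Record<string, string | undefined>;
type BackendName = "knative" | "docker";

// Placeholder shapes; the real gateway types are not shown in this plan.
interface ExecuteRequest { extensionId: string; payload: unknown }
interface ExecuteResult { status: number; body: unknown }

// Injected transport so the sketch does not depend on a particular HTTP client.
type PostJson = (url: string, body: unknown) => Promise<ExecuteResult>;

interface RunnerBackend {
  readonly name: BackendName;
  execute(req: ExecuteRequest): Promise<ExecuteResult>;
  resolveUiUrl(extId: string, hash: string, path: string): string;
  metadata(): { backend: BackendName; internalBase: string };
}

// Both backends speak HTTP to the Runner; they differ only in their base URLs.
function makeHttpBackend(
  name: BackendName,
  internalBase: string,
  publicBase: string,
  post: PostJson,
): RunnerBackend {
  return {
    name,
    execute: (req) => post(`${internalBase}/v1/execute`, req),
    // Assumed URL layout under ext-ui.
    resolveUiUrl: (extId, hash, path) =>
      `${publicBase}/ext-ui/${extId}/${hash}/${path.replace(/^\/+/, "")}`,
    metadata: () => ({ backend: name, internalBase }),
  };
}

function requireEnv(env: Env, key: string): string {
  const value = env[key];
  if (!value) throw new Error(`Missing required env var ${key}`);
  return value;
}

// Selects the backend from RUNNER_BACKEND, defaulting to knative.
function createRunnerBackend(env: Env, post: PostJson): RunnerBackend {
  const name = (env["RUNNER_BACKEND"] ?? "knative").toLowerCase();
  const publicBase = env["RUNNER_PUBLIC_BASE"] ?? "";
  switch (name) {
    case "knative":
      return makeHttpBackend("knative", requireEnv(env, "RUNNER_BASE_URL"), publicBase, post);
    case "docker":
      return makeHttpBackend("docker", requireEnv(env, "RUNNER_DOCKER_HOST"), publicBase, post);
    default:
      throw new Error(`Unknown RUNNER_BACKEND "${name}"`);
  }
}
```

Keeping the `switch` inside the factory is what lets routes depend only on `RunnerBackend`, so no route needs to branch on the backend name.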
### 2. Gateway Proxy Layer
- Replace direct `fetch` calls to `${RUNNER_BASE_URL}/v1/execute` with backend calls that return typed results and centralize error handling.
- Add Next.js route (e.g., `/runner/[...path]`) that proxies static UI assets via the backend, so iframe URLs use the primary origin.
- Update `buildExtUiSrc()` to rely on backend helper for consistent URL construction.
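
The core of the proxy route can be sketched as two pure helpers: one maps the catch-all segments of a `/runner/[...path]` request onto the Runner's internal base URL, and one drops headers that a proxy should not forward. Both function names are hypothetical, and the sketch assumes the route handler receives already-decoded path segments; the Next.js handler wiring itself is left out.

```typescript
// Maps catch-all segments from `/runner/[...path]` to a URL on the internal runner.
// Empty, "." and ".." segments are rejected so a request cannot escape the base path.
function toInternalRunnerUrl(internalBase: string, segments: string[], search = ""): string {
  for (const segment of segments) {
    if (segment === "" || segment === "." || segment === "..") {
      throw new Error(`Invalid path segment: "${segment}"`);
    }
  }
  const base = internalBase.replace(/\/+$/, "");
  const path = segments.map((s) => encodeURIComponent(s)).join("/");
  return `${base}/${path}${search}`;
}

// Standard HTTP hop-by-hop headers, plus Host, which should be set for the upstream.
const NOT_FORWARDED = [
  "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
  "te", "trailer", "transfer-encoding", "upgrade", "host",
];

// Returns only the headers that are safe to pass through to the runner.
function forwardableHeaders(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    const value = headers[name];
    if (value !== undefined && NOT_FORWARDED.indexOf(name.toLowerCase()) === -1) {
      out[name] = value;
    }
  }
  return out;
}
```

For example, segments `["ext-ui", "ext1", "abc", "index.html"]` with an internal base of `http://extension-runner:8080` would resolve to `http://extension-runner:8080/ext-ui/ext1/abc/index.html`, while the browser only ever sees the primary origin.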
### 3. Docker Backend Runtime Package
- Author `docker-compose.runner-dev.yml` defining `extension-runner` service (build from existing Dockerfile, expose 8080 internally).
- Create helper commands (`npm run dev:runner`, `./scripts/dev-runner.sh`) to spin up Runner + Next dev with proper env defaults (`RUNNER_BACKEND=docker`, `RUNNER_DOCKER_HOST=http://extension-runner:8080`, `RUNNER_PUBLIC_BASE=http://localhost:3000/runner`).
- Ensure Docker backend rewrites public UI URLs to `/runner/...` while targeting the container internally.
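
The env defaults listed above could be applied along these lines. This is a sketch of the intended behaviour, not the contents of `scripts/dev-runner.sh`: the `withDockerDevDefaults` name is hypothetical, and the rule that explicitly set values take precedence over defaults is an assumption.

```typescript
type Env = Record<string, string | undefined>;

// Defaults taken from the plan's Docker dev workflow.
const DOCKER_DEV_DEFAULTS: Record<string, string> = {
  RUNNER_BACKEND: "docker",
  RUNNER_DOCKER_HOST: "http://extension-runner:8080",
  RUNNER_PUBLIC_BASE: "http://localhost:3000/runner",
};

// Fills in any unset (or empty) variable without overriding explicit values.
function withDockerDevDefaults(env: Env): Env {
  const merged: Env = { ...env };
  for (const key of Object.keys(DOCKER_DEV_DEFAULTS)) {
    const current = merged[key];
    if (current === undefined || current === "") {
      merged[key] = DOCKER_DEV_DEFAULTS[key];
    }
  }
  return merged;
}
```

Note the split between the two URLs: `RUNNER_DOCKER_HOST` is the address the gateway uses inside the Compose network, while `RUNNER_PUBLIC_BASE` is what the browser sees and stays on the `localhost:3000` origin.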
### 4. Tooling & Testing
- Extend SDK/CLI dev commands to detect Docker backend and optionally build/push bundles into mounted volumes.
- Add smoke tests that run with `RUNNER_BACKEND=docker` (mock Runner responses) to validate routing.
- Update E2E suite to cover both backends where feasible or stub Docker backend via test doubles.
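
One way to stub the Docker backend via test doubles is shown below. The `handleExecute` function is a hypothetical stand-in for the gateway's execute route, and the interface is trimmed to the one method the test needs; the 502 error shape is likewise an assumption.

```typescript
interface ExecuteRequest { extensionId: string; payload: unknown }
interface ExecuteResult { status: number; body: unknown }

// Trimmed-down view of the backend interface, enough for routing tests.
interface RunnerBackend {
  readonly name: string;
  execute(req: ExecuteRequest): Promise<ExecuteResult>;
}

// Test double: records every request and returns a canned result.
function makeFakeBackend(name: string, canned: ExecuteResult) {
  const calls: ExecuteRequest[] = [];
  const backend: RunnerBackend = {
    name,
    execute: async (req) => {
      calls.push(req);
      return canned;
    },
  };
  return { backend, calls };
}

// Test double that simulates an unreachable Runner container.
function makeFailingBackend(name: string): RunnerBackend {
  return {
    name,
    execute: async () => {
      throw new Error("connection refused");
    },
  };
}

// Hypothetical gateway handler: depends only on the interface and
// converts transport failures into a typed error that names the backend.
async function handleExecute(backend: RunnerBackend, req: ExecuteRequest): Promise<ExecuteResult> {
  try {
    return await backend.execute(req);
  } catch (err) {
    return { status: 502, body: { error: "runner_unavailable", backend: backend.name } };
  }
}
```

A smoke test can then assert that a request routed through a fake `docker` backend is recorded once and that a failing backend yields the 502 shape, without starting a container.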
### 5. Documentation & Developer Workflow
- Document env matrix, start/stop commands, and troubleshooting tips in `docs/extension-system/development_guide.md`.
- Provide guidance for switching between backends without restarting (e.g., env var change + server reload).
- Highlight parity expectations (timeouts, auth tokens) and backend-specific caveats (e.g., no auto domain mapping in Docker mode).
## Implementation Phases
### Phase 0 — Design & Config Audit
- [ ] Finalize backend interface shape and config naming.
- [ ] Inventory env variables (`RUNNER_BASE_URL`, `RUNNER_PUBLIC_BASE`, timeouts) and plan migration/aliases.
- [ ] Decide on logging/telemetry structure for backend selection.
### Phase 1 — Abstraction & Knative Parity
- [x] Implement `RunnerBackend` interface + factory with Knative backend using existing logic.
- [x] Refactor gateway execute/UI code paths to use the abstraction without changing behaviour.
- [x] Add feature flag / env validation ensuring fallback remains backwards compatible.
### Phase 2 — Docker Backend & Proxy Routing
- [x] Implement Docker backend (internal base URL, optional health check endpoint).
- [x] Add `/runner/[...path]` proxy route and update UI helpers to leverage backend URLs.
- [x] Ensure headers (auth, caching) and error propagation match production behaviour.
### Phase 3 — Local Dev Tooling
- [x] Ship Docker Compose file + scripts to run Runner + gateway with shared `.env`.
- [x] Update CLI/SDK docs to reference new workflow; add convenience commands for bundling & install loops.
- [ ] Add smoke tests (unit/integration) covering Docker backend selection.
### Phase 4 — Rollout & Docs
- [x] Update developer docs, onboarding guides, and `.env.example`.
- [ ] Gather feedback from internal extension teams; iterate on ergonomics (auto restart, log streaming).
- [ ] Monitor for issues when switching between backends; add troubleshooting section.
## Dependencies & Coordination
- DevOps: Compose file review, Runner image tags, local secrets management.
- Runner team: Validate Docker runtime behaviour (env parity, secrets mount paths).
- Gateway team: Assist with proxy route, auth enforcement, and caching headers.
- DX/Docs: Document workflow & update SDK tutorials.
## Open Questions
- Should we support hot swapping backends without restarting Next.js? (Env reload vs. app restart.)
- How do we handle TLS/HTTPS locally if required for some browser APIs? (Proxy + mkcert?)
- Do we need watch mode for Runner container rebuilds, or are manual rebuilds sufficient?
- Should the Docker backend support optional port forwarding for direct UI asset access (bypassing proxy)?
## Next Steps
1. Draft `RunnerBackend` interface and share with gateway + runner stakeholders for feedback.
2. Prototype proxy route + Docker backend to validate single-origin behaviour.
3. Prepare Compose stack and developer script for local testing.
4. Schedule verification sessions (DX + extension teams) before rolling out docs.