Some checks are pending
Bidi Control Character Guard / bidi-control-guard (push) Waiting to run
Circular Dependency Check / Check for new circular dependencies (push) Waiting to run
Citus Migration Smoke / Combined migrations on single-node Citus (push) Waiting to run
E2E Fresh Install Tests / fresh-install-e2e (push) Waiting to run
ext-v2 guardrails / Run ext-v2 guard and ESLint (push) Waiting to run
Integration Tests / Check for relevant changes (push) Waiting to run
Integration Tests / ${{ (github.event_name == 'schedule' || github.event.inputs.suite == 'full') && 'Full integration suite' || 'Tier-1 integration subset' }} (push) Blocked by required conditions
Mobile checks / Mobile lint + typecheck (push) Waiting to run
Mobile checks / Mobile unit tests (push) Waiting to run
Mobile checks / Mobile dependency audit (report) (push) Waiting to run
Mobile checks / Mobile reproducibility checks (push) Waiting to run
Secrets guard (env backups) / Ensure no tracked env backup files (push) Waiting to run
Temporal Readiness / fast-readiness (push) Waiting to run
Temporal Readiness / docker-parity (push) Waiting to run
TypeScript Type Check / Nx affected typecheck (push) Waiting to run
Unit Tests / Skipped-test budget (push) Waiting to run
Unit Tests / Nx affected unit tests (push) Waiting to run
Unit Tests / Server unit coverage (informational) (push) Waiting to run
Validate Tenant Management Schema / Check for relevant changes (push) Waiting to run
Validate Tenant Management Schema / Validate Tenant Management Schema (push) Blocked by required conditions
EE Workflows Build Guard / ee-workflows-build-guard (push) Waiting to run
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz Source: /opt/alga-psa on psa.joliet.tech
229 lines
13 KiB
Markdown
229 lines
13 KiB
Markdown
# PRD — Workflow V2 Runtime Worker Ownership Split
|
|
|
|
- Slug: `workflow-v2-runtime-worker-ownership-split`
|
|
- Date: `2026-04-09`
|
|
- Status: Draft
|
|
- Parent context:
|
|
- `ee/docs/plans/2026-04-08-workflow-v2-temporal-native-runtime/`
|
|
- `ee/docs/plans/2026-04-08-workflow-v2-temporal-hard-cutover-remediation/`
|
|
|
|
## Summary
|
|
|
|
Move authored Workflow Runtime V2 Temporal execution out of `temporal-worker` and into `workflow-worker`, then repair the package/runtime layering so the authored runtime has a clean worker-safe core surface and a separate bootstrap surface.
|
|
|
|
The intended end state is:
|
|
|
|
- `workflow-worker` owns authored Workflow Runtime V2 Temporal execution on queue `workflow-runtime-v2`
|
|
- `workflow-worker` continues to own workflow event-stream ingress and other authored-workflow support duties
|
|
- `temporal-worker` continues to own non-authored/domain Temporal workflows only
|
|
- `@alga-psa/workflows` stops exposing a mixed runtime/bootstrap boundary that drags app-only or source-layout-specific dependencies into standalone worker startup
|
|
|
|
## Problem
|
|
|
|
The current authored runtime placement and package shape create both operational confusion and concrete boot failures.
|
|
|
|
### 1. Wrong operational ownership for authored workflows
|
|
A manually started authored workflow launches successfully into Temporal, but Temporal UI shows no worker polling the `workflow-runtime-v2` queue. Today that queue is effectively tied to `temporal-worker`, even though authored workflow ingress, projections, and support duties already live with `workflow-worker`.
|
|
|
|
This splits authored workflow responsibility across two workers and makes debugging harder:
|
|
|
|
- authored workflow event ingress lives in `workflow-worker`
|
|
- authored runtime execution lives in `temporal-worker`
|
|
- domain/job Temporal workflows also live in `temporal-worker`
|
|
|
|
The result is unclear ownership and slower diagnosis when authored runs stall.
|
|
|
|
### 2. The current workflow runtime package boundary is not real enough for a standalone worker
|
|
The current `@alga-psa/workflows/runtime` surface mixes:
|
|
|
|
- runtime core exports
|
|
- AI/bootstrap wiring
|
|
- email/action registration side effects
|
|
- repo-relative imports into `shared/...` and `packages/ee/src/...`
|
|
- path alias assumptions such as `@shared/*`
|
|
|
|
That package shape works in app/source mode, but it is not a stable worker-safe boundary. Built artifacts are still coupled to repo layout and transitive app concerns.
|
|
|
|
### 3. Temporal worker startup failures are exposing the architectural issue
|
|
The immediate failures seen in local runtime validation are symptoms of the deeper boundary problem:
|
|
|
|
- dist artifacts reaching back into repo source structure
|
|
- worker startup pulling in modules unrelated to authored runtime execution
|
|
- path alias and dist/export mismatches surfacing only in standalone worker mode
|
|
|
|
If left unfixed, this will continue to make authored runtime execution brittle across local, CI, and deployment environments.
|
|
|
|
## Goals
|
|
|
|
1. Make `workflow-worker` the permanent execution owner of authored Workflow Runtime V2 Temporal runs.
|
|
2. Keep `temporal-worker` responsible only for non-authored/domain Temporal workflows.
|
|
3. Preserve a single-process `workflow-worker` model that handles both authored workflow support duties and authored Temporal queue polling.
|
|
4. Split `@alga-psa/workflows` into a clean worker-safe runtime core surface and a separate bootstrap/app-wiring surface.
|
|
5. Eliminate source-layout-relative and unresolved alias dependencies from authored runtime startup paths.
|
|
6. Make local and deployed authored workflow execution operationally intuitive: authored runtime issues should be diagnosable from `workflow-worker` first.
|
|
|
|
## Non-goals
|
|
|
|
- Full package extraction of authored runtime into an entirely new npm workspace/package.
|
|
- Retirement of `temporal-worker` as a whole.
|
|
- Migration of non-authored/domain Temporal workflows into `workflow-worker`.
|
|
- Broad redesign of workflow authoring UX.
|
|
- Reintroducing the legacy DB runtime as an execution fallback.
|
|
|
|
## Users and Primary Flows
|
|
|
|
### Primary users
|
|
|
|
- Platform engineers responsible for authored workflow runtime correctness
|
|
- Operators debugging stalled or failed authored workflow runs
|
|
- Developers running EE workflow runtime locally
|
|
|
|
### Primary flows
|
|
|
|
#### Flow 1: Manual/API authored run starts
|
|
1. A user starts an authored Workflow Runtime V2 run from UI or API.
|
|
2. The server launches the Temporal workflow on queue `workflow-runtime-v2`.
|
|
3. `workflow-worker` picks up the Temporal task and executes the authored workflow.
|
|
4. The run progresses without depending on `temporal-worker`.
|
|
|
|
#### Flow 2: Event ingress resumes authored waits
|
|
1. `workflow-worker` consumes workflow event ingress.
|
|
2. It resolves candidate authored Temporal waits/runs.
|
|
3. It signals Temporal-backed authored runs.
|
|
4. The same `workflow-worker` process family owns both ingress and authored runtime execution, making failures easier to trace.
|
|
|
|
#### Flow 3: Domain/job workflows continue unchanged
|
|
1. Existing non-authored Temporal workflows continue polling on `temporal-worker`.
|
|
2. Removing `workflow-runtime-v2` from `temporal-worker` does not affect tenant/domain/job/schedule workflows that are not part of authored Workflow Runtime V2.
|
|
|
|
## UX / UI Notes
|
|
|
|
- No workflow authoring UX changes are required.
|
|
- Temporal UI should show an active worker for the `workflow-runtime-v2` queue once `workflow-worker` is running.
|
|
- Run support/debug flow should become simpler:
|
|
- authored workflow execution problem → inspect `workflow-worker`
|
|
- non-authored/domain Temporal problem → inspect `temporal-worker`
|
|
- Existing run/event/projected UI surfaces should continue to behave the same as long as projections remain correct.
|
|
|
|
## Requirements
|
|
|
|
### Functional Requirements
|
|
|
|
#### FR-1: Authored runtime ownership
|
|
- `workflow-worker` must poll the Temporal queue used for authored Workflow Runtime V2 execution.
|
|
- The authored runtime queue remains `workflow-runtime-v2` unless an explicit rename is separately approved.
|
|
- Authored Workflow Runtime V2 Temporal workflows must execute successfully with only `workflow-worker` running, assuming Temporal server is available.
|
|
|
|
#### FR-2: Temporal worker scope reduction
|
|
- `temporal-worker` must stop polling `workflow-runtime-v2`.
|
|
- `temporal-worker` must continue polling all approved non-authored/domain queues.
|
|
- Removing authored queue ownership from `temporal-worker` must not break unrelated Temporal workflows.
|
|
|
|
#### FR-3: Single-process workflow-worker ownership
|
|
- The same `workflow-worker` process/container must own:
|
|
- workflow event-stream ingress/support paths
|
|
- authored Workflow Runtime V2 Temporal queue polling
|
|
- This must not require a second sidecar process inside the `workflow-worker` container.
|
|
|
|
#### FR-4: Runtime core vs bootstrap split
|
|
- `@alga-psa/workflows` must expose a worker-safe authored runtime core surface with no app/bootstrap side effects.
|
|
- AI inference service wiring, AI action registration, and similar app/bootstrap concerns must live in a separate bootstrap-oriented surface.
|
|
- `workflow-worker` must import only the worker-safe runtime/core surface plus explicit worker-safe registrations it truly needs.
|
|
|
|
#### FR-5: Worker-safe import graph
|
|
- Authored runtime startup paths used by `workflow-worker` must not depend on:
|
|
- unresolved `@shared/*` aliases
|
|
- raw repo-relative imports into unrelated source trees
|
|
- UI-only or app-only modules
|
|
- Built artifacts used in worker contexts must be self-contained enough to resolve through stable package exports or explicit local worker-owned files.
|
|
|
|
#### FR-6: Clear runtime initialization contract
|
|
- Runtime initialization required by `workflow-worker` must be explicit and deterministic.
|
|
- Worker-safe initialization must register only what authored runtime execution actually needs.
|
|
- Server/bootstrap initialization may register additional app-facing behavior, but that must not leak into worker-safe core by accident.
|
|
|
|
#### FR-7: Compose and environment alignment
|
|
- Local Docker/compose wiring must reflect the new ownership model.
|
|
- `workflow-worker` must receive the Temporal environment and queue configuration required to poll authored workflow tasks.
|
|
- `temporal-worker` configuration must no longer imply ownership of authored workflow runtime.
|
|
|
|
#### FR-8: Backward-compatible run launch contract
|
|
- Existing authored run launch and signal helpers may continue targeting Temporal queue `workflow-runtime-v2`.
|
|
- The move in ownership from `temporal-worker` to `workflow-worker` must not require API/UI changes to start or signal authored runs.
|
|
|
|
#### FR-9: Operability and support clarity
|
|
- Logs should make it obvious that `workflow-worker` has started Temporal polling for authored runtime.
|
|
- Logs/config should make it obvious that `temporal-worker` is not expected to own `workflow-runtime-v2` anymore.
|
|
|
|
### Non-functional Requirements
|
|
|
|
- Prefer explicit boundaries over hidden fallback behavior.
|
|
- Fail fast when worker-safe runtime code accidentally depends on bootstrap/app-only layers.
|
|
- Keep authored runtime deterministic and replay-safe.
|
|
- Minimize the blast radius to non-authored/domain Temporal workflows.
|
|
- Preserve local-development practicality; the authored runtime should be testable without package-layout hacks.
|
|
|
|
## Data / API / Integrations
|
|
|
|
### Relevant files
|
|
|
|
- `services/workflow-worker/src/index.ts`
|
|
- `services/workflow-worker/Dockerfile`
|
|
- `services/workflow-worker/src/v2/WorkflowRuntimeV2EventStreamWorker.ts`
|
|
- `ee/temporal-workflows/src/worker.ts`
|
|
- `ee/temporal-workflows/src/workflows/workflow-runtime-v2-run-workflow.ts`
|
|
- `ee/temporal-workflows/src/activities/workflow-runtime-v2-activities.ts`
|
|
- `ee/packages/workflows/src/runtime/index.ts`
|
|
- `ee/packages/workflows/package.json`
|
|
- `docker-compose.ee.yaml`
|
|
- `docker-compose.temporal.ee.yaml`
|
|
|
|
### Integration notes
|
|
|
|
- Temporal workflow definitions and activities for authored Workflow Runtime V2 may still physically live under `ee/temporal-workflows` initially, but operational ownership moves to `workflow-worker`.
|
|
- If shared code from `ee/temporal-workflows` is reused by `workflow-worker`, the import/build contract must be explicit and worker-safe.
|
|
- Build/export fixes should prefer stable package/module boundaries over additional container-only symlink hacks.
|
|
|
|
## Risks
|
|
|
|
1. **Partial move leaves both workers polling authored queue**
|
|
- Mitigation: explicit queue ownership changes plus startup tests/log assertions.
|
|
|
|
2. **Move breaks non-authored Temporal workflows accidentally**
|
|
- Mitigation: keep `temporal-worker` scope narrow and validate non-authored queue config separately.
|
|
|
|
3. **Runtime split is incomplete and worker-safe boundary still leaks bootstrap concerns**
|
|
- Mitigation: define and test the intended core/bootstrap surfaces explicitly.
|
|
|
|
4. **Docker/compose changes hide code-level layering issues temporarily**
|
|
- Mitigation: require worker-safe import/build validation outside compose where practical.
|
|
|
|
5. **Legacy DB polling path becomes confused with authored Temporal polling**
|
|
- Mitigation: keep authored Temporal ownership clearly separated from optional legacy DB polling flags.
|
|
|
|
## Rollout / Migration Notes
|
|
|
|
- This is an architecture change, not a local-only workaround.
|
|
- Existing authored Workflow Runtime V2 runs continue to use Temporal as execution authority.
|
|
- The operational owner of authored queue polling changes from `temporal-worker` to `workflow-worker`.
|
|
- Local environment and deployment manifests must be updated consistently so only `workflow-worker` owns authored queue polling.
|
|
- No authored-run API contract change is required for callers.
|
|
|
|
## Open Questions
|
|
|
|
1. Should authored Temporal workflow definitions/activities remain physically housed under `ee/temporal-workflows` for now, or should a later follow-up relocate them closer to `workflow-worker`?
|
|
2. Which exact bootstrap registrations should remain app/server-only versus worker-safe shared registrations?
|
|
3. Do we want an explicit startup assertion that fails if both workers are configured to poll `workflow-runtime-v2`?
|
|
|
|
## Acceptance Criteria / Definition of Done
|
|
|
|
This plan is done when:
|
|
|
|
1. Starting an authored Workflow Runtime V2 run results in `workflow-worker` polling and executing queue `workflow-runtime-v2`.
|
|
2. `temporal-worker` no longer polls `workflow-runtime-v2` but continues running non-authored/domain Temporal workflows.
|
|
3. `workflow-worker` can start authored runtime execution in the same process/container that already handles event-stream ingress.
|
|
4. `@alga-psa/workflows` exposes a worker-safe runtime/core surface that does not depend on bootstrap/app-only wiring.
|
|
5. Worker startup paths no longer depend on unresolved `@shared/*` aliases or repo-layout-relative source hops.
|
|
6. Local EE smoke testing shows Temporal UI reporting active workers for authored queue ownership through `workflow-worker`.
|
|
7. Focused tests cover queue ownership, worker startup, authored run execution, and runtime boundary separation.
|