PRD — Workflow V2 Runtime Worker Ownership Split
- Slug: workflow-v2-runtime-worker-ownership-split
- Date: 2026-04-09
- Status: Draft
- Parent context:
  - ee/docs/plans/2026-04-08-workflow-v2-temporal-native-runtime/
  - ee/docs/plans/2026-04-08-workflow-v2-temporal-hard-cutover-remediation/
Summary
Move authored Workflow Runtime V2 Temporal execution out of temporal-worker and into workflow-worker, then repair the package/runtime layering so the authored runtime has a clean worker-safe core surface and a separate bootstrap surface.
The intended end state is:
- workflow-worker owns authored Workflow Runtime V2 Temporal execution on queue workflow-runtime-v2.
- workflow-worker continues to own workflow event-stream ingress and other authored-workflow support duties.
- temporal-worker continues to own non-authored/domain Temporal workflows only.
- @alga-psa/workflows stops exposing a mixed runtime/bootstrap boundary that drags app-only or source-layout-specific dependencies into standalone worker startup.
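The end state above can be read as a single queue-to-owner table. The sketch below is hypothetical and is not code from the repo. Only the workflow-runtime-v2 queue name comes from this PRD. The two domain queue names are placeholders and do not reflect the real temporal-worker queue list.

```typescript
// Hypothetical ownership table for the intended end state.
// Only "workflow-runtime-v2" comes from the PRD; the domain queue names are placeholders.
type WorkerName = "workflow-worker" | "temporal-worker";

const AUTHORED_RUNTIME_QUEUE = "workflow-runtime-v2";

const QUEUE_OWNERS: Readonly<Record<string, WorkerName>> = {
  [AUTHORED_RUNTIME_QUEUE]: "workflow-worker",
  // Placeholder names standing in for non-authored/domain queues.
  "example-domain-queue": "temporal-worker",
  "example-job-queue": "temporal-worker",
};

/** Returns the single worker expected to poll a queue, or undefined if the queue is unknown. */
function ownerOf(queue: string): WorkerName | undefined {
  // hasOwnProperty avoids matching inherited keys such as "toString".
  return Object.prototype.hasOwnProperty.call(QUEUE_OWNERS, queue) ? QUEUE_OWNERS[queue] : undefined;
}

/** Lists the queues a worker is expected to poll, sorted for stable output. */
function queuesOwnedBy(worker: WorkerName): string[] {
  return Object.keys(QUEUE_OWNERS)
    .filter((queue) => QUEUE_OWNERS[queue] === worker)
    .sort();
}
```

With a table like this, "who should be polling workflow-runtime-v2?" is a lookup rather than tribal knowledge.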
Problem
The current authored runtime placement and package shape create both operational confusion and concrete boot failures.
1. Wrong operational ownership for authored workflows
A manually started authored workflow launches successfully into Temporal, but Temporal UI shows no worker polling the workflow-runtime-v2 queue. Today that queue is effectively tied to temporal-worker, even though authored workflow ingress, projections, and support duties already live with workflow-worker.
This splits authored workflow responsibility across two workers and makes debugging harder:
- authored workflow event ingress lives in workflow-worker
- authored runtime execution lives in temporal-worker
- domain/job Temporal workflows also live in temporal-worker
The result is unclear ownership and slower diagnosis when authored runs stall.
2. The current workflow runtime package boundary does not hold up for a standalone worker
The current @alga-psa/workflows/runtime surface mixes:
- runtime core exports
- AI/bootstrap wiring
- email/action registration side effects
- repo-relative imports into shared/... and packages/ee/src/...
- path alias assumptions such as @shared/*
That package shape works in app/source mode, but it is not a stable worker-safe boundary. Built artifacts are still coupled to repo layout and transitive app concerns.
3. Temporal worker startup failures are exposing the architectural issue
The immediate failures seen in local runtime validation are symptoms of the deeper boundary problem:
- dist artifacts reaching back into repo source structure
- worker startup pulling in modules unrelated to authored runtime execution
- path alias and dist/export mismatches surfacing only in standalone worker mode
If left unfixed, this will continue to make authored runtime execution brittle across local, CI, and deployment environments.
Goals
- Make workflow-worker the permanent execution owner of authored Workflow Runtime V2 Temporal runs.
- Keep temporal-worker responsible only for non-authored/domain Temporal workflows.
- Preserve a single-process workflow-worker model that handles both authored workflow support duties and authored Temporal queue polling.
- Split @alga-psa/workflows into a clean worker-safe runtime core surface and a separate bootstrap/app-wiring surface.
- Eliminate source-layout-relative and unresolved alias dependencies from authored runtime startup paths.
- Make local and deployed authored workflow execution operationally intuitive: authored runtime issues should be diagnosable from workflow-worker first.
Non-goals
- Full package extraction of authored runtime into an entirely new npm workspace/package.
- Retirement of temporal-worker as a whole.
- Migration of non-authored/domain Temporal workflows into workflow-worker.
- Broad redesign of workflow authoring UX.
- Reintroducing the legacy DB runtime as an execution fallback.
Users and Primary Flows
Primary users
- Platform engineers responsible for authored workflow runtime correctness
- Operators debugging stalled or failed authored workflow runs
- Developers running EE workflow runtime locally
Primary flows
Flow 1: Manual/API authored run starts
- A user starts an authored Workflow Runtime V2 run from UI or API.
- The server launches the Temporal workflow on queue workflow-runtime-v2.
- workflow-worker picks up the Temporal task and executes the authored workflow.
- The run progresses without depending on temporal-worker.
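The launch side of Flow 1 can be sketched as a helper that builds start options. The shape follows the taskQueue, workflowId and args options that the Temporal TypeScript SDK's client.workflow.start accepts. The helper name, the input fields and the workflow ID format are hypothetical and are not taken from the repo.

```typescript
// Hypothetical launch-side helper for authored runs.
// Only the queue name comes from the PRD; the input fields and ID format are assumptions.
const AUTHORED_RUNTIME_QUEUE = "workflow-runtime-v2";

interface AuthoredRunInput {
  tenantId: string;
  definitionId: string; // authored workflow definition ID (assumed field)
  runId: string; // authored run ID assigned by the server (assumed field)
  payload: unknown;
}

interface AuthoredRunStartOptions {
  taskQueue: string;
  workflowId: string; // Temporal workflow ID
  args: [AuthoredRunInput];
}

function buildAuthoredRunStartOptions(input: AuthoredRunInput): AuthoredRunStartOptions {
  if (!input.tenantId || !input.runId) {
    throw new Error("tenantId and runId are required to start an authored run");
  }
  return {
    // The launch contract stays on workflow-runtime-v2 (FR-8);
    // only the poller behind the queue moves to workflow-worker.
    taskQueue: AUTHORED_RUNTIME_QUEUE,
    // Deterministic ID: a retried launch collides with the existing execution
    // instead of creating a second one.
    workflowId: `workflow-runtime-v2:${input.tenantId}:${input.runId}`,
    args: [input],
  };
}
```

Nothing in this helper refers to a worker. That is why moving queue ownership should not change the launch API.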
Flow 2: Event ingress resumes authored waits
- workflow-worker consumes workflow event ingress.
- It resolves candidate authored Temporal waits/runs.
- It signals Temporal-backed authored runs.
- The same workflow-worker process family owns both ingress and authored runtime execution, making failures easier to trace.
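The resolve step of Flow 2 can be sketched as a pure function: it takes an incoming event and the recorded waits, and returns the signals to send. This is a hypothetical illustration, not the logic in WorkflowRuntimeV2EventStreamWorker.ts. The wait record shape, the correlation rule and the signal name "authoredEvent" are all assumptions.

```typescript
// Hypothetical sketch of the Flow 2 resolve step: event in, signals out.
// The record shapes, matching rule and signal name are assumptions.
interface WorkflowEvent {
  tenantId: string;
  eventName: string;
  correlationKey: string;
  payload: unknown;
}

interface AuthoredWait {
  tenantId: string;
  temporalWorkflowId: string;
  eventName: string;
  correlationKey: string;
}

interface SignalRequest {
  temporalWorkflowId: string;
  signalName: string; // hypothetical signal name
  payload: unknown;
}

function resolveSignals(event: WorkflowEvent, waits: readonly AuthoredWait[]): SignalRequest[] {
  return waits
    // Tenant isolation first: never resume another tenant's run.
    .filter((w) => w.tenantId === event.tenantId)
    // Then match the event name and correlation key the wait was registered with.
    .filter((w) => w.eventName === event.eventName && w.correlationKey === event.correlationKey)
    .map((w) => ({
      temporalWorkflowId: w.temporalWorkflowId,
      signalName: "authoredEvent",
      payload: event.payload,
    }));
}
```

Keeping resolution pure and separate from the Temporal signal call makes it easy to test inside workflow-worker.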
Flow 3: Domain/job workflows continue unchanged
- Existing non-authored Temporal workflows continue polling on temporal-worker.
- Removing workflow-runtime-v2 from temporal-worker does not affect tenant/domain/job/schedule workflows that are not part of authored Workflow Runtime V2.
UX / UI Notes
- No workflow authoring UX changes are required.
- Temporal UI should show an active worker for the workflow-runtime-v2 queue once workflow-worker is running.
- Run support/debug flow should become simpler:
  - authored workflow execution problem → inspect workflow-worker
  - non-authored/domain Temporal problem → inspect temporal-worker
- Existing run/event/projected UI surfaces should continue to behave the same as long as projections remain correct.
Requirements
Functional Requirements
FR-1: Authored runtime ownership
- workflow-worker must poll the Temporal queue used for authored Workflow Runtime V2 execution.
- The authored runtime queue remains workflow-runtime-v2 unless an explicit rename is separately approved.
- Authored Workflow Runtime V2 Temporal workflows must execute successfully with only workflow-worker running, assuming the Temporal server is available.
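Below is a sketch of how workflow-worker might resolve the Temporal settings for its authored poller. The environment variable names (TEMPORAL_ADDRESS, TEMPORAL_NAMESPACE, WORKFLOW_RUNTIME_V2_TASK_QUEUE) are assumptions, not names confirmed in the repo. The fallbacks localhost:7233 and "default" are Temporal's conventional frontend port and namespace.

```typescript
// Hypothetical config resolution for workflow-worker's authored Temporal poller.
// The env var names are assumptions; 7233 and "default" are Temporal's usual defaults.
type Env = Record<string, string | undefined>;

interface AuthoredPollerConfig {
  address: string;
  namespace: string;
  taskQueue: string;
}

const AUTHORED_RUNTIME_QUEUE = "workflow-runtime-v2";

function resolveAuthoredPollerConfig(env: Env): AuthoredPollerConfig {
  const taskQueue = env["WORKFLOW_RUNTIME_V2_TASK_QUEUE"] || AUTHORED_RUNTIME_QUEUE;
  // FR-1: the queue stays workflow-runtime-v2 unless a rename is approved,
  // so a stray override fails fast instead of silently polling the wrong queue.
  if (taskQueue !== AUTHORED_RUNTIME_QUEUE) {
    throw new Error(`unexpected authored task queue "${taskQueue}"; expected "${AUTHORED_RUNTIME_QUEUE}"`);
  }
  return {
    address: env["TEMPORAL_ADDRESS"] || "localhost:7233",
    namespace: env["TEMPORAL_NAMESPACE"] || "default",
    taskQueue,
  };
}
```

The env map is passed in explicitly rather than read from process.env. That keeps the function deterministic and easy to test.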
FR-2: Temporal worker scope reduction
- temporal-worker must stop polling workflow-runtime-v2.
- temporal-worker must continue polling all approved non-authored/domain queues.
- Removing authored queue ownership from temporal-worker must not break unrelated Temporal workflows.
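On the temporal-worker side, FR-2 can be enforced with a startup guard over its configured queue list. The sketch below assumes the list arrives as a comma-separated string, for example from a hypothetical TEMPORAL_WORKER_QUEUES variable. Neither that variable nor this function is taken from the repo.

```typescript
// Hypothetical startup guard for temporal-worker's queue list.
// Assumes a comma-separated config string.
const AUTHORED_RUNTIME_QUEUE = "workflow-runtime-v2";

function resolveTemporalWorkerQueues(configured: string): string[] {
  const queues = configured
    .split(",")
    .map((q) => q.trim())
    .filter((q) => q.length > 0);
  // Refuse to start rather than silently competing with workflow-worker for authored tasks.
  if (queues.indexOf(AUTHORED_RUNTIME_QUEUE) !== -1) {
    throw new Error(
      `temporal-worker must not poll ${AUTHORED_RUNTIME_QUEUE}; authored runs are owned by workflow-worker`,
    );
  }
  // An empty list is almost certainly a config mistake for the domain worker.
  if (queues.length === 0) {
    throw new Error("temporal-worker has no non-authored queues configured");
  }
  return queues;
}
```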
FR-3: Single-process workflow-worker ownership
- The same workflow-worker process/container must own:
  - workflow event-stream ingress/support paths
  - authored Workflow Runtime V2 Temporal queue polling
- This must not require a second sidecar process inside the workflow-worker container.
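One way to meet FR-3 is to treat each duty as a component of a single process that starts and stops as a unit. The Component interface, the class and the component names below are hypothetical. Real startup would be asynchronous; this sketch keeps it synchronous for brevity.

```typescript
// Hypothetical single-process composition for workflow-worker.
// The interface and names are assumptions; real startup would be async.
interface Component {
  name: string;
  start(): void;
  stop(): void;
}

class WorkflowWorkerProcess {
  private readonly components: readonly Component[];
  private readonly started: Component[] = [];

  constructor(components: readonly Component[]) {
    this.components = components;
  }

  /** Starts every component in order and returns the names that started. */
  start(): string[] {
    for (const c of this.components) {
      try {
        c.start();
      } catch (err) {
        // If either duty fails, tear down what already started so the container
        // exits cleanly instead of running half of its role.
        this.stop();
        throw err;
      }
      this.started.push(c);
    }
    return this.started.map((c) => c.name);
  }

  /** Stops started components in reverse start order. */
  stop(): void {
    while (this.started.length > 0) {
      const c = this.started.pop();
      if (c) c.stop();
    }
  }
}
```

All-or-nothing startup means a healthy container always has both ingress and authored queue polling running.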
FR-4: Runtime core vs bootstrap split
- @alga-psa/workflows must expose a worker-safe authored runtime core surface with no app/bootstrap side effects.
- AI inference service wiring, AI action registration, and similar app/bootstrap concerns must live in a separate bootstrap-oriented surface.
- workflow-worker must import only the worker-safe runtime/core surface plus the explicit worker-safe registrations it truly needs.
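The FR-4 split can be shown with two registration functions against one registry. The worker calls only the core function; the server calls both. In the package these would be two separate entry points. Here both are modeled in one file. The registry class and all action names are placeholders, not the package's real API or registered actions.

```typescript
// Hypothetical illustration of the core vs bootstrap split.
// The registry and action names are placeholders.
type ActionHandler = (input: unknown) => unknown;

class ActionRegistry {
  private readonly handlers: { [name: string]: ActionHandler } = {};

  register(name: string, handler: ActionHandler): void {
    if (Object.prototype.hasOwnProperty.call(this.handlers, name)) {
      throw new Error(`action "${name}" registered twice`);
    }
    this.handlers[name] = handler;
  }

  names(): string[] {
    return Object.keys(this.handlers).sort();
  }
}

// Worker-safe core: no AI clients, no email transport, no import-time side effects.
function registerCoreActions(registry: ActionRegistry): void {
  registry.register("core.transform", (input) => input);
  registry.register("core.wait", (input) => input);
}

// Bootstrap/app wiring: only the server calls this; workflow-worker never imports it.
function registerBootstrapActions(registry: ActionRegistry): void {
  registry.register("ai.infer", () => {
    throw new Error("AI inference is wired only in the server bootstrap");
  });
  registry.register("email.send", () => {
    throw new Error("email transport is wired only in the server bootstrap");
  });
}
```

Registration here happens only through explicit calls, never as an import-time side effect. That is the property the worker-safe surface needs.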
FR-5: Worker-safe import graph
- Authored runtime startup paths used by workflow-worker must not depend on:
  - unresolved @shared/* aliases
  - raw repo-relative imports into unrelated source trees
  - UI-only or app-only modules
- Built artifacts used in worker contexts must be self-contained enough to resolve through stable package exports or explicit local worker-owned files.
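A build-time check can catch the first two FR-5 failure modes before a worker ever boots. The sketch below classifies import specifiers collected from built files. It assumes something else has already produced a file-to-specifiers map. The patterns mirror the @shared/* alias and the shared/... and packages/ee/src/... hops named in this PRD. UI-only or app-only modules would need a project-specific denylist, which is not shown.

```typescript
// Hypothetical FR-5 check over import specifiers collected from worker build output.
// How the specifiers are collected is assumed to happen elsewhere.
interface ImportViolation {
  fromFile: string;
  specifier: string;
  reason: string;
}

function classifySpecifier(specifier: string): string | null {
  if (specifier.indexOf("@shared/") === 0) {
    return "unresolved @shared/* path alias";
  }
  // Relative hops that climb out of the package into repo source trees.
  if (/^(\.\.\/)+(shared|packages\/ee\/src)\//.test(specifier)) {
    return "repo-relative import into another source tree";
  }
  return null;
}

function findViolations(graph: { [file: string]: string[] }): ImportViolation[] {
  const violations: ImportViolation[] = [];
  // Sorted file order keeps CI output stable between runs.
  for (const file of Object.keys(graph).sort()) {
    for (const specifier of graph[file]) {
      const reason = classifySpecifier(specifier);
      if (reason !== null) violations.push({ fromFile: file, specifier, reason });
    }
  }
  return violations;
}
```

Running a check like this outside compose also addresses the risk that container symlinks hide layering issues.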
FR-6: Clear runtime initialization contract
- Runtime initialization required by workflow-worker must be explicit and deterministic.
- Worker-safe initialization must register only what authored runtime execution actually needs.
- Server/bootstrap initialization may register additional app-facing behavior, but that behavior must not leak into the worker-safe core by accident.
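One hypothetical reading of "explicit and deterministic" initialization has four parts:
- registrations are declared up front;
- they are applied in a stable sorted order;
- they are applied at most once;
- the result is frozen.

A repeat call with a different set fails fast instead of quietly extending the worker runtime. The function name and the string-based registrations are illustrative assumptions.

```typescript
// Hypothetical explicit, idempotent initialization contract for the worker runtime.
interface RuntimeInitResult {
  registered: readonly string[];
}

let initialized: RuntimeInitResult | null = null;

function initializeWorkerRuntime(registrations: readonly string[]): RuntimeInitResult {
  const ordered = registrations.slice().sort();
  if (initialized !== null) {
    // Idempotent for the same set; a different set indicates accidental extra wiring.
    if (initialized.registered.join(",") !== ordered.join(",")) {
      throw new Error("worker runtime already initialized with a different registration set");
    }
    return initialized;
  }
  for (let i = 1; i < ordered.length; i++) {
    if (ordered[i] === ordered[i - 1]) throw new Error(`duplicate registration "${ordered[i]}"`);
  }
  // Freeze so later code cannot append registrations to the worker runtime.
  initialized = Object.freeze({ registered: Object.freeze(ordered) });
  return initialized;
}
```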
FR-7: Compose and environment alignment
- Local Docker/compose wiring must reflect the new ownership model.
- workflow-worker must receive the Temporal environment and queue configuration required to poll authored workflow tasks.
- temporal-worker configuration must no longer imply ownership of authored workflow runtime.
FR-8: Backward-compatible run launch contract
- Existing authored run launch and signal helpers may continue targeting Temporal queue workflow-runtime-v2.
- The move in ownership from temporal-worker to workflow-worker must not require API/UI changes to start or signal authored runs.
FR-9: Operability and support clarity
- Logs should make it obvious that workflow-worker has started Temporal polling for authored runtime.
- Logs/config should make it obvious that temporal-worker is no longer expected to own workflow-runtime-v2.
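A structured startup record is one way to satisfy FR-9. Each worker states which queues it polls and which ones it deliberately does not. The field names and message text below are assumptions, not existing log output.

```typescript
// Hypothetical structured startup log for queue ownership. The field names are assumptions.
interface OwnershipLog {
  msg: string;
  worker: string;
  pollingQueues: string[];
  notPolling: string[];
}

function ownershipLog(worker: string, pollingQueues: string[], notPolling: string[]): string {
  const record: OwnershipLog = {
    msg: "temporal polling started",
    worker,
    // Sorted copies keep the line stable and grep-friendly across restarts.
    pollingQueues: pollingQueues.slice().sort(),
    notPolling: notPolling.slice().sort(),
  };
  return JSON.stringify(record);
}
```

For example, temporal-worker could log ownershipLog("temporal-worker", domainQueues, ["workflow-runtime-v2"]). The line then states explicitly that it is not the authored runtime owner.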
Non-functional Requirements
- Prefer explicit boundaries over hidden fallback behavior.
- Fail fast when worker-safe runtime code accidentally depends on bootstrap/app-only layers.
- Keep authored runtime deterministic and replay-safe.
- Minimize the blast radius to non-authored/domain Temporal workflows.
- Preserve local-development practicality; the authored runtime should be testable without package-layout hacks.
Data / API / Integrations
Relevant files
- services/workflow-worker/src/index.ts
- services/workflow-worker/Dockerfile
- services/workflow-worker/src/v2/WorkflowRuntimeV2EventStreamWorker.ts
- ee/temporal-workflows/src/worker.ts
- ee/temporal-workflows/src/workflows/workflow-runtime-v2-run-workflow.ts
- ee/temporal-workflows/src/activities/workflow-runtime-v2-activities.ts
- ee/packages/workflows/src/runtime/index.ts
- ee/packages/workflows/package.json
- docker-compose.ee.yaml
- docker-compose.temporal.ee.yaml
Integration notes
- Temporal workflow definitions and activities for authored Workflow Runtime V2 may still physically live under ee/temporal-workflows initially, but operational ownership moves to workflow-worker.
- If shared code from ee/temporal-workflows is reused by workflow-worker, the import/build contract must be explicit and worker-safe.
- Build/export fixes should prefer stable package/module boundaries over additional container-only symlink hacks.
Risks
- Partial move leaves both workers polling the authored queue
  - Mitigation: explicit queue ownership changes plus startup tests/log assertions.
- Move breaks non-authored Temporal workflows accidentally
  - Mitigation: keep temporal-worker scope narrow and validate non-authored queue config separately.
- Runtime split is incomplete and the worker-safe boundary still leaks bootstrap concerns
  - Mitigation: define and test the intended core/bootstrap surfaces explicitly.
- Docker/compose changes temporarily hide code-level layering issues
  - Mitigation: require worker-safe import/build validation outside compose where practical.
- Legacy DB polling path becomes confused with authored Temporal polling
  - Mitigation: keep authored Temporal ownership clearly separated from optional legacy DB polling flags.
Rollout / Migration Notes
- This is an architecture change, not a local-only workaround.
- Existing authored Workflow Runtime V2 runs continue to use Temporal as execution authority.
- The operational owner of authored queue polling changes from temporal-worker to workflow-worker.
- Local environment and deployment manifests must be updated consistently so only workflow-worker owns authored queue polling.
- No authored-run API contract change is required for callers.
Open Questions
- Should authored Temporal workflow definitions/activities remain physically housed under ee/temporal-workflows for now, or should a later follow-up relocate them closer to workflow-worker?
- Which exact bootstrap registrations should remain app/server-only versus worker-safe shared registrations?
- Do we want an explicit startup assertion that fails if both workers are configured to poll workflow-runtime-v2?
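One possible answer to the last question is a deployment-level check. It could run in CI against the rendered compose or deployment config and fail when any queue is claimed by more than one worker. The sketch assumes each worker's queue list has already been extracted from config. That extraction step, and both function names, are hypothetical.

```typescript
// Hypothetical deployment-level check: every queue must have exactly one polling worker.
// Extracting each worker's queue list from compose/deployment config is assumed.
function findContestedQueues(claims: { [worker: string]: string[] }): { [queue: string]: string[] } {
  const owners: { [queue: string]: string[] } = {};
  for (const worker of Object.keys(claims).sort()) {
    for (const queue of claims[worker]) {
      if (!Object.prototype.hasOwnProperty.call(owners, queue)) owners[queue] = [];
      if (owners[queue].indexOf(worker) === -1) owners[queue].push(worker);
    }
  }
  const contested: { [queue: string]: string[] } = {};
  for (const queue of Object.keys(owners)) {
    if (owners[queue].length > 1) contested[queue] = owners[queue];
  }
  return contested;
}

function assertSingleOwnerPerQueue(claims: { [worker: string]: string[] }): void {
  const contested = findContestedQueues(claims);
  const queues = Object.keys(contested).sort();
  if (queues.length > 0) {
    const detail = queues.map((q) => `${q} <- ${contested[q].join(", ")}`).join("; ");
    throw new Error(`queues polled by more than one worker: ${detail}`);
  }
}
```

A per-worker startup guard can only see its own config. A check across all workers is what catches the "partial move leaves both workers polling" risk.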
Acceptance Criteria / Definition of Done
This plan is done when:
- Starting an authored Workflow Runtime V2 run results in workflow-worker polling and executing queue workflow-runtime-v2.
- temporal-worker no longer polls workflow-runtime-v2 but continues running non-authored/domain Temporal workflows.
- workflow-worker can start authored runtime execution in the same process/container that already handles event-stream ingress.
- @alga-psa/workflows exposes a worker-safe runtime/core surface that does not depend on bootstrap/app-only wiring.
- Worker startup paths no longer depend on unresolved @shared/* aliases or repo-layout-relative source hops.
- Local EE smoke testing shows the Temporal UI reporting active workflow-worker pollers on the authored queue.
- Focused tests cover queue ownership, worker startup, authored run execution, and runtime boundary separation.