Some checks are pending
Bidi Control Character Guard / bidi-control-guard (push) Waiting to run
Circular Dependency Check / Check for new circular dependencies (push) Waiting to run
Citus Migration Smoke / Combined migrations on single-node Citus (push) Waiting to run
E2E Fresh Install Tests / fresh-install-e2e (push) Waiting to run
ext-v2 guardrails / Run ext-v2 guard and ESLint (push) Waiting to run
Integration Tests / Check for relevant changes (push) Waiting to run
Integration Tests / ${{ (github.event_name == 'schedule' || github.event.inputs.suite == 'full') && 'Full integration suite' || 'Tier-1 integration subset' }} (push) Blocked by required conditions
Mobile checks / Mobile lint + typecheck (push) Waiting to run
Mobile checks / Mobile unit tests (push) Waiting to run
Mobile checks / Mobile dependency audit (report) (push) Waiting to run
Mobile checks / Mobile reproducibility checks (push) Waiting to run
Secrets guard (env backups) / Ensure no tracked env backup files (push) Waiting to run
Temporal Readiness / fast-readiness (push) Waiting to run
Temporal Readiness / docker-parity (push) Waiting to run
TypeScript Type Check / Nx affected typecheck (push) Waiting to run
Unit Tests / Skipped-test budget (push) Waiting to run
Unit Tests / Nx affected unit tests (push) Waiting to run
Unit Tests / Server unit coverage (informational) (push) Waiting to run
Validate Tenant Management Schema / Check for relevant changes (push) Waiting to run
Validate Tenant Management Schema / Validate Tenant Management Schema (push) Blocked by required conditions
EE Workflows Build Guard / ee-workflows-build-guard (push) Waiting to run
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz Source: /opt/alga-psa on psa.joliet.tech
18 KiB
18 KiB
SCRATCHPAD — Workflow V2 Temporal Hard-Cutover Remediation
Purpose
Focused follow-up plan for the remaining blockers after the broader Temporal-native runtime plan.
Parent plan:
ee/docs/plans/2026-04-08-workflow-v2-temporal-native-runtime/
User-approved scope decisions
- Scope choice: B
- include the 3 remaining review items plus adjacent operator/runtime cleanup needed to make hard cutover shippable
- Legacy operator behavior for Temporal runs: A — hard fail
- do not translate old DB-runtime-style actions for Temporal-backed runs
- Event correlation source: C — support both
- explicit correlation if present
- configured derivation if explicit correlation is absent
- Expression parity posture: A — exact parity
- Temporal must use the same expression contract as the canonical workflow runtime
- Cutover posture: hard cutover
- shared / legacy DB-runtime authority can be removed for Workflow Runtime V2
Review findings this plan addresses
-
Expression engine parity gap
ee/temporal-workflows/src/workflows/workflow-runtime-v2-run-workflow.ts- custom JSONata evaluator diverges from canonical expression contract in
shared/workflow/runtime/expressionEngine.ts - known gap: missing
nowIso()support - likely additional gap: skipped guardrails/allowed-function enforcement
-
Remaining DB/legacy authority over Temporal runs
ee/packages/workflows/src/actions/workflow-runtime-v2-actions.tsresumeWorkflowRunAction()still mutates waits/runs and callsWorkflowRuntimeV2.executeRun(...)retryWorkflowRunAction()still callsWorkflowRuntimeV2.executeRun(...)submitWorkflowEventAction()still resolves waits through DB authority patternscancelWorkflowRunAction()still projectsCANCELEDeven if Temporal cancel fails
-
Event ingress correlation bug
services/workflow-worker/src/v2/WorkflowRuntimeV2EventStreamWorker.ts- currently uses
event.event_idas correlation key for audit, candidate lookup, and Temporal signal payload - real
event.waitsteps typically correlate on business keys like ticket/project/account identifiers
Constraints / implementation guardrails
- Prefer explicit failure over silent fallback.
- Do not preserve hybrid authority between Temporal and DB runtime for V2.
- DB tables remain projection/index/audit surfaces only.
- Candidate lookup may use DB indexes, but Temporal remains the only resume authority.
- Unsupported Temporal actions should fail loudly and clearly for operators/support.
Likely affected files
ee/temporal-workflows/src/workflows/workflow-runtime-v2-run-workflow.tsshared/workflow/runtime/expressionEngine.tsee/packages/workflows/src/actions/workflow-runtime-v2-actions.tsee/packages/workflows/src/lib/workflowRuntimeV2Temporal.tsservices/workflow-worker/src/v2/WorkflowRuntimeV2EventStreamWorker.tspackages/event-bus/src/schemas/workflowEventSchema.tsshared/workflow/persistence/workflowRunWaitModelV2.ts
Expected test areas
- Temporal workflow/runtime unit tests
- server/API unit tests for run-control actions
- workflow worker event-ingress tests
- possibly one DB-backed integration path for correlation routing/projection sanity
Notes
- Existing broader runtime plan already covers the end-state architecture.
- This focused plan is meant to track the remaining hard-cutover blockers as a shippable remediation slice.
Progress — 2026-04-09 (Run-control hard cutover)
Completed scope in this pass
- Implemented hard-fail guardrails for legacy run-control actions on Temporal runs in:
ee/packages/workflows/src/actions/workflow-runtime-v2-actions.ts
- Added explicit unsupported-action boundary for Temporal runs:
resumeWorkflowRunActionnow fails with409+ codeWORKFLOW_TEMPORAL_ACTION_UNSUPPORTEDretryWorkflowRunActionnow fails with409+ codeWORKFLOW_TEMPORAL_ACTION_UNSUPPORTEDrequeueWorkflowRunEventWaitActionnow fails with409+ codeWORKFLOW_TEMPORAL_ACTION_UNSUPPORTED
- Fixed Temporal cancel authority semantics:
cancelWorkflowRunActionnow calls Temporal cancel first and does not project DBCANCELEDstate if Temporal cancel fails.- Failure path now returns explicit
409+ codeWORKFLOW_TEMPORAL_CANCEL_FAILEDwith actionable hint.
Key decisions and rationale
- Decision: treat legacy admin controls (
resume,retry,requeue_event_wait) as unsupported forengine='temporal'.- Rationale: enforces single execution authority (Temporal) and prevents split-brain DB projection mutations.
- Decision: cancel remains supported for Temporal runs, but only when the Temporal cancel request succeeds.
- Rationale: prevents false projection of cancellation when workflow execution is still active.
Tests added/updated
- Updated
server/src/test/integration/workflowRuntimeV2.control.integration.test.ts:- New: Temporal resume hard-fails with explicit unsupported-action details and no wait/run mutation.
- New: Temporal retry hard-fails with explicit unsupported-action details and keeps failed run projection unchanged.
- New: Temporal cancel failure leaves run/waits in WAITING state and returns explicit error details.
- Added test mocking for Temporal cancel client call via:
vi.mock('@alga-psa/workflows/lib/workflowRuntimeV2Temporal', ...)
Commands / runbook used
- Focused integration run (full file):
cd server && npm run -s test -- src/test/integration/workflowRuntimeV2.control.integration.test.ts- Result: suite has a pre-existing unrelated failure (
STARTED invocation with stale lease is treated as TransientError) not introduced by these changes.
- Focused validation for new Temporal tests only:
cd server && npm run -s test -- src/test/integration/workflowRuntimeV2.control.integration.test.ts -t "Temporal runs reject legacy|Temporal cancel failure"- Result: passed.
Current gaps / next work
- Event-ingress correlation/authority items (
F018+) remain open. - Expression parity items referenced here were addressed in the next section (
Expression parity cutover).
Progress — 2026-04-09 (Expression parity cutover)
Completed scope in this pass
- Replaced Temporal workflow-local expression evaluation with canonical workflow expression contract in:
ee/temporal-workflows/src/workflows/workflow-runtime-v2-run-workflow.ts
- Removed local Temporal-only JSONata helpers from workflow execution path:
- removed local evaluator/normalizer and now uses canonical
compileExpressionfrom@alga-psa/workflows/runtime/expressionEngine
- removed local evaluator/normalizer and now uses canonical
- Removed Temporal-only
nowIso()control-flow ban.- Temporal workflow now accepts
nowIso()with canonical function set/validation behavior.
- Temporal workflow now accepts
Key decisions and rationale
- Decision: import expression contract from
@alga-psa/workflows/runtime/expressionEnginedirectly instead of@alga-psa/workflows/runtimebarrel.- Rationale: avoids pulling unrelated transitive runtime package entry dependencies into temporal workflow unit test bundle (
@alga-psa/storageresolution issue).
- Rationale: avoids pulling unrelated transitive runtime package entry dependencies into temporal workflow unit test bundle (
- Decision: cache compiled expressions per source string in-workflow (
Map<string, CompiledExpression>).- Rationale: preserves behavior and avoids repeated compile cost while remaining deterministic for replayed source strings.
Tests added/updated
- Updated
ee/temporal-workflows/src/workflows/__tests__/workflow-runtime-v2-run-workflow.test.ts:- nowIso parity:
control.ifacceptsnowIso()via canonical engine. - normalization parity:
==source normalization path works in Temporal execution. - disallowed-function parity: rejects unsupported functions with canonical validation.
- output guardrail parity: expression result over max size fails with canonical guardrail error.
- nowIso parity:
Commands / runbook used
- Temporal workflow unit suite (targeted):
cd ee/temporal-workflows && npx vitest run src/workflows/__tests__/workflow-runtime-v2-run-workflow.test.ts- Result: passed (30 tests).
Current gaps / next work
- Determinism replay proof item (
F009/T003) is still open:- need a dedicated replay-focused test that demonstrates no expression-contract drift across replay.
- Event ingress correlation and API/stream alignment (
F018+) remain open.
Progress — 2026-04-09 (Event-ingress authority guard for Temporal)
Completed scope in this pass
- Added Temporal guard in API event ingestion path (
submitWorkflowEventAction) to prevent legacy DB-authoritative resume for Temporal runs:- file:
ee/packages/workflows/src/actions/workflow-runtime-v2-actions.ts - behavior: if a matched candidate wait belongs to
engine='temporal', ingestion fails explicitly (409) and records an audit error message; no wait/run projection mutation is performed for that Temporal run.
- file:
Key decisions and rationale
- Decision: hard-fail API resume attempts for Temporal candidate runs in the legacy DB event-resume code path.
- Rationale: prevents
WorkflowRuntimeV2.executeRun(...)from being an authority path for Temporal runs in operator/API control surfaces.
- Rationale: prevents
Tests added/updated
- Updated
server/src/test/integration/workflowRuntimeV2.control.integration.test.ts:- New:
Submit workflow event rejects legacy DB-authoritative resume for Temporal runs - Asserts:
- explicit
409failure - wait remains
WAITING - run remains
WAITING - runtime event audit row includes Temporal-unsupported error metadata
- explicit
- New:
Commands / runbook used
- Focused server integration run for Temporal guard tests:
cd server && npm run -s test -- src/test/integration/workflowRuntimeV2.control.integration.test.ts -t "Temporal runs reject legacy|Temporal cancel failure|Submit workflow event rejects legacy DB-authoritative resume for Temporal runs"- Result: passed.
Progress — 2026-04-09 (Stream correlation contract)
Completed scope in this pass
- Extended workflow event stream schema/contracts to carry explicit workflow correlation:
packages/event-schemas/src/schemas/workflowEventSchema.tspackages/event-bus/src/schemas/workflowEventSchema.tspackages/event-schemas/src/schemas/eventBusSchema.ts(WorkflowPublishHooks.correlationKey,convertToWorkflowEventmapping)packages/event-bus/src/eventBus.ts(publishesworkflow_correlation_keyin stream payload)
- Updated stream ingestion worker correlation logic:
services/workflow-worker/src/v2/WorkflowRuntimeV2EventStreamWorker.ts- correlation resolution order:
- explicit (
event.workflow_correlation_key,payload.workflowCorrelationKey,payload.correlationKey) - configured derivation paths from
WORKFLOW_RUNTIME_V2_EVENT_CORRELATION_PATHS_JSON(event-specific and*wildcard)
- explicit (
- no fallback to
event_idfor wait routing - resolved key persisted in
workflow_runtime_events.correlation_key - wait candidate lookup uses resolved key (
tenant + event_name + resolved correlation) - Temporal signal payload carries resolved correlation key
- unresolved correlation writes audit error metadata and skips wait routing/signaling
Key decisions and rationale
- Decision: use env-configured derivation paths as the immediate configurable source (
WORKFLOW_RUNTIME_V2_EVENT_CORRELATION_PATHS_JSON).- Rationale: provides explicit, event-type-specific derivation without introducing schema migrations in this remediation slice.
- Decision: fail open for event-triggered workflow launches but fail closed for wait routing when correlation is missing.
- Rationale: preserves event-triggered starts while preventing false-positive wait matches.
Tests added/updated
- Updated
services/workflow-worker/src/v2/WorkflowRuntimeV2EventStreamWorker.test.ts:- explicit correlation routes candidate waits/signals by resolved key (not
event_id) - configured derivation from payload path routes correctly
- unresolved correlation writes audit error and skips wait-routing/signaling
- explicit correlation routes candidate waits/signals by resolved key (not
Commands / runbook used
- Stream worker unit tests:
cd services/workflow-worker && npx vitest run src/v2/WorkflowRuntimeV2EventStreamWorker.test.ts- Result: passed (4 tests).
Current gaps / next work
- API ingress still needs full correlation-derivation parity and Temporal-signal-only resume path alignment (
F024,F025,F026,F028,T010). - Determinism replay test (
F009/T003) and cutover regression/projection tests (T011,T012) remain open.
Progress — 2026-04-09 (Support posture documentation)
Completed scope in this pass
- Added hard-cutover support posture document:
ee/docs/plans/2026-04-08-workflow-v2-temporal-hard-cutover-remediation/HARD_CUTOVER_SUPPORT_POSTURE.md
- Document includes:
- Temporal authority model
- operator/API run-control support matrix (
cancelsupported,resume/retry/requeue_event_waitunsupported) - event-ingress correlation posture
- support/debugging guidance for projection fields
Progress — 2026-04-09 (API/stream correlation + Temporal signal-only ingress)
Completed scope in this pass
- Aligned API event ingress with stream correlation contract using a shared resolver:
- Added
ee/packages/workflows/src/lib/workflowEventCorrelation.ts - Updated stream worker (
services/workflow-worker/src/v2/WorkflowRuntimeV2EventStreamWorker.ts) to use shared resolver - Updated API action schema to accept explicit workflow correlation while supporting derivation fallback:
ee/packages/workflows/src/actions/workflow-runtime-v2-schemas.ts
- Added
- Reworked
submitWorkflowEventActionto keep Temporal authority in Temporal:- file:
ee/packages/workflows/src/actions/workflow-runtime-v2-actions.ts - Temporal candidate waits are no longer DB-resolved/advanced.
- Temporal event waits are routed via
signalWorkflowRuntimeV2Event(...). - Temporal human waits are routed via new
signalWorkflowRuntimeV2HumanTask(...). - DB candidate lookup remains index/routing aid; payload-filter matching is no longer used as final authority for Temporal waits.
- Missing correlation writes clear runtime-event audit metadata and skips wait routing.
- file:
- Added Temporal human-task signal helper:
ee/packages/workflows/src/lib/workflowRuntimeV2Temporal.ts
Determinism / replay test scope
- Added replay/checkpoint determinism coverage for canonical expressions in Temporal workflow unit suite:
ee/temporal-workflows/src/workflows/__tests__/workflow-runtime-v2-run-workflow.test.ts- New test validates continue-as-new checkpoint replay path with canonical
control.ifexpression semantics.
Tests added/updated
server/src/test/integration/workflowRuntimeV2.control.integration.test.ts- Temporal API event ingress signals event waits without DB-authoritative mutation.
- Derived correlation (
WORKFLOW_RUNTIME_V2_EVENT_CORRELATION_PATHS_JSON) routes Temporal waits and avoidsexecuteRun. - Temporal payload-filter matching stays inside Temporal wait contract (API still signals candidate wait).
- Temporal human waits route through Temporal human-task signal.
- Resume/retry unsupported Temporal actions assert no
WorkflowRuntimeV2.executeRun(...)call.
services/workflow-worker/src/v2/WorkflowRuntimeV2EventStreamWorker.test.ts- remains passing with shared correlation helper integration.
Commands / runbook used
cd services/workflow-worker && npx vitest run src/v2/WorkflowRuntimeV2EventStreamWorker.test.tscd ee/temporal-workflows && npx vitest run src/workflows/__tests__/workflow-runtime-v2-run-workflow.test.tscd server && mkdir -p coverage/.tmp && npm run -s test -- --coverage.enabled=false src/test/integration/workflowRuntimeV2.control.integration.test.ts -t "Submit workflow event|Temporal runs reject legacy admin resume|Temporal runs reject legacy retry"
Plan checklist updates
- Marked implemented:
F009,F024,F025,F026,F028,F036 - Marked implemented:
T003,T010,T011,T012
Remaining open items after this pass
- Features:
F029,F030,F031,F032 - Tests: no remaining unchecked tests in this plan file.
Progress — 2026-04-09 (Legacy worker/runtime gating for Temporal runs)
Completed scope in this pass
- Gated legacy scheduler/runtime execution paths from affecting Temporal runs:
shared/workflow/workers/WorkflowRuntimeV2Worker.ts- due retry/time/event-time waits are now explicitly skipped when
run.engine === 'temporal'
- due retry/time/event-time waits are now explicitly skipped when
shared/workflow/runtime/runtime/workflowRuntimeV2.tsacquireRunnableRun(...)excludes Temporal runs from runnable acquisitionexecuteRun(...)now hard-fails for Temporal runs with explicit unsupported-runtime error
services/workflow-worker/src/v2/WorkflowRuntimeV2Worker.ts- mirrored Temporal wait skip guardrails for service worker code path
Projection/supportability impact
- Temporal projections are no longer mutated by legacy worker timeout/retry polling paths.
- Runtime event projections continue to persist resolved correlation and matched run metadata for Temporal signal routing.
- Unsupported Temporal run-control actions continue returning actionable operator metadata (
code/action/engine/runId).
Tests added/updated
- Updated
server/src/test/integration/workflowRuntimeV2.control.integration.test.ts:- Added
worker timeout polling does not execute Temporal runs through legacy runtime authority - Validates temporal wait remains
WAITING, run remainsWAITING, andWorkflowRuntimeV2.executeRun(...)is not invoked.
- Added
Commands / runbook used
cd server && mkdir -p coverage/.tmp && npm run -s test -- --coverage.enabled=false src/test/integration/workflowRuntimeV2.control.integration.test.ts -t "Submit workflow event|Temporal runs reject legacy admin resume|Temporal runs reject legacy retry|timeout polling"cd services/workflow-worker && npx vitest run src/v2/WorkflowRuntimeV2EventStreamWorker.test.ts src/index.startup.test.ts
Plan checklist updates
- Marked implemented:
F029,F030,F031,F032
Remaining open items after this pass
- Features: none
- Tests: none