Hermes 284313f908
Some checks are pending
Bidi Control Character Guard / bidi-control-guard (push) Waiting to run
Circular Dependency Check / Check for new circular dependencies (push) Waiting to run
Citus Migration Smoke / Combined migrations on single-node Citus (push) Waiting to run
E2E Fresh Install Tests / fresh-install-e2e (push) Waiting to run
ext-v2 guardrails / Run ext-v2 guard and ESLint (push) Waiting to run
Integration Tests / Check for relevant changes (push) Waiting to run
Integration Tests / ${{ (github.event_name == 'schedule' || github.event.inputs.suite == 'full') && 'Full integration suite' || 'Tier-1 integration subset' }} (push) Blocked by required conditions
Mobile checks / Mobile lint + typecheck (push) Waiting to run
Mobile checks / Mobile unit tests (push) Waiting to run
Mobile checks / Mobile dependency audit (report) (push) Waiting to run
Mobile checks / Mobile reproducibility checks (push) Waiting to run
Secrets guard (env backups) / Ensure no tracked env backup files (push) Waiting to run
Temporal Readiness / fast-readiness (push) Waiting to run
Temporal Readiness / docker-parity (push) Waiting to run
TypeScript Type Check / Nx affected typecheck (push) Waiting to run
Unit Tests / Skipped-test budget (push) Waiting to run
Unit Tests / Nx affected unit tests (push) Waiting to run
Unit Tests / Server unit coverage (informational) (push) Waiting to run
Validate Tenant Management Schema / Check for relevant changes (push) Waiting to run
Validate Tenant Management Schema / Validate Tenant Management Schema (push) Blocked by required conditions
EE Workflows Build Guard / ee-workflows-build-guard (push) Waiting to run
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

14 KiB

Scratchpad — SLA Temporal Workflow Architecture (CE/EE Split)

  • Plan slug: sla-temporal-workflow-architecture
  • Created: 2026-02-03

What This Is

Keep a lightweight, continuously-updated log of discoveries and decisions made while implementing this plan.

Decisions

  • (2026-02-03) Use separate ISlaBackend interface rather than extending IJobRunner - SLA tracking has different semantics (signals, queries, per-ticket workflows) vs. generic job execution
  • (2026-02-03) Don't migrate existing tickets - new tickets get workflows (EE) or polling (CE), existing continue with polling
  • (2026-02-03) Database remains source of truth for SLA status - workflow is for timer orchestration only
  • (2026-02-03) Workflow ID format sla-ticket-{tenantId}-{ticketId} allows easy lookup and ensures uniqueness
  • (2026-02-03) Added ISlaBackend interface in packages/sla/src/services/backends/ISlaBackend.ts and exported via services index
  • (2026-02-03) Added SlaBackendFactory in packages/sla/src/services/backends/SlaBackendFactory.ts to select EE vs CE backend using server/src/lib/features.ts
  • (2026-02-03) Implemented PgBossSlaBackend to delegate pause/resume/complete/status operations via existing SLA services using tenant resolution helpers
  • (2026-02-03) Added sla-ticket-workflow.ts with response/resolution phase tracking and threshold orchestration (Temporal)
  • (2026-02-03) SLA workflow input includes ticketId, tenantId, policyTargets, and businessHoursSchedule
  • (2026-02-03) SLA workflow state tracks phase, pause state, notified thresholds, and response/resolution deadlines
  • (2026-02-03) SLA workflow uses Temporal sleep + condition race to wake at threshold times
  • (2026-02-03) Pause signal sets pauseStartedAt and pauses timers via condition wake-up
  • (2026-02-03) Resume signal clears pauseStartedAt, increments totalPauseMinutes, and triggers recalculation on next loop
  • (2026-02-03) Complete response signal sets response complete and transitions workflow to resolution phase
  • (2026-02-03) Complete resolution signal marks workflow completed and allows termination
  • (2026-02-03) Cancel signal marks workflow cancelled and stops further threshold handling
  • (2026-02-03) Added getState query returning status, remaining time minutes, and pause state
  • (2026-02-03) Added SLA activities including calculateNextWakeTime using business hours calculator
  • (2026-02-03) calculateNextWakeTime leverages calculateDeadline to advance into next business period when starting outside hours
  • (2026-02-03) calculateNextWakeTime adds accumulated pause minutes to computed deadline
  • (2026-02-03) sendSlaNotification activity delegates to slaNotificationService.sendSlaNotification
  • (2026-02-03) checkAndEscalate activity calls escalationService check + escalate paths
  • (2026-02-03) updateSlaStatus activity updates ticket SLA met fields on breach
  • (2026-02-03) recordSlaAuditLog activity writes entries to sla_audit_log
  • (2026-02-03) Implemented EE TemporalSlaBackend to start SLA ticket workflows via Temporal client
  • (2026-02-03) Temporal workflow IDs follow sla-ticket-{tenantId}-{ticketId} format in TemporalSlaBackend
  • (2026-02-03) TemporalSlaBackend pause/resume/complete/cancel methods signal workflows
  • (2026-02-03) TemporalSlaBackend resume signals resume to workflows
  • (2026-02-03) TemporalSlaBackend completes SLA phases via completeResponse/completeResolution signals
  • (2026-02-03) TemporalSlaBackend cancelSla sends cancel signal to workflow
  • (2026-02-03) TemporalSlaBackend getSlaStatus queries workflow state via getState
  • (2026-02-03) slaService.startSlaForTicket now triggers backend.startSlaTracking after recording SLA start
  • (2026-02-03) slaPauseService pause/resume now signal SLA backend unless skipBackend is set
  • (2026-02-03) slaPauseService.resumeSla now signals backend resume by default
  • (2026-02-03) slaService.recordFirstResponse now signals backend completeSla('response') unless skipped
  • (2026-02-03) slaService.recordResolution now signals backend completeSla('resolution') unless skipped
  • (2026-02-03) SlaBackendFactory falls back to PgBoss backend when Temporal backend load fails, with warning log
  • (2026-02-03) Fallback path logs warning via core logger
  • (2026-02-03) TemporalSlaBackend start is idempotent by ignoring duplicate workflow start errors
  • (2026-02-03) Ticket deletion actions now cancel SLA backend workflows via SlaBackendFactory
  • (2026-02-03) SLA policy change handling restarts backend workflows via handlePolicyChange in slaService and slaSubscriber
  • (2026-02-03) SLA workflow sends threshold notifications at 50/75/90% and breaches at 100%, with escalation checks
  • (2026-02-03) Temporal worker now includes sla-workflows task queue by default
  • (2026-02-03) SLA ticket workflow exported in Temporal workflows index for worker registration
  • (2026-02-03) SLA activities exported in Temporal activities index for worker registration
  • (2026-02-03) Added CE stub TemporalSlaBackend that throws Enterprise-only error
  • (2026-02-03) PgBossSlaBackend startSlaTracking remains a no-op for CE polling
  • (2026-02-03) PgBossSlaBackend cancelSla remains a no-op for CE polling
  • (2026-02-03) PgBossSlaBackend getSlaStatus delegates to slaService.getSlaStatus
  • (2026-02-03) SLA workflow respects 24x7 schedules via business hours calculator
  • (2026-02-03) Tests: added ISlaBackend interface signature test (T001)
  • (2026-02-03) Tests: SlaBackendFactory returns PgBoss in CE (T002)
  • (2026-02-03) Tests: SlaBackendFactory returns Temporal backend in EE when available (T003)
  • (2026-02-03) Tests: SlaBackendFactory falls back to PgBoss when Temporal unavailable (T004)
  • (2026-02-03) Tests: PgBossSlaBackend startSlaTracking no-op (T005)
  • (2026-02-03) Tests: PgBossSlaBackend.pauseSla delegates to slaPauseService (T006)
  • (2026-02-03) Tests: PgBossSlaBackend.resumeSla delegates to slaPauseService (T007)
  • (2026-02-03) Tests: PgBossSlaBackend.completeSla(response) delegates to slaService.recordFirstResponse (T008)
  • (2026-02-03) Tests: PgBossSlaBackend.completeSla(resolution) delegates to slaService.recordResolution (T009)
  • (2026-02-03) Tests: PgBossSlaBackend.cancelSla no-op (T010)
  • (2026-02-03) Tests: PgBossSlaBackend.getSlaStatus delegates to slaService (T011)
  • (2026-02-03) Tests: SLA ticket workflow initialization and input coverage (T012-T024)
  • (2026-02-03) Tests: SLA workflow initializes with input parameters (T012)
  • (2026-02-03) Tests: SLA workflow initial state phase and thresholds (T013)
  • (2026-02-03) Tests: SLA workflow threshold calculations include 50% (T014)
  • (2026-02-03) Tests: SLA workflow threshold calculations include 75% (T015)
  • (2026-02-03) Tests: SLA workflow threshold calculations include 90% (T016)
  • (2026-02-03) Tests: SLA workflow threshold calculations include 100% (T017)
  • (2026-02-03) Tests: SLA workflow pause signal sets pauseStartedAt (T018)
  • (2026-02-03) Tests: SLA workflow resume clears pauseStartedAt and increments total pause minutes (T019)
  • (2026-02-03) Tests: SLA workflow recalculates wake time after resume with pause minutes (T020)
  • (2026-02-03) Tests: SLA workflow transitions to resolution on completeResponse (T021)
  • (2026-02-03) Tests: SLA workflow terminates on completeResolution (T022)
  • (2026-02-03) Tests: SLA workflow cancel signal terminates workflow (T023)
  • (2026-02-03) Tests: SLA workflow getState query returns status/remaining time (T024)
  • (2026-02-03) Tests: calculateNextWakeTime weekday schedule (T025)
  • (2026-02-03) Tests: calculateNextWakeTime advances across weekend (T026)
  • (2026-02-03) Tests: calculateNextWakeTime skips holidays (T027)
  • (2026-02-03) Tests: calculateNextWakeTime handles recurring holidays (T028)
  • (2026-02-03) Tests: calculateNextWakeTime accounts for pause minutes (T029)
  • (2026-02-03) Tests: calculateNextWakeTime for 24x7 schedule (T030)
  • (2026-02-03) Tests: sendSlaNotification activity calls notification service (T031)
  • (2026-02-03) Tests: checkAndEscalate activity calls escalation check (T032)
  • (2026-02-03) Tests: checkAndEscalate triggers escalation when needed (T033)
  • (2026-02-03) Tests: updateSlaStatus updates response met field (T034)
  • (2026-02-03) Tests: updateSlaStatus updates resolution met field (T035)
  • (2026-02-03) Tests: recordSlaAuditLog inserts audit entry (T036)
  • (2026-02-03) Tests: TemporalSlaBackend starts SLA workflow with correct ID (T037)
  • (2026-02-03) Tests: TemporalSlaBackend workflow ID format validated (T038)
  • (2026-02-03) Tests: TemporalSlaBackend.pauseSla sends pause signal (T039)
  • (2026-02-03) Tests: TemporalSlaBackend.resumeSla sends resume signal (T040)
  • (2026-02-03) Tests: TemporalSlaBackend.completeSla(response) signals workflow (T041)
  • (2026-02-03) Tests: TemporalSlaBackend.completeSla(resolution) signals workflow (T042)
  • (2026-02-03) Tests: TemporalSlaBackend.cancelSla sends cancel signal (T043)
  • (2026-02-03) Tests: TemporalSlaBackend.getSlaStatus queries workflow state (T044)
  • (2026-02-03) Tests: slaService.startSlaForTicket calls backend.startSlaTracking (T045)
  • (2026-02-03) Tests: slaPauseService.pauseSla calls backend.pauseSla (T046)
  • (2026-02-03) Tests: slaPauseService.resumeSla calls backend.resumeSla (T047)
  • (2026-02-03) Tests: slaService.recordFirstResponse calls backend.completeSla(response) (T048)
  • (2026-02-03) Tests: slaService.recordResolution calls backend.completeSla(resolution) (T049)
  • (2026-02-03) Tests: TemporalSlaBackend handles duplicate workflow ID (T050)
  • (2026-02-03) Tests: CE TemporalSlaBackend stub throws Enterprise-only error (T051)

Discoveries / Constraints

  • (2026-02-03) Current sla-timer pgboss job runs every 5 minutes per tenant
  • (2026-02-03) Business hours calculator uses minute-by-minute iteration - inefficient for long periods
  • (2026-02-03) Existing JobRunnerFactory pattern can be reused for SlaBackendFactory
  • (2026-02-03) isEnterprise check in server/src/lib/features.ts is the edition detection source
  • (2026-02-03) Temporal worker already supports multiple task queues - can add sla-workflows queue
  • (2026-02-03) Generic job workflow pattern shows how to structure activities and signals
  • (2026-02-03) TIMEZONE BUG INVESTIGATION NEEDED - User reported mismatch in time calculations

Commands / Runbooks

  • Run SLA tests: cd packages/sla && npm test
  • Run Temporal worker locally: docker compose -f docker-compose.temporal.ee.yaml up
  • Check Temporal UI: http://localhost:8088
  • Current SLA services: packages/sla/src/services/
  • Business hours calculator: packages/sla/src/services/businessHoursCalculator.ts
  • SLA timer handler: server/src/lib/jobs/handlers/slaTimerHandler.ts
  • Temporal workflows: ee/temporal-workflows/src/workflows/
  • Job runner factory: server/src/lib/jobs/JobRunnerFactory.ts
  • Edition detection: server/src/lib/features.ts
  • Generic job workflow pattern: ee/temporal-workflows/src/workflows/generic-job-workflow.ts

Open Questions

  • What specific timezone issues have been observed? Need to investigate and document
  • Should workflow query replace database reads for real-time SLA status display?
  • How to handle Temporal worker scaling for high-volume tenants?

Progress Log

  • (2026-02-03) Added integration coverage for EE ticket start triggering Temporal workflow in packages/sla/src/services/__tests__/slaBackendIntegration.test.ts (T052).
  • (2026-02-03) Added Temporal workflow integration test covering 50% notification threshold in ee/temporal-workflows/src/workflows/__tests__/sla-ticket-workflow.integration.test.ts (T053).
  • (2026-02-03) Validated 75% SLA threshold notification via workflow integration test (T054).
  • (2026-02-03) Validated 90% SLA threshold notification via workflow integration test (T055).
  • (2026-02-03) Confirmed 100% threshold breach update via workflow integration test (T056).
  • (2026-02-03) Verified escalation checks at each SLA threshold via workflow integration test (T057).
  • (2026-02-03) Added integration coverage confirming pause stops SLA notifications until resume (T058).
  • (2026-02-03) Verified resume recalculates timers with pause minutes in workflow integration test (T059).
  • (2026-02-03) Confirmed completeResponse transitions workflow into resolution phase in integration lifecycle test (T060).
  • (2026-02-03) Verified completeResolution terminates the workflow in integration lifecycle test (T061).
  • (2026-02-03) Added ticket deletion test ensuring SLA workflow cancellation in packages/tickets/src/actions/__tests__/ticketActions.sla.test.ts (T062).
  • (2026-02-03) Confirmed policy change cancels and restarts SLA backend in integration test (T063).
  • (2026-02-03) Verified CE ticket start avoids Temporal backend in integration test (T064).
  • (2026-02-03) Added slaTimerHandler integration test confirming polling path still processes tickets (T065).
  • (2026-02-03) Added worker registration test validating workflow export for Temporal startup (T066).
  • (2026-02-03) Confirmed SLA activities are exported for worker startup (T067).
  • (2026-02-03) Added full EE SLA lifecycle workflow integration test with pause/resume and resolution notifications (T068).
  • (2026-02-03) Added CE lifecycle test covering create, poll notification, pause/resume, response, and resolution in packages/sla/src/services/__tests__/slaCeLifecycle.test.ts (T069).
  • (2026-02-03) Verified EE fallback to PgBoss backend when Temporal is unavailable in integration test (T070).
  • (2026-02-03) Marked America/New_York timezone calculation test for calculateNextWakeTime (T071).
  • (2026-02-03) Marked Europe/London timezone calculation test for calculateNextWakeTime (T072).
  • (2026-02-03) Marked DST transition coverage for calculateNextWakeTime (T073).
  • (2026-02-03) Added worker restart integration test to validate workflow replay determinism (T074).
  • (2026-02-03) Verified workflow continues after worker restart in integration test (T075).