# SLA System Architecture

## Overview

The SLA system tracks ticket response and resolution times against configurable service level targets. It supports two execution backends:

- **PgBoss** (Community Edition): Polling-based timer that checks active tickets every 5 minutes
- **Temporal** (Enterprise Edition): Per-ticket durable workflows with precise threshold-based timers

Both backends use the same database tables and SLA services, providing identical business logic regardless of the timer engine. The backend is selected at runtime by `SlaBackendFactory` based on the edition flag.

## Key Features

- Two-phase SLA tracking (response + resolution) per ticket
- Business hours-aware deadline calculation with timezone and DST support
- Configurable notification thresholds (50%, 75%, 90%, 100%)
- 3-level escalation with automatic manager assignment
- Pause/resume with deadline shifting
- Edition-based backend selection (CE: PgBoss, EE: Temporal)
- Graceful fallback from Temporal to PgBoss

## Architecture

### Community Edition (PgBoss)

```
┌──────────────────────────────────────────────────────────────┐
│                     Next.js Application                      │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │                    SlaBackendFactory                    │ │
│  │                Creates PgBossSlaBackend                 │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           │                                  │
│                           ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │                    PgBossSlaBackend                     │ │
│  │  - start/cancel: no-op (polling handles lifecycle)      │ │
│  │  - pause/resume/complete: delegates to service layer    │ │
│  └─────────────────────────────────────────────────────────┘ │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │             slaTimerHandler (Job Scheduler)             │ │
│  │  - Runs every 5 min via PgBoss cron                     │ │
│  │  - Queries active tickets with SLA tracking             │ │
│  │  - Calculates elapsed SLA % using business hours        │ │
│  │  - Publishes TICKET_SLA_THRESHOLD_REACHED events        │ │
│  └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
```

### Enterprise Edition (Temporal)

```
┌──────────────────────────────────────────────────────────────┐
│                     Next.js Application                      │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │                    SlaBackendFactory                    │ │
│  │               Creates TemporalSlaBackend                │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           │                                  │
│                           ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │                   TemporalSlaBackend                    │ │
│  │  - Starts sla-ticket-workflow per ticket                │ │
│  │  - Sends signals: pause/resume/complete/cancel          │ │
│  │  - Queries workflow state for real-time SLA status      │ │
│  └────────────────────────┬────────────────────────────────┘ │
│                           │                                  │
│                           ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │           Temporal Worker (separate process)            │ │
│  │  Task queue: "sla-workflows"                            │ │
│  │  Workflow: slaTicketWorkflow                            │ │
│  │  Activities: calculate, notify, escalate, update, audit │ │
│  └─────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
```

## ISlaBackend Interface

The `ISlaBackend` interface abstracts timer operations so SLA services are edition-agnostic:

```typescript
interface ISlaBackend {
  startSlaTracking(ticketId, policyId, targets, schedule, notificationThresholds?): Promise<void>;
  pauseSla(ticketId, reason: SlaPauseReason): Promise<void>;
  resumeSla(ticketId): Promise<void>;
  completeSla(ticketId, type: 'response' | 'resolution', met: boolean): Promise<void>;
  cancelSla(ticketId): Promise<void>;
  getSlaStatus(ticketId): Promise<ISlaStatus | null>;
}
```

**PgBossSlaBackend** (CE): `startSlaTracking` and `cancelSla` are no-ops because the polling job handles timing. `pauseSla`/`resumeSla`/`completeSla` delegate to the service layer with `skipBackend: true` to prevent infinite recursion.

**TemporalSlaBackend** (EE): All methods map to Temporal operations: workflow start, signals, and queries. Falls back to PgBoss if the Temporal client cannot connect.

## SlaBackendFactory

Singleton factory that resolves the backend at runtime:

1. Checks `isEnterprise` flag from `@alga-psa/core/features`
2. If enterprise: dynamically imports `TemporalSlaBackend` from the EE package
3. On import failure or non-enterprise: falls back to `PgBossSlaBackend` with a warning log
4. Caches the resolved backend instance for the process lifetime

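The resolution steps above can be sketched as follows. This is a minimal illustration, not the actual `SlaBackendFactory`: `createBackendResolver`, `loadEnterprise`, `createCommunity`, and `warn` are hypothetical names, the loader stands in for the dynamic import of the EE package, and the sketch assumes a warning is logged only when that import fails.

```typescript
// Hypothetical minimal shape; the real ISlaBackend has more methods.
interface SlaBackend {
  readonly name: string;
}

type BackendLoader = () => Promise<SlaBackend>;

/**
 * Builds a resolver that picks a backend once and caches it.
 * `loadEnterprise` stands in for the dynamic import of the EE backend,
 * `createCommunity` for constructing the CE backend (both assumptions).
 */
function createBackendResolver(
  isEnterprise: boolean,
  loadEnterprise: BackendLoader,
  createCommunity: () => SlaBackend,
  warn: (message: string) => void = console.warn,
): () => Promise<SlaBackend> {
  let cached: Promise<SlaBackend> | null = null;

  return () => {
    if (cached === null) {
      // Cache the promise so concurrent callers share one resolution.
      cached = (async () => {
        if (isEnterprise) {
          try {
            return await loadEnterprise();
          } catch (error) {
            warn(`Temporal backend unavailable, using PgBoss: ${String(error)}`);
          }
        }
        return createCommunity();
      })();
    }
    return cached;
  };
}
```

Caching the promise rather than the resolved instance means two callers that race during startup still trigger only one load attempt.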
## Temporal Workflow Lifecycle (EE)

### Workflow: slaTicketWorkflow

Each ticket with an SLA policy gets one workflow instance. The workflow processes two sequential phases: **response** then **resolution**.

**Input:**
```typescript
interface SlaTicketWorkflowInput {
  ticketId: string;
  tenantId: string;
  policyTargets: ISlaPolicyTarget[];
  businessHoursSchedule: IBusinessHoursScheduleWithEntries;
  notificationThresholds?: number[];  // e.g., [50, 75, 90]; 100 is always added
}
```

**Workflow ID format:** `sla-ticket-{tenantId}-{ticketId}`

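Because the workflow ID is derived only from the tenant and ticket, any caller can recompute it to address signals or queries without storing a separate handle. A small sketch of that derivation (the helper name `buildSlaWorkflowId` is hypothetical):

```typescript
const SLA_WORKFLOW_ID_PREFIX = "sla-ticket-";

/** Builds the deterministic workflow ID for one ticket in one tenant. */
function buildSlaWorkflowId(tenantId: string, ticketId: string): string {
  return `${SLA_WORKFLOW_ID_PREFIX}${tenantId}-${ticketId}`;
}
```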
**Phase loop:**
For each phase (response, resolution):
1. Look up `targetMinutes` from policy targets for the ticket's priority
2. If no target, skip the phase
3. For each threshold (sorted, always includes 100%):
   a. Call `calculateNextWakeTime` activity to get the wall-clock deadline
   b. Sleep until deadline (interruptible by signals)
   c. If paused during sleep: skip to next threshold iteration
   d. If cancelled/completed: exit phase
   e. Send notification, check escalation, update status (at 100%)
4. Move to next phase

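Step 3 iterates over a sorted threshold list that always contains 100%. A minimal sketch of how that list and the per-threshold business-minute budget could be prepared is shown below. The function names, the dropping of out-of-range values, and the rounding up to whole minutes are assumptions for illustration, not the workflow's confirmed behavior.

```typescript
/** Returns thresholds deduplicated, ascending, and always ending with 100. */
function normalizeThresholds(configured: number[] = []): number[] {
  // Assumption: only percentages strictly between 0 and 100 are kept.
  const withHundred = configured.filter((t) => t > 0 && t < 100).concat(100);
  return withHundred
    .filter((t, i) => withHundred.indexOf(t) === i) // drop duplicates
    .sort((a, b) => a - b);
}

/** Business minutes that must elapse before a threshold is reached. */
function businessMinutesAtThreshold(targetMinutes: number, thresholdPercent: number): number {
  return Math.ceil((targetMinutes * thresholdPercent) / 100);
}
```

For a 240-minute target, the 75% threshold falls at 180 business minutes; converting that budget to a wall-clock deadline is the job of the `calculateNextWakeTime` activity.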
**Execution timeout:** 365 days. **Activity retry:** 3 attempts, 1s initial interval, 2x backoff coefficient, 30s maximum interval.

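To make the retry settings concrete, the sketch below computes the wait before each retry from those four numbers. It assumes the attempt count includes the first try, so 3 attempts means at most 2 retries; the `RetryPolicy` shape and function name are illustrative, not the SDK's types.

```typescript
interface RetryPolicy {
  maximumAttempts: number;
  initialIntervalMs: number;
  backoffCoefficient: number;
  maximumIntervalMs: number;
}

// Values taken from the settings stated above.
const SLA_ACTIVITY_RETRY: RetryPolicy = {
  maximumAttempts: 3,
  initialIntervalMs: 1000,
  backoffCoefficient: 2,
  maximumIntervalMs: 30000,
};

/** Delay before each retry: initial * coefficient^n, capped at the maximum. */
function retryDelaysMs(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < policy.maximumAttempts - 1; retry++) {
    const delay = policy.initialIntervalMs * Math.pow(policy.backoffCoefficient, retry);
    delays.push(Math.min(delay, policy.maximumIntervalMs));
  }
  return delays;
}
```

Under these assumptions the delays are 1s and 2s, so the 30s cap only matters if the attempt count is raised.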
### Signals

| Signal | Payload | Effect |
|--------|---------|--------|
| `pause` | `{ reason: SlaPauseReason }` | Sets paused state, records pause start time |
| `resume` | (none) | Calculates pause duration, adds to `totalPauseMinutes`, unblocks |
| `completeResponse` | `{ met: boolean }` | Marks response phase complete, logs audit event |
| `completeResolution` | `{ met: boolean }` | Marks workflow completed, logs audit event |
| `cancel` | (none) | Terminates workflow |

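The `pause` and `resume` effects amount to a small state transition. The sketch below models it with plain functions; the `PauseState` shape, the function names, the use of wall-clock minutes, and ignoring duplicate signals are all assumptions made for illustration.

```typescript
interface PauseState {
  isPaused: boolean;
  pauseStartedAt: number | null; // epoch milliseconds
  totalPauseMinutes: number;
}

/** Handles a pause signal: records when the pause began. */
function applyPause(state: PauseState, nowMs: number): PauseState {
  if (state.isPaused) return state; // assumption: duplicate pauses are ignored
  return { ...state, isPaused: true, pauseStartedAt: nowMs };
}

/** Handles a resume signal: adds the pause duration to the running total. */
function applyResume(state: PauseState, nowMs: number): PauseState {
  if (!state.isPaused || state.pauseStartedAt === null) return state;
  const pausedMinutes = Math.floor((nowMs - state.pauseStartedAt) / 60000);
  return {
    isPaused: false,
    pauseStartedAt: null,
    totalPauseMinutes: state.totalPauseMinutes + pausedMinutes,
  };
}
```

The accumulated `totalPauseMinutes` is what allows later deadlines to be shifted by the time spent paused.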
### Query

| Query | Returns |
|-------|---------|
| `getState` | `SlaTicketWorkflowQueryResult`: current phase, status, pause state, deadlines, notified thresholds, remaining time in minutes |

### Activities

| Activity | Purpose |
|----------|---------|
| `calculateNextWakeTime` | Convert business-minute threshold to wall-clock UTC deadline using schedule + pause offset |
| `sendSlaNotification` | Publish `TICKET_SLA_THRESHOLD_REACHED` event to Redis stream |
| `checkAndEscalate` | Check escalation thresholds and trigger escalation if needed |
| `updateSlaStatus` | Mark `sla_response_met` / `sla_resolution_met` in tickets table (100% threshold only) |
| `recordSlaAuditLog` | Write event to `sla_audit_log` table |

## Event Bus Integration

```
Ticket Action
     │
     ▼
Event Bus (Redis Stream)
     │
     ├──► slaSubscriber
     │      ├── TICKET_CREATED → startSlaForTicket()
     │      ├── TICKET_UPDATED → handlePriorityChange() / handleStatusChange()
     │      ├── TICKET_CLOSED → recordResolution()
     │      ├── TICKET_COMMENT_ADDED → recordFirstResponse()
     │      └── TICKET_RESPONSE_STATE_CHANGED → handleResponseStateChange()
     │
     └──► slaNotificationSubscriber
            └── TICKET_SLA_THRESHOLD_REACHED → sendSlaNotification()
```

### Events Consumed

| Event | Source | Handler |
|-------|--------|---------|
| `TICKET_CREATED` | Ticket creation | Starts SLA tracking with resolved policy |
| `TICKET_UPDATED` | Status/priority/policy changes | Recalculates deadlines or pauses/resumes |
| `TICKET_CLOSED` | Ticket closure | Records resolution and SLA met/breached |
| `TICKET_COMMENT_ADDED` | New comment | Records first response (public comments from internal users only) |
| `TICKET_RESPONSE_STATE_CHANGED` | Response state toggle | Pauses/resumes the SLA for the `awaiting_client` state |

### Events Produced

| Event | Source | Consumer |
|-------|--------|----------|
| `TICKET_SLA_THRESHOLD_REACHED` | Timer job (CE) or Temporal activity (EE) | `slaNotificationSubscriber`, which sends in-app/email notifications |

## Configuration

### Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `NEXT_PUBLIC_EDITION` | `community` | Controls backend selection (`community` or `enterprise`) |
| `TEMPORAL_ADDRESS` | `temporal-frontend.temporal.svc.cluster.local:7233` | Temporal server address (EE only) |
| `TEMPORAL_NAMESPACE` | `default` | Temporal namespace (EE only) |
| `TEMPORAL_TASK_QUEUES` | `tenant-workflows,...,sla-workflows` | Comma-separated task queues for worker (EE only) |

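As an illustration of how these variables might be read, the sketch below parses them from an injected environment object. `readSlaConfig` and `TemporalSlaConfig` are hypothetical names. The default task queue list is abbreviated in the table, so the sketch does not guess it and returns an empty list when the variable is unset.

```typescript
interface TemporalSlaConfig {
  edition: "community" | "enterprise";
  address: string;
  namespace: string;
  taskQueues: string[];
}

/** Reads SLA-related settings, applying the documented defaults. */
function readSlaConfig(env: Record<string, string | undefined>): TemporalSlaConfig {
  // Assumption: any value other than "enterprise" is treated as community.
  const edition = env.NEXT_PUBLIC_EDITION === "enterprise" ? "enterprise" : "community";
  const taskQueues = (env.TEMPORAL_TASK_QUEUES ?? "")
    .split(",")
    .map((queue) => queue.trim())
    .filter((queue) => queue.length > 0);
  return {
    edition,
    address: env.TEMPORAL_ADDRESS ?? "temporal-frontend.temporal.svc.cluster.local:7233",
    namespace: env.TEMPORAL_NAMESPACE ?? "default",
    taskQueues,
  };
}
```

In a Node process this would typically be called as `readSlaConfig(process.env)`; passing the environment in keeps the function easy to test.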
### Feature Flags

- `isEnterprise` (from `@alga-psa/core/features`): determines whether `SlaBackendFactory` attempts to load the Temporal backend

### SLA Timer Job (CE)

| Setting | Value |
|---------|-------|
| Job name | `sla-timer` |
| Schedule | `*/5 * * * *` (every 5 minutes) |
| Retry | 2 attempts |
| Timeout | 5 minutes |

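One pass of the polling job can be sketched as below: compute the elapsed SLA percentage per ticket, find thresholds that were crossed but not yet notified, and publish one event per threshold while isolating failures per ticket. The `TrackedTicket` shape and function names are hypothetical, and elapsed business minutes are assumed to be computed elsewhere.

```typescript
interface TrackedTicket {
  ticketId: string;
  targetMinutes: number;
  elapsedBusinessMinutes: number; // assumed precomputed from the schedule
  notifiedThresholds: number[];
}

/** Thresholds already crossed for which no notification was sent yet. */
function newlyCrossedThresholds(ticket: TrackedTicket, thresholds: number[]): number[] {
  if (ticket.targetMinutes <= 0) return [];
  const percent = (ticket.elapsedBusinessMinutes / ticket.targetMinutes) * 100;
  return thresholds.filter(
    (t) => percent >= t && ticket.notifiedThresholds.indexOf(t) === -1,
  );
}

/** Processes tickets one by one; returns the IDs of tickets that failed. */
function runTimerPass(
  tickets: TrackedTicket[],
  thresholds: number[],
  publish: (ticketId: string, threshold: number) => void,
): string[] {
  const failed: string[] = [];
  for (const ticket of tickets) {
    try {
      for (const threshold of newlyCrossedThresholds(ticket, thresholds)) {
        publish(ticket.ticketId, threshold);
      }
    } catch {
      failed.push(ticket.ticketId); // one bad ticket does not stop the pass
    }
  }
  return failed;
}
```

With a 5-minute schedule, a threshold is detected at most one polling interval after it is actually crossed, which is the precision trade-off relative to the per-ticket Temporal timers.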
## Key File Paths

### Core Package
| File | Purpose |
|------|---------|
| `packages/sla/src/types/index.ts` | All SLA type definitions |
| `packages/sla/src/services/slaService.ts` | SLA lifecycle (start, response, resolution) |
| `packages/sla/src/services/slaPauseService.ts` | Pause/resume with deadline shifting |
| `packages/sla/src/services/businessHoursCalculator.ts` | Timezone-aware time calculations |
| `packages/sla/src/services/slaNotificationService.ts` | Threshold notification delivery |
| `packages/sla/src/services/escalationService.ts` | 3-level escalation management |
| `packages/sla/src/services/itilSlaService.ts` | ITIL standard auto-configuration |
| `packages/sla/src/services/backends/ISlaBackend.ts` | Backend interface |
| `packages/sla/src/services/backends/PgBossSlaBackend.ts` | CE backend implementation |
| `packages/sla/src/services/backends/SlaBackendFactory.ts` | Backend factory (singleton) |
| `packages/sla/src/actions/` | Server actions (policy, schedule, pause, escalation, reporting) |
| `packages/sla/src/components/` | UI components (settings, badges, dashboard) |

### EE Temporal
| File | Purpose |
|------|---------|
| `ee/server/src/lib/sla/TemporalSlaBackend.ts` | EE Temporal backend (starts workflows, sends signals) |
| `ee/temporal-workflows/src/workflows/sla-ticket-workflow.ts` | Temporal workflow (2-phase, threshold-based) |
| `ee/temporal-workflows/src/activities/sla-activities.ts` | 5 activities (calculate, notify, escalate, update, audit) |
| `packages/ee/src/lib/sla/TemporalSlaBackend.ts` | CE stub (throws "enterprise only") |

### Server Integration
| File | Purpose |
|------|---------|
| `server/src/lib/eventBus/subscribers/slaSubscriber.ts` | Ticket event handlers for SLA lifecycle |
| `server/src/lib/eventBus/subscribers/slaNotificationSubscriber.ts` | Threshold notification dispatch |
| `server/src/lib/jobs/handlers/slaTimerHandler.ts` | CE polling job (every 5 min) |
| `server/src/app/msp/settings/sla/page.tsx` | Settings page (5 tabs) |

### Database
| File | Purpose |
|------|---------|
| `server/migrations/20260219000001_create_sla_policies.cjs` | Policies, targets, settings, pause config |
| `server/migrations/20260219000002_create_business_hours.cjs` | Schedules, entries, holidays |
| `server/migrations/20260219000003_add_board_manager_and_sla_notifications.cjs` | Board manager, notification thresholds, sent tracking |
| `server/migrations/20260219000004_add_sla_tracking_to_tickets.cjs` | Ticket SLA columns |
| `server/migrations/20260219000005_create_sla_audit_log.cjs` | Audit log |
| `server/migrations/20260219000006_add_sla_internal_notification_templates.cjs` | In-app notification templates |
| `server/migrations/20260219000007_add_sla_email_templates.cjs` | Email templates |
| `server/migrations/20260219000008_create_escalation_managers.cjs` | Escalation managers |

### Tests
| Location | Coverage |
|----------|----------|
| `packages/sla/src/services/__tests__/` | Business hours, SLA lifecycle, pause, escalation, notifications, backends |
| `ee/temporal-workflows/src/workflows/__tests__/` | Workflow logic, integration |
| `ee/temporal-workflows/src/activities/__tests__/` | Activity implementations |
| `server/src/test/integration/sla/` | 8 integration test suites |
| `server/src/test/unit/sla/` | 3 unit test suites (hierarchy, status resolver, time calculator) |

## Error Handling

- **Backend fallback**: If the `TemporalSlaBackend` import fails (missing Temporal client, connection error), `SlaBackendFactory` falls back to `PgBossSlaBackend` without raising an error and logs a warning
- **skipBackend flag**: `PgBossSlaBackend` calls service methods with `{ skipBackend: true }` to prevent infinite recursion between the backend and service layer
- **Idempotent workflow start**: `TemporalSlaBackend.startSlaTracking()` catches `WorkflowExecutionAlreadyStartedError` and returns gracefully
- **Ticket isolation**: The CE timer job processes tickets individually, so a failure on one ticket does not block others
- **Activity retries**: Temporal activities use 3 attempts with exponential backoff (1s initial, 2x coefficient, 30s max)

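The idempotent-start rule can be illustrated with a small wrapper. To stay self-contained, the sketch declares a local stand-in for `WorkflowExecutionAlreadyStartedError` and takes the start call as a parameter; real code would use the error class from the Temporal SDK and the client's own start method. `startSlaWorkflowOnce` and `StartWorkflow` are hypothetical names.

```typescript
// Local stand-in for the SDK error raised when a workflow with the same ID
// is already running (an assumption made so the sketch has no dependencies).
class WorkflowExecutionAlreadyStartedError extends Error {}

type StartWorkflow = (workflowId: string) => Promise<void>;

/** Returns true if a workflow was started, false if one already existed. */
async function startSlaWorkflowOnce(
  start: StartWorkflow,
  workflowId: string,
): Promise<boolean> {
  try {
    await start(workflowId);
    return true;
  } catch (error) {
    if (error instanceof WorkflowExecutionAlreadyStartedError) {
      return false; // already tracked: treat the second start as a no-op
    }
    throw error; // any other failure is surfaced to the caller
  }
}
```

Only the "already started" case is swallowed; connection or validation errors still propagate so they are not mistaken for success.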
## See Also

- [SLA Feature Documentation](../features/sla.md): Business logic, database schema, and feature descriptions
- [Event System Architecture](./event_system.md): Redis-based event streaming
- [Job Scheduler](./job_scheduler.md): PgBoss/Temporal job system
- [Temporal Workflow PRD](../plans/2026-02-03-sla-temporal-workflow-architecture/PRD.md): Original design document