Hermes 284313f908
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00


PRD — Live Ticket Updates

  • Slug: live-ticket-updates
  • Date: 2026-05-07
  • Status: Draft
  • Scope: Phases 1–3, tickets only (projects port deferred to a follow-up plan)

Summary

Add a real-time layer to the ticket detail page so that when one user saves a change, every other user viewing the same ticket sees it within ~500 ms — without requiring a refresh and without breaking when Hocuspocus is unavailable. Postgres remains the source of truth; Hocuspocus carries presence and lightweight "something changed" signals; the existing REST-based save model is unchanged.

Problem

Today, two users editing the same ticket cannot tell that the other is there, and a save by user B does not reach user A's screen until A reloads. The save flow already has good bones — optimistic per-field updates, a pendingRequestRef queue, and UnsavedChangesProvider — but no awareness of remote activity. The result is two visible failure modes:

  1. Stale reads: A is looking at a ticket whose status was changed 10 minutes ago by B; A makes decisions on stale data.
  2. Lost-update-style overwrites at the field level: A and B both change the same dropdown around the same time. The second save wins silently. Cross-entity validation in updateTicketWithCache (priority recompute, status↔board, etc.) means even non-overlapping fields can produce surprising server state.

Goals

  • A second user's save shows up on my screen for the ticket I'm viewing without a manual refresh, within ~500 ms of the save committing.
  • I can see who else is currently viewing this ticket (presence bar) and which structured field they are actively editing (focus indicator).
  • If two users edit the same field and a remote save lands while I have unsaved local changes for that field, I get an explicit Keep yours / Take theirs banner, with no silent overwrite.
  • If Hocuspocus is unreachable, the ticket page works exactly as it does today: reads, writes, optimistic updates, unsaved-changes warning all unchanged. The user sees a small "Live updates offline — reconnecting…" indicator; the live layer auto-reconnects in the background.
  • The implementation reuses the existing Hocuspocus pipe and the Redis pub/sub bridge pattern from NotificationExtension, so we are not introducing new infrastructure.

Non-goals

  • Collaborative editing of structured fields (no Y.Map of ticket fields). Postgres stays authoritative; we are not migrating the save path.
  • Live updates for the rich-text description field — already handled by the existing CollaborativeEditor Y.js pipeline; we will not touch it.
  • Live updates for comments, tags, resources, teams, time entries as separate channels in this plan. They will fold in later as additional invalidation signals on the same ticket room, but Phase 1–3 scope is the structured fields handled by updateTicketWithCache.
  • Live updates for projects. Same architecture later (separate plan), not now.
  • Hard locks on fields. The "X is editing" indicator is advisory only.
  • Background polling fallback. If Hocuspocus is down, we degrade — we do not periodically refetch via REST as a backstop.
  • Operational tooling beyond what's needed to verify the feature (no dashboards, no metrics pipelines, no admin UI).

Users and Primary Flows

Personas: MSP technicians and dispatchers viewing tickets concurrently. The most common collisions are around status, priority, assigned_to, and board.

Primary flows:

  1. Two users on one ticket, no conflict. A and B both have the ticket open. B changes status from "New" to "In Progress". A's status field updates silently within ~500 ms. A toast does not appear; the field is briefly highlighted.
  2. Presence. A opens the ticket; sees a stack of avatars in the header showing B is viewing. B closes the tab; A's avatar list shrinks within a few seconds.
  3. Focus indicator. A focuses the priority dropdown. On B's screen, the priority field is dimmed with a subtle "Alex is editing" caption. A blurs without changing — the indicator clears on B's screen.
  4. Same-field conflict. A opens the status dropdown and selects "On Hold" but has not yet committed. Meanwhile B saves status = "Resolved". A's local state still has "On Hold" pending. The field freezes, a banner appears: "Bob just changed status to Resolved (2 sec ago). [Keep yours] [Take theirs]". A clicks Keep yours → A's pending value is sent on next save and may overwrite B; A clicks Take theirs → A's local pending value is dropped, "Resolved" is shown.
  5. Hocuspocus down. Server is unreachable. A sees "Live updates offline — reconnecting…" indicator in the header. All saves and reads work normally. When the server comes back, the indicator disappears and the room rejoins.
  6. Reconnect. A's connection drops mid-session and reconnects after 30 s. On reconnect, the client refetches the ticket once to catch any updates that landed during the gap.

UX / UI Notes

  • Presence bar: lift the existing presence component out of packages/documents/src/components/CollaborativeEditor.tsx into packages/ui so tickets and (later) other entities share one component. Avatars + tooltip with name; visible in the ticket detail header next to the title.
  • Field focus indicator: visual differs by control type — dropdowns dim with a caption beneath; text inputs (title) show a small caption pill next to the control without dimming. No hard lock in either case — A can still focus the same field; we only show who else is.
  • Silent remote update: the changed field briefly highlights (~600 ms fade) when a remote update is applied. No toast unless the user has unsaved local changes on other fields (see FR-8).
  • Conflict banner: appears inside the field's container, not as a global toast. Two buttons: Keep yours (default focus), Take theirs. Shows author name + relative timestamp.
  • Connection status: small text indicator in the header. Three states: connected (no indicator), reconnecting ("Live updates offline — reconnecting…"), permanent failure ("Live updates unavailable", shown after 5 failed reconnects per FR-10, with no further auto-retry that session).
  • Multi-tab same user: presence dedupes by userId, so A in two tabs shows once.

Requirements

Functional Requirements

FR-1. Server-side broadcast on ticket update. After updateTicketWithCache commits successfully and publishes its existing TICKET_UPDATED event, also publish a Redis message on channel ticket-updates:<tenantId>:<ticketId> with payload {updatedFields: string[], updatedBy: {userId, displayName}, updatedAt: ISO8601}. This must run in the same code path as the existing event publish (packages/tickets/src/actions/optimizedTicketActions.ts ~L2038–2101) so no save can succeed without the broadcast attempt. Broadcast failure is logged but does not fail the update.
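
A minimal sketch of the FR-1 publish step, assuming a hypothetical Publisher interface in place of the real Redis client. The function names are illustrative and the Redis prefix is omitted; only the channel format and payload shape come from this PRD. The point it illustrates is that a failed broadcast is logged and reported, never thrown.

```typescript
// Payload shape from FR-1: field names and metadata only, never field values.
interface TicketUpdateMessage {
  updatedFields: string[];
  updatedBy: { userId: string; displayName: string };
  updatedAt: string; // ISO 8601
}

// Hypothetical minimal interface; a real Redis client would be adapted to it.
interface Publisher {
  publish(channel: string, message: string): Promise<unknown>;
}

function ticketUpdateChannel(tenantId: string, ticketId: string): string {
  return `ticket-updates:${tenantId}:${ticketId}`;
}

// Resolves to true if the broadcast was sent and false if it failed.
// It never rejects, so a broadcast failure cannot fail the ticket update.
async function publishTicketUpdateSketch(
  publisher: Publisher,
  tenantId: string,
  ticketId: string,
  message: TicketUpdateMessage,
  log: (msg: string, err: unknown) => void = console.error,
): Promise<boolean> {
  try {
    await publisher.publish(ticketUpdateChannel(tenantId, ticketId), JSON.stringify(message));
    return true;
  } catch (err) {
    log("ticket update broadcast failed", err);
    return false;
  }
}
```

The caller can await the result for logging purposes but should not branch the save outcome on it.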

FR-2. Hocuspocus extension bridges Redis to room broadcasts. A new TicketUpdatesExtension (modeled on NotificationExtension) subscribes to ticket-updates:*. When a message arrives for ticket-updates:<tenant>:<id>, it broadcasts a stateless message to all clients connected to room ticket:<tenant>:<id>.

FR-3. Per-ticket Hocuspocus room. tenantValidation.js recognizes the ticket: room prefix in addition to document: and notifications:. Authentication is via short-lived signed JWT (see Security).

FR-4. Client subscription on ticket open. When a user opens a ticket detail page, the client joins room ticket:<tenant>:<id> using createYjsProvider. The provider's Y.Doc is empty; it is used only for awareness and stateless messages, not for field state.

FR-5. Presence. Awareness state holds {userId, displayName, avatarUrl, color, editingField?: string}. Presence bar renders all unique users (deduped by userId).
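
The dedupe rule in FR-5 can be sketched as a pure function over awareness states. The function name is hypothetical, and "first entry per userId wins" is an assumption; the PRD only requires that each user appears once.

```typescript
// Awareness shape from FR-5.
interface AwarenessState {
  userId: string;
  displayName: string;
  avatarUrl?: string;
  color: string;
  editingField?: string;
}

// Collapse awareness entries (one per tab or connection) into one entry per user.
// Assumption: the first entry seen for a userId wins; first-seen order is preserved.
function uniquePresence(states: AwarenessState[]): AwarenessState[] {
  const seen = new Set<string>();
  const result: AwarenessState[] = [];
  for (const state of states) {
    if (seen.has(state.userId)) continue;
    seen.add(state.userId);
    result.push(state);
  }
  return result;
}
```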

FR-6. Silent refetch on remote update with no local conflict. When the client receives an update message:

  • If the user has no pending local edits to any field in updatedFields, refetch the ticket and update component state. Briefly highlight changed fields. No toast.
  • Debounce refetches at 200 ms so a burst of changes triggers a single refetch.
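
One way to read the 200 ms debounce is as a trailing debounce: every incoming update message restarts the timer, and the refetch runs once the burst goes quiet. The sketch below assumes that reading. The Timers interface is a hypothetical seam that makes the behavior testable without real timeouts.

```typescript
// Hypothetical timer seam so the debounce can be exercised without real timeouts.
interface Timers {
  set(callback: () => void, delayMs: number): unknown;
  clear(handle: unknown): void;
}

const systemTimers: Timers = {
  set: (callback, delayMs) => setTimeout(callback, delayMs),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Returns a trigger function. Calls that arrive before the pending timer fires
// cancel it and start a new one, so a burst produces a single refetch.
function createDebouncedRefetch(
  refetch: () => void,
  delayMs = 200,
  timers: Timers = systemTimers,
): () => void {
  let handle: unknown;
  let scheduled = false;
  return () => {
    if (scheduled) timers.clear(handle);
    scheduled = true;
    handle = timers.set(() => {
      scheduled = false;
      refetch();
    }, delayMs);
  };
}
```

A trailing debounce delays the first refetch by the full window, which still leaves room inside the ~500 ms latency goal; a leading-edge variant would be an alternative if that budget proves tight.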

FR-7. Conflict banner on same-field collision. When the client receives an update message that includes a field with pending unsaved local state:

  • Freeze that specific field.
  • Render a banner in the field's container with author + timestamp + remote value.
  • Keep yours: keeps local pending value; clears banner; user proceeds normally.
  • Take theirs: drops local pending value; refetches; field updates to remote value.

FR-8. Toast on remote update with non-overlapping unsaved local changes. If the user has unsaved changes on field X and a remote update lands on field Y:

  • Refetch and update Y silently.
  • Show a passing toast: "{Name} updated {field}".
  • Local pending changes on X are preserved untouched.
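
FR-6 through FR-8 amount to a three-way routing decision based on the overlap between updatedFields and the fields with unsaved local state. A minimal sketch of that decision follows; the type and function names are hypothetical. The case where one message contains both conflicting and non-conflicting fields is not specified above, so the sketch simply reports the conflicting fields and leaves the rest to the caller.

```typescript
type RemoteUpdateRoute =
  | { kind: "silent" }                       // FR-6: no local edits at all
  | { kind: "toast" }                        // FR-8: local edits, but on other fields
  | { kind: "conflict"; fields: string[] };  // FR-7: same-field collision

// pendingFields: names of fields with unsaved local state (queued or dirty).
function routeRemoteUpdate(updatedFields: string[], pendingFields: string[]): RemoteUpdateRoute {
  const pending = new Set(pendingFields);
  const conflicting = updatedFields.filter((field) => pending.has(field));
  if (conflicting.length > 0) return { kind: "conflict", fields: conflicting };
  if (pending.size > 0) return { kind: "toast" };
  return { kind: "silent" };
}
```

For example, routeRemoteUpdate(["status"], ["priority"]) yields the toast route, while routeRemoteUpdate(["status"], ["status"]) yields a conflict on status.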

FR-9. Per-field editing indicator. When the user focuses an editable field (title, status, priority, ITIL impact, ITIL urgency, board, category, assignee, client, contact, location), set awareness.editingField = '<field>'. On blur or selection-commit, clear it. On other clients, render a "{Name} is editing" indicator on that field when at least one remote awareness has the same editingField. Visual treatment is per control type:

  • Dropdowns/selects: dim the control + caption beneath.
  • Text inputs (title): caption pill near the control; do not dim (would interfere with the input affordance).
  • No hard lock in either case.
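
The indicator lookup in FR-9 can be sketched as a filter over awareness states. The function name is hypothetical. Excluding the current user's own userId (including their other tabs) is an assumption; the requirement only says "remote awareness".

```typescript
interface AwarenessState {
  userId: string;
  displayName: string;
  avatarUrl?: string;
  color: string;
  editingField?: string;
}

// Display names of other users whose awareness marks `field` as being edited.
// Assumption: entries with the current user's userId are ignored, and a user
// with several tabs focused on the same field is listed once.
function remoteEditorsOf(
  field: string,
  states: AwarenessState[],
  currentUserId: string,
): string[] {
  const seen = new Set<string>();
  const names: string[] = [];
  for (const state of states) {
    if (state.userId === currentUserId) continue;
    if (state.editingField !== field) continue;
    if (seen.has(state.userId)) continue;
    seen.add(state.userId);
    names.push(state.displayName);
  }
  return names;
}
```

A field would render its "{Name} is editing" caption whenever the returned list is non-empty.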

FR-10. Soft dependency on Hocuspocus. All live behavior is layered over the existing REST save path. If the WebSocket fails to connect or disconnects:

  • Presence, focus indicator, silent refetch, conflict banner all become no-ops.
  • The ticket page renders, reads, writes, optimistic updates, and unsaved-changes warnings exactly as today.
  • A header indicator shows "Live updates offline — reconnecting…". Reconnects auto-retry with exponential backoff (start 1 s, cap 30 s). After 5 failed reconnects, switch to "Live updates unavailable" and stop auto-retrying that session (manual reload re-enables).
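
The retry policy in FR-10 can be expressed as a small pure function. The start delay, cap, and retry limit come from the requirement; the doubling factor and the names are assumptions, since the PRD only says "exponential".

```typescript
const BASE_DELAY_MS = 1000;       // FR-10: start at 1 s
const MAX_DELAY_MS = 30000;       // FR-10: cap at 30 s
const MAX_FAILED_RECONNECTS = 5;  // FR-10: stop after 5 failed reconnects

type ReconnectDecision =
  | { status: "reconnecting"; delayMs: number }
  | { status: "unavailable" };

// failedAttempts: consecutive failed reconnects so far (0 right after the drop).
// Assumption: the delay doubles on each failure.
function nextReconnect(failedAttempts: number): ReconnectDecision {
  if (failedAttempts >= MAX_FAILED_RECONNECTS) return { status: "unavailable" };
  const delayMs = Math.min(BASE_DELAY_MS * Math.pow(2, failedAttempts), MAX_DELAY_MS);
  return { status: "reconnecting", delayMs };
}
```

With doubling and a limit of five attempts the delays are 1, 2, 4, 8, and 16 s, so the 30 s cap is never reached; it only matters if the multiplier or the retry limit is raised.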

FR-11. On reconnect, refetch once. When the WebSocket reconnects after a drop, the client refetches the ticket exactly once to catch up on any updates that landed during the gap.

FR-12. Multi-tab dedupe. A single user with the same ticket open in multiple tabs appears once in the presence bar (deduped by userId).

FR-13. Permission revocation mid-session. If the server pushes a message indicating the current user no longer has access (e.g., ticket reassigned to a board they can't see), the client redirects away (or shows a "no access" view) instead of silently 403'ing on the next refetch. (Implementation note: refetch failure with 403 is the trigger; we do not need a separate channel for this in Phase 1–3.)

Non-functional Requirements

  • Latency: P95 from save commit to all subscribed clients applying the update ≤ 500 ms within the same data center.
  • Throughput: the Phase 1–3 design target is up to 50 concurrent viewers per ticket and up to 100 ticket updates / sec / tenant. No formal load test is in scope; just don't pick designs that obviously break here.
  • Auth: per-ticket JWT, ≤ 5 min expiry, signed with the existing Hocuspocus shared secret. Tenant + ticketId + userId encoded in claims.
  • Backward compatibility: zero changes required to existing ticket UI for users on the live layer to see fresh data. A user with Hocuspocus disabled in their environment sees the same UX as today plus the offline indicator.

Data / API / Integrations

Server (Next.js / @alga-psa/tickets):

  • New helper publishTicketUpdate({tenantId, ticketId, updatedFields, updatedBy, updatedAt}) in packages/tickets/src/lib/liveUpdates.ts. Uses getRedisClient() from @alga-psa/event-bus.
  • Call publishTicketUpdate in updateTicketWithCache (packages/tickets/src/actions/optimizedTicketActions.ts ~L2038–2101 region) after successful commit, alongside the existing publishEvent('TICKET_UPDATED', …) call. Compute updatedFields from the diff between the loaded ticket and the validated update payload (the same diff is already used implicitly for ITIL recompute / status↔board logic; extract it into a helper).
  • New endpoint GET /api/tickets/:id/live-token returns a short-lived JWT {tenantId, userId, ticketId, exp}. Wrapped in withAuth; checks assertTicketReadAllowed. Token signed with HOCUSPOCUS_JWT_SECRET (new env var; reuse existing Hocuspocus shared secret if one exists).
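
The diff helper mentioned above could look roughly like the following. This is a sketch under stated assumptions: the function name is hypothetical, comparison is shallow strict equality, and keys set to undefined in the payload are treated as "not provided".

```typescript
type TicketFields = Record<string, unknown>;

// Names of fields whose value in the validated update payload differs from
// the ticket as loaded before the update.
// Assumptions: shallow strict equality; `undefined` in the payload means "not provided".
function computeUpdatedFields(loaded: TicketFields, update: TicketFields): string[] {
  return Object.keys(update).filter(
    (key) => update[key] !== undefined && update[key] !== loaded[key],
  );
}
```

Because server-side recompute (for example priority) can change fields that were not in the payload, diffing the loaded ticket against the persisted result rather than the payload may be the safer input; that choice is left to implementation.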

Hocuspocus (/hocuspocus):

  • New TicketUpdatesExtension.js modeled on NotificationExtension.js. Subscribes to <redisPrefix>ticket-updates:* (pattern subscribe). On message, looks up matching room and broadcasts a stateless message via Hocuspocus' sendStateless API (or sets a transient awareness key the clients listen for).
  • Extend tenantValidation.js:
    • Add parseTicketRoom(roomName) for ticket:<tenant>:<ticketId>.
    • Update validateDocumentRoomAccess to handle the ticket: prefix: parse, then verify the JWT in the request (query param token=<jwt>); reject if signature, expiry, tenant, or ticketId mismatch.
  • Register TicketUpdatesExtension in server.js extensions list.
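
A possible shape for parseTicketRoom, written in TypeScript for illustration even though tenantValidation.js is plain JavaScript. It assumes tenant and ticket IDs never contain a colon (true for UUIDs) and returns null for anything that is not a well-formed ticket room.

```typescript
interface TicketRoom {
  tenantId: string;
  ticketId: string;
}

// Parses "ticket:<tenant>:<ticketId>". Returns null for other prefixes,
// wrong segment counts, or empty segments.
// Assumption: neither ID contains ":".
function parseTicketRoom(roomName: string): TicketRoom | null {
  const parts = roomName.split(":");
  if (parts.length !== 3 || parts[0] !== "ticket") return null;
  const tenantId = parts[1];
  const ticketId = parts[2];
  if (!tenantId || !ticketId) return null;
  return { tenantId, ticketId };
}
```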

Client (@alga-psa/tickets):

  • New packages/tickets/src/hooks/useTicketLive.ts: takes {tenantId, ticketId, currentUser, onRemoteUpdate, onPresenceChange}. Internally fetches the live-token, calls createYjsProvider('ticket:<tenant>:<id>', { token }), exposes presence, connectionStatus, setEditingField(field|null). Handles reconnect-then-refetch (FR-11).
  • New packages/tickets/src/components/ticket/TicketLiveProvider.tsx: wraps TicketDetails, owns the hook, exposes context.
  • Modify packages/tickets/src/components/ticket/TicketDetails.tsx (~L94–171, L870–924):
    • Subscribe to onRemoteUpdate from context.
    • Intersect updatedFields with the current pendingRequestRef queue and component-level dirty state to route to silent refetch / toast / conflict banner.
    • Wire setEditingField on focus/blur of structured field controls.
  • Lift presence bar from packages/documents/src/components/CollaborativeEditor.tsx (~L259–305) into packages/ui/src/presence/PresenceBar.tsx. Update CollaborativeEditor to import from there. Tickets imports the same component.
  • Conflict banner component: packages/ui/src/presence/FieldConflictBanner.tsx. Used wherever a structured field needs the banner.

Wire format:

  • Redis channel: <redisPrefix>ticket-updates:<tenantId>:<ticketId>.
  • Redis payload (JSON): { updatedFields: string[], updatedBy: { userId: string, displayName: string }, updatedAt: string /* ISO */ }.
  • Hocuspocus stateless message payload from extension to clients: same JSON.
  • Awareness shape: { userId, displayName, avatarUrl?, color, editingField?: string }.
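
Since the same JSON travels from Redis through the extension to clients, the receiving side may want a runtime guard before acting on a message. The following is a sketch with a hypothetical function name; it checks only the shape listed above and returns null for anything else.

```typescript
interface TicketUpdateMessage {
  updatedFields: string[];
  updatedBy: { userId: string; displayName: string };
  updatedAt: string; // ISO timestamp
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Parses and validates a raw wire message. Returns null on malformed input.
function parseTicketUpdateMessage(raw: string): TicketUpdateMessage | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (!isRecord(value)) return null;
  const fields = value.updatedFields;
  const by = value.updatedBy;
  const at = value.updatedAt;
  if (!Array.isArray(fields) || !fields.every((f) => typeof f === "string")) return null;
  if (!isRecord(by)) return null;
  const userId = by.userId;
  const displayName = by.displayName;
  if (typeof userId !== "string" || typeof displayName !== "string") return null;
  if (typeof at !== "string" || isNaN(Date.parse(at))) return null;
  return { updatedFields: fields as string[], updatedBy: { userId, displayName }, updatedAt: at };
}
```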

Security / Permissions

  • Per-ticket auth. tenantValidation.js currently only validates that the room's tenant matches the request's tenant. For tickets we additionally require:
    1. Client requests a JWT from /api/tickets/:id/live-token. Endpoint runs withAuth → assertTicketReadAllowed(user, ticketId), which must check both tenant and ticket-level visibility (bundled-child sync locks, board visibility, client-portal restrictions).
    2. JWT is short-lived (≤ 5 min) and includes {tenantId, userId, ticketId, exp, iat, jti}.
    3. Hocuspocus onAuthenticate (or equivalent in validateDocumentRoomAccess) verifies the JWT, asserts tenantId === room.tenantId and ticketId === room.ticketId.
  • Token refresh. Client refreshes the token automatically before expiry (e.g., at 80 % of TTL). Refresh failure → degrade as if Hocuspocus were down.
  • Cross-tenant probe. A direct WebSocket connection with a token for tenant X attempting to join ticket:Y:* MUST be rejected by tenantValidation.js.
  • No PII in awareness. Awareness fields limited to userId, display name, avatar URL, color. No emails, no permissions snapshots.
  • No payload data on the wire. The Redis message and the broadcast carry only updatedFields (field names) and metadata. Clients refetch the ticket via the existing authenticated REST path to obtain values. This means access changes are enforced on every refetch — a user whose access was just revoked will see 403 from the refetch and trigger FR-13.
  • JWT signing key. New env var HOCUSPOCUS_JWT_SECRET (or reuse existing Hocuspocus secret if it covers signing). Stored in secrets/. Required in production; in dev, fall back to a fixed dev key with a startup warning.
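
The claim checks and the refresh timing described above can be sketched as two pure functions. Signature verification is deliberately not shown, since it depends on whichever JWT library the project uses; these checks are assumed to run only on claims from a token whose signature has already been verified. Function names are hypothetical.

```typescript
// Claims listed in the Security section.
interface LiveTokenClaims {
  tenantId: string;
  userId: string;
  ticketId: string;
  exp: number; // seconds since epoch
  iat: number; // seconds since epoch
  jti: string;
}

// Room access check, applied AFTER signature verification (not shown here).
function claimsAllowRoom(
  claims: LiveTokenClaims,
  room: { tenantId: string; ticketId: string },
  nowSeconds: number,
): boolean {
  return (
    claims.exp > nowSeconds &&
    claims.tenantId === room.tenantId &&
    claims.ticketId === room.ticketId
  );
}

// Milliseconds until the client should refresh, at `fraction` of the token TTL.
// Returns 0 if that point has already passed.
function refreshDelayMs(claims: LiveTokenClaims, nowSeconds: number, fraction = 0.8): number {
  const ttlSeconds = claims.exp - claims.iat;
  const refreshAt = claims.iat + ttlSeconds * fraction;
  return Math.max(0, Math.round((refreshAt - nowSeconds) * 1000));
}
```

For a 5 minute token, the default fraction schedules the refresh 4 minutes after issuance, leaving a minute of slack for a slow or retried token request.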

Observability

Out of scope for this plan beyond what's necessary to verify behavior in development:

  • Console logging in TicketUpdatesExtension matching the pattern in NotificationExtension (subscribe/unsubscribe, message receipt) — useful for docker compose logs hocuspocus during dev.
  • Console logging on the client when reconnect attempts happen.

If formal metrics are required they will be added in a follow-up plan.

Rollout / Migration

  • Feature flag. Gate the live layer behind a PostHog feature flag live-ticket-updates (per project conventions in alga-feature-flags). Off by default. Roll out tenant-by-tenant.
  • Backwards compatible. No DB migrations. No schema changes. Existing REST flow is untouched — the new code is purely additive.
  • Backout. Disable the flag → all clients revert to today's behavior (REST only, no presence). The Redis publishes still happen but go nowhere; no harm. Optional kill-switch via env var on the server side to skip the publish entirely.

Implementation / Commit Cadence

Tests are listed at fine granularity in tests.json for tracking, but they MUST NOT be committed one-test-per-commit. Bundle commits by feature group within a phase. Specifically:

  • Phase 1 commits (target: 3–4 commits total):
    1. Server publish: F001 + F002 + F003 + F004 + F033 (Redis publish helper, diff helper, wire into updateTicketWithCache, bundled-child propagation, env kill-switch). Tests T001–T007 ship in the same commit as the code they cover.
    2. Hocuspocus extension + auth: F005 + F006 + F007 + F008 + F009 + F010. Tests T008–T021 ship in this commit.
    3. Client subscription + silent refetch + offline UX: F011 + F013 + F014 + F015 + F016 + F017 + F018 + F019 + F020 + F021 + F022 + F023 + F024 + F025 + F031 + F032. Tests T022–T036, T044–T046, T051–T056 ship here. (PresenceBar lift in F011 is allowed to be its own commit if the documents-package regression test T024 wants isolation; otherwise fold in.)
  • Phase 2 commits (target: 1–2 commits): F029 + F030 + F034 + the editing-indicator wiring, with T040–T043, T048, T058 in the same commits.
  • Phase 3 commits (target: 1–2 commits): F012 + F026 + F027 + F028 + the conflict-banner integration, with T037–T039, T049 in the same commits.
  • Cross-phase E2E (T046, T047, T049, T050, T051, T052, T053, T059) ship in the commit that completes the relevant phase, not split across many.

Rule of thumb: a commit should leave main in a green state where the implemented features + their tests are both present. Do not split "code" and "tests" commits — they are reviewed together.

Open Questions

  1. Editing-indicator field set. Resolved 2026-05-07: title, status, priority, ITIL impact, ITIL urgency, board, category, assignee, client, contact, location. Title uses caption-pill variant; rest use dim+caption.
  2. Bundled child tickets. When a parent ticket sync-propagates to children (optimizedTicketActions.ts L2124–2143), should the child rooms also receive a broadcast? Probably yes; deferred to Phase 1 implementation note.
  3. Conflict-banner persistence. If the user dismisses a banner with Keep yours and then B saves the same field again, do we re-show the banner (probably yes) and is it cumulative? Decide during Phase 3 design.
  4. JWT secret. Reuse Hocuspocus' existing shared secret or introduce a separate one? Defer to security review.

Acceptance Criteria (Definition of Done)

Phase 1 — Server publish + silent client refetch (no presence, no conflict UI yet):

  • Two browsers, two users, one ticket: B saves status → A sees status update without reload within ~500 ms; A's other unsaved fields are preserved.
  • B saves while Hocuspocus is down: A does not get the live update; A's reads/writes still work; A sees offline indicator.
  • Cross-tenant probe: a user from tenant X with valid token-for-X cannot join ticket:Y:* (verified via direct WS probe in test).

Phase 2 — Presence + per-field editing indicator:

  • Two users open the same ticket: each sees the other in the presence bar within ~2 s.
  • A focuses status; B sees "Alex is editing" indicator on the status field. A blurs; indicator clears on B's screen.
  • A and B both focused on status simultaneously: each sees the other's indicator (no hard lock).
  • One user open in two tabs: presence shows once, not twice.

Phase 3 — Conflict banner:

  • A has unsaved status change pending. B saves a different status. A sees the banner with B's value + author + timestamp; the field is frozen until A clicks Keep yours or Take theirs.
  • A has unsaved change on field X; B saves field Y. A sees a passing toast; A's pending change on X is preserved; Y is silently updated.
  • Banner Keep yours: A's local pending value remains; A's next save sends it; banner clears.
  • Banner Take theirs: A's local pending is dropped; field reflects B's value; banner clears.

Always (regression guards):

  • Hocuspocus container killed: ticket page renders, all reads/writes via REST work, optimistic updates and unsaved-changes warning all behave as today; offline indicator shown.
  • Hocuspocus restarts: client reconnects within ~30 s, indicator disappears, refetches once on reconnect.
  • All existing ticket-related tests pass unchanged.