Hermes 284313f908
Some checks are pending
Bidi Control Character Guard / bidi-control-guard (push) Waiting to run
Circular Dependency Check / Check for new circular dependencies (push) Waiting to run
Citus Migration Smoke / Combined migrations on single-node Citus (push) Waiting to run
E2E Fresh Install Tests / fresh-install-e2e (push) Waiting to run
ext-v2 guardrails / Run ext-v2 guard and ESLint (push) Waiting to run
Integration Tests / Check for relevant changes (push) Waiting to run
Integration Tests / ${{ (github.event_name == 'schedule' || github.event.inputs.suite == 'full') && 'Full integration suite' || 'Tier-1 integration subset' }} (push) Blocked by required conditions
Mobile checks / Mobile lint + typecheck (push) Waiting to run
Mobile checks / Mobile unit tests (push) Waiting to run
Mobile checks / Mobile dependency audit (report) (push) Waiting to run
Mobile checks / Mobile reproducibility checks (push) Waiting to run
Secrets guard (env backups) / Ensure no tracked env backup files (push) Waiting to run
Temporal Readiness / fast-readiness (push) Waiting to run
Temporal Readiness / docker-parity (push) Waiting to run
TypeScript Type Check / Nx affected typecheck (push) Waiting to run
Unit Tests / Skipped-test budget (push) Waiting to run
Unit Tests / Nx affected unit tests (push) Waiting to run
Unit Tests / Server unit coverage (informational) (push) Waiting to run
Validate Tenant Management Schema / Check for relevant changes (push) Waiting to run
Validate Tenant Management Schema / Validate Tenant Management Schema (push) Blocked by required conditions
EE Workflows Build Guard / ee-workflows-build-guard (push) Waiting to run
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

211 lines
22 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# PRD — Live Ticket Updates
- Slug: `live-ticket-updates`
- Date: `2026-05-07`
- Status: Draft
- Scope: Phases 13, **tickets only** (projects port deferred to a follow-up plan)
## Summary
Add a real-time layer to the ticket detail page so that when one user saves a change, every other user viewing the same ticket sees it within ~500 ms — without requiring a refresh and without breaking when Hocuspocus is unavailable. Postgres remains the source of truth; Hocuspocus carries presence and lightweight "something changed" signals; the existing REST-based save model is unchanged.
## Problem
Today, two users editing the same ticket cannot tell that the other is there, and a save by user B does not reach user A's screen until A reloads. The save flow already has good bones — optimistic per-field updates, a `pendingRequestRef` queue, and `UnsavedChangesProvider` — but no awareness of remote activity. The result is two visible failure modes:
1. **Stale reads**: A is looking at a ticket whose status was changed 10 minutes ago by B; A makes decisions on stale data.
2. **Lost-update-style overwrites at the field level**: A and B both change the same dropdown around the same time. The second save wins silently. Cross-entity validation in `updateTicketWithCache` (priority recompute, status↔board, etc.) means even non-overlapping fields can produce surprising server state.
## Goals
- A second user's save shows up on my screen for the ticket I'm viewing without a manual refresh, within ~500 ms of the save committing.
- I can see who else is currently viewing this ticket (presence bar) and which structured field they are actively editing (focus indicator).
- If two users edit the same field and remote save lands while I have unsaved local changes for that field, I get an explicit **Keep yours / Take theirs** banner — no silent overwrite.
- If Hocuspocus is unreachable, the ticket page works exactly as it does today: reads, writes, optimistic updates, unsaved-changes warning all unchanged. The user sees a small "Live updates offline — reconnecting…" indicator; the live layer auto-reconnects in the background.
- The implementation reuses the existing Hocuspocus pipe and the Redis pub/sub bridge pattern from `NotificationExtension`, so we are not introducing new infrastructure.
## Non-goals
- Collaborative editing of structured fields (no Y.Map of ticket fields). Postgres stays authoritative; we are not migrating the save path.
- Live updates for the rich-text **description** field — already handled by the existing `CollaborativeEditor` Y.js pipeline; we will not touch it.
- Live updates for **comments**, **tags**, **resources**, **teams**, **time entries** as separate channels in this plan. They will fold in later as additional invalidation signals on the same ticket room, but Phase 13 scope is the structured fields handled by `updateTicketWithCache`.
- Live updates for **projects**. Same architecture later (separate plan), not now.
- Hard locks on fields. The "X is editing" indicator is advisory only.
- Background polling fallback. If Hocuspocus is down, we degrade — we do not periodically refetch via REST as a backstop.
- Operational tooling beyond what's needed to verify the feature (no dashboards, no metrics pipelines, no admin UI).
## Users and Primary Flows
**Personas:** MSP technicians and dispatchers viewing tickets concurrently. The most common collisions are around status, priority, assigned_to, and board.
**Primary flows:**
1. **Two users on one ticket, no conflict.** A and B both have the ticket open. B changes status from "New" to "In Progress". A's status field updates silently within ~500 ms. A toast does not appear; the field is briefly highlighted.
2. **Presence.** A opens the ticket; sees a stack of avatars in the header showing B is viewing. B closes the tab; A's avatar list shrinks within a few seconds.
3. **Focus indicator.** A focuses the priority dropdown. On B's screen, the priority field is dimmed with a subtle "Alex is editing" caption. A blurs without changing — the indicator clears on B's screen.
4. **Same-field conflict.** A opens the status dropdown and selects "On Hold" but has not yet committed. Meanwhile B saves status = "Resolved". A's local state still has "On Hold" pending. The field freezes, a banner appears: "Bob just changed status to Resolved (2 sec ago). [Keep yours] [Take theirs]". A clicks Keep yours → A's pending value is sent on next save and may overwrite B; A clicks Take theirs → A's local pending value is dropped, "Resolved" is shown.
5. **Hocuspocus down.** Server is unreachable. A sees "Live updates offline — reconnecting…" indicator in the header. All saves and reads work normally. When the server comes back, the indicator disappears and the room rejoins.
6. **Reconnect.** A's connection drops mid-session and reconnects after 30 s. On reconnect, the client refetches the ticket once to catch any updates that landed during the gap.
## UX / UI Notes
- **Presence bar**: lift the existing presence component out of `packages/documents/src/components/CollaborativeEditor.tsx` into `packages/ui` so tickets and (later) other entities share one component. Avatars + tooltip with name; visible in the ticket detail header next to the title.
- **Field focus indicator**: visual differs by control type — dropdowns dim with a caption beneath; text inputs (title) show a small caption pill next to the control without dimming. No hard lock in either case — A can still focus the same field; we only show *who* else is.
- **Silent remote update**: the changed field briefly highlights (~600 ms fade) when a remote update is applied. No toast unless the change touches data the user was looking at but hadn't saved.
- **Conflict banner**: appears inside the field's container, not as a global toast. Two buttons: **Keep yours** (default focus), **Take theirs**. Shows author name + relative timestamp.
- **Connection status**: small text indicator in the header. Three states: connected (no indicator), reconnecting ("Live updates offline — reconnecting…"), permanent failure ("Live updates unavailable" — after N retries, no further auto-retry that session).
- **Multi-tab same user**: presence dedupes by `userId`, so A in two tabs shows once.
## Requirements
### Functional Requirements
**FR-1. Server-side broadcast on ticket update.** After `updateTicketWithCache` commits successfully and publishes its existing `TICKET_UPDATED` event, also publish a Redis message on channel `ticket-updates:<tenantId>:<ticketId>` with payload `{updatedFields: string[], updatedBy: {userId, displayName}, updatedAt: ISO8601}`. This must run in the same code path as the existing event publish (`packages/tickets/src/actions/optimizedTicketActions.ts` ~L20382101) so no save can succeed without the broadcast attempt. Broadcast failure is logged but does not fail the update.
**FR-2. Hocuspocus extension bridges Redis to room broadcasts.** A new `TicketUpdatesExtension` (modeled on `NotificationExtension`) subscribes to `ticket-updates:*`. When a message arrives for `ticket-updates:<tenant>:<id>`, it broadcasts a stateless message to all clients connected to room `ticket:<tenant>:<id>`.
**FR-3. Per-ticket Hocuspocus room.** `tenantValidation.js` recognizes the `ticket:` room prefix in addition to `document:` and `notifications:`. Authentication is via short-lived signed JWT (see Security).
**FR-4. Client subscription on ticket open.** When a user opens a ticket detail page, the client joins room `ticket:<tenant>:<id>` using `createYjsProvider`. The provider is empty Y.Doc — used only for awareness and stateless messages, not for field state.
**FR-5. Presence.** Awareness state holds `{userId, displayName, avatarUrl, color, editingField?: string}`. Presence bar renders all unique users (deduped by `userId`).
**FR-6. Silent refetch on remote update with no local conflict.** When the client receives an update message:
- If the user has no pending local edits to any field in `updatedFields`, refetch the ticket and update component state. Briefly highlight changed fields. No toast.
- Debounce refetches at 200 ms so a burst of changes triggers a single refetch.
**FR-7. Conflict banner on same-field collision.** When the client receives an update message that includes a field with pending unsaved local state:
- Freeze that specific field.
- Render a banner in the field's container with author + timestamp + remote value.
- **Keep yours**: keeps local pending value; clears banner; user proceeds normally.
- **Take theirs**: drops local pending value; refetches; field updates to remote value.
**FR-8. Toast on remote update with non-overlapping unsaved local changes.** If the user has unsaved changes on field X and a remote update lands on field Y:
- Refetch and update Y silently.
- Show a passing toast: "{Name} updated {field}".
- Local pending changes on X are preserved untouched.
**FR-9. Per-field editing indicator.** When the user focuses an editable field (title, status, priority, ITIL impact, ITIL urgency, board, category, assignee, client, contact, location), set `awareness.editingField = '<field>'`. On blur or selection-commit, clear it. On other clients, render a "{Name} is editing" indicator on that field when at least one remote awareness has the same `editingField`. Visual treatment is per control type:
- Dropdowns/selects: dim the control + caption beneath.
- Text inputs (title): caption pill near the control; do not dim (would interfere with the input affordance).
- No hard lock in either case.
**FR-10. Soft dependency on Hocuspocus.** All live behavior is layered over the existing REST save path. If the WebSocket fails to connect or disconnects:
- Presence, focus indicator, silent refetch, conflict banner all become no-ops.
- The ticket page renders, reads, writes, optimistic updates, and unsaved-changes warnings exactly as today.
- A header indicator shows "Live updates offline — reconnecting…". Reconnects auto-retry with exponential backoff (start 1 s, cap 30 s). After 5 failed reconnects, switch to "Live updates unavailable" and stop auto-retrying that session (manual reload re-enables).
**FR-11. On reconnect, refetch once.** When the WebSocket reconnects after a drop, the client refetches the ticket exactly once to catch up on any updates that landed during the gap.
**FR-12. Multi-tab dedupe.** A single user with the same ticket open in multiple tabs appears once in the presence bar (deduped by `userId`).
**FR-13. Permission revocation mid-session.** If the server pushes a message indicating the current user no longer has access (e.g., ticket reassigned to a board they can't see), the client redirects away (or shows a "no access" view) instead of silently 403'ing on the next refetch. (Implementation note: refetch failure with 403 is the trigger; we do not need a separate channel for this in Phase 13.)
### Non-functional Requirements
- **Latency**: P95 from save commit to all subscribed clients applying the update ≤ 500 ms on the same data center.
- **Throughput**: Phase 13 design target — up to 50 concurrent viewers per ticket and up to 100 ticket updates / sec / tenant. No formal load test in scope; just don't pick designs that obviously break here.
- **Auth**: per-ticket JWT, ≤ 5 min expiry, signed with the existing Hocuspocus shared secret. Tenant + ticketId + userId encoded in claims.
- **Backward compatibility**: zero changes required to existing ticket UI for users on the live layer to see fresh data. A user with Hocuspocus disabled in their environment sees the same UX as today plus the offline indicator.
## Data / API / Integrations
**Server (Next.js / `@alga-psa/tickets`):**
- New helper `publishTicketUpdate({tenantId, ticketId, updatedFields, updatedBy, updatedAt})` in `packages/tickets/src/lib/liveUpdates.ts`. Uses `getRedisClient()` from `@alga-psa/event-bus`.
- Call `publishTicketUpdate` in `updateTicketWithCache` (`packages/tickets/src/actions/optimizedTicketActions.ts` ~L20382101 region) after successful commit, alongside the existing `publishEvent('TICKET_UPDATED', …)` call. Compute `updatedFields` from the diff between the loaded ticket and the validated update payload (same diff already used implicitly for ITIL recompute / status↔board logic — extract the diff into a helper).
- New endpoint `GET /api/tickets/:id/live-token` returns a short-lived JWT `{tenantId, userId, ticketId, exp}`. Wrapped in `withAuth`; checks `assertTicketReadAllowed`. Token signed with `HOCUSPOCUS_JWT_SECRET` (new env var; reuse existing Hocuspocus shared secret if one exists).
**Hocuspocus (`/hocuspocus`):**
- New `TicketUpdatesExtension.js` modeled on `NotificationExtension.js`. Subscribes to `<redisPrefix>ticket-updates:*` (pattern subscribe). On message, looks up matching room and broadcasts a stateless message via Hocuspocus' `sendStateless` API (or sets a transient awareness key the clients listen for).
- Extend `tenantValidation.js`:
- Add `parseTicketRoom(roomName)` for `ticket:<tenant>:<ticketId>`.
- Update `validateDocumentRoomAccess` to handle the `ticket:` prefix: parse, then verify the JWT in the request (query param `token=<jwt>`); reject if signature, expiry, tenant, or ticketId mismatch.
- Register `TicketUpdatesExtension` in `server.js` extensions list.
**Client (`@alga-psa/tickets`):**
- New `packages/tickets/src/hooks/useTicketLive.ts`: takes `{tenantId, ticketId, currentUser, onRemoteUpdate, onPresenceChange}`. Internally fetches the live-token, calls `createYjsProvider('ticket:<tenant>:<id>', { token })`, exposes `presence`, `connectionStatus`, `setEditingField(field|null)`. Handles reconnect-then-refetch (FR-11).
- New `packages/tickets/src/components/ticket/TicketLiveProvider.tsx`: wraps `TicketDetails`, owns the hook, exposes context.
- Modify `packages/tickets/src/components/ticket/TicketDetails.tsx` (~L94171, L870924):
- Subscribe to `onRemoteUpdate` from context.
- Intersect `updatedFields` with the current `pendingRequestRef` queue and component-level dirty state to route to silent refetch / toast / conflict banner.
- Wire `setEditingField` on focus/blur of structured field controls.
- Lift presence bar from `packages/documents/src/components/CollaborativeEditor.tsx` (~L259305) into `packages/ui/src/presence/PresenceBar.tsx`. Update `CollaborativeEditor` to import from there. Tickets imports the same component.
- Conflict banner component: `packages/ui/src/presence/FieldConflictBanner.tsx`. Used wherever a structured field needs the banner.
**Wire format:**
- Redis channel: `<redisPrefix>ticket-updates:<tenantId>:<ticketId>`.
- Redis payload (JSON): `{ updatedFields: string[], updatedBy: { userId: string, displayName: string }, updatedAt: string /* ISO */ }`.
- Hocuspocus stateless message payload from extension to clients: same JSON.
- Awareness shape: `{ userId, displayName, avatarUrl?, color, editingField?: string }`.
## Security / Permissions
- **Per-ticket auth.** `tenantValidation.js` currently only validates that the room's tenant matches the request's tenant. For tickets we additionally require:
1. Client requests a JWT from `/api/tickets/:id/live-token`. Endpoint runs `withAuth``assertTicketReadAllowed(user, ticketId)` (must check both tenant and ticket-level visibility — bundled-child sync locks, board visibility, client-portal restrictions).
2. JWT is short-lived (≤ 5 min) and includes `{tenantId, userId, ticketId, exp, iat, jti}`.
3. Hocuspocus `onAuthenticate` (or equivalent in `validateDocumentRoomAccess`) verifies the JWT, asserts `tenantId === room.tenantId` and `ticketId === room.ticketId`.
- **Token refresh.** Client refreshes the token automatically before expiry (e.g., at 80 % of TTL). Refresh failure → degrade as if Hocuspocus were down.
- **Cross-tenant probe.** A direct WebSocket connection with a token for tenant X attempting to join `ticket:Y:*` MUST be rejected by `tenantValidation.js`.
- **No PII in awareness.** Awareness fields limited to userId, display name, avatar URL, color. No emails, no permissions snapshots.
- **No payload data on the wire.** The Redis message and the broadcast carry only `updatedFields` (field names) and metadata. Clients refetch the ticket via the existing authenticated REST path to obtain values. This means access changes are enforced on every refetch — a user whose access was just revoked will see 403 from the refetch and trigger FR-13.
- **JWT signing key.** New env var `HOCUSPOCUS_JWT_SECRET` (or reuse existing Hocuspocus secret if it covers signing). Stored in `secrets/`. Required in production; in dev, fall back to a fixed dev key with a startup warning.
## Observability
Out of scope for this plan beyond what's necessary to verify behavior in development:
- Console logging in `TicketUpdatesExtension` matching the pattern in `NotificationExtension` (subscribe/unsubscribe, message receipt) — useful for `docker compose logs hocuspocus` during dev.
- Console logging on the client when reconnect attempts happen.
If formal metrics are required they will be added in a follow-up plan.
## Rollout / Migration
- **Feature flag.** Gate the live layer behind a PostHog feature flag `live-ticket-updates` (per project conventions in `alga-feature-flags`). Off by default. Roll out tenant-by-tenant.
- **Backwards compatible.** No DB migrations. No schema changes. Existing REST flow is untouched — the new code is purely additive.
- **Backout.** Disable the flag → all clients revert to today's behavior (REST only, no presence). The Redis publishes still happen but go nowhere; no harm. Optional kill-switch via env var on the server side to skip the publish entirely.
### Implementation / Commit Cadence
Tests are listed at fine granularity in `tests.json` for tracking, but they MUST NOT be committed one-test-per-commit. Bundle commits by **feature group** within a phase. Specifically:
- **Phase 1 commits** (target: 34 commits total):
1. Server publish: F001 + F002 + F003 + F004 + F033 (Redis publish helper, diff helper, wire into `updateTicketWithCache`, bundled-child propagation, env kill-switch). Tests T001T007 ship in **the same commit** as the code they cover.
2. Hocuspocus extension + auth: F005 + F006 + F007 + F008 + F009 + F010. Tests T008T021 ship in this commit.
3. Client subscription + silent refetch + offline UX: F011 + F013 + F014 + F015 + F016 + F017 + F018 + F019 + F020 + F021 + F022 + F023 + F024 + F025 + F031 + F032. Tests T022T036, T044T046, T051T056 ship here. (`PresenceBar` lift in F011 is allowed to be its own commit if the documents-package regression test T024 wants isolation; otherwise fold in.)
- **Phase 2 commits** (target: 12 commits): F029 + F030 + F034 + the editing-indicator wiring, with T040T043, T048, T058 in the same commits.
- **Phase 3 commits** (target: 12 commits): F012 + F026 + F027 + F028 + the conflict-banner integration, with T037T039, T049 in the same commits.
- **Cross-phase E2E** (T046, T047, T049, T050, T051, T052, T053, T059) ship in the commit that completes the relevant phase, not split across many.
Rule of thumb: a commit should leave `main` in a green state where the implemented features + their tests are both present. Do not split "code" and "tests" commits — they are reviewed together.
## Open Questions
1. ~~**Editing-indicator field set.**~~ **Resolved 2026-05-07:** title, status, priority, ITIL impact, ITIL urgency, board, category, assignee, client, contact, location. Title uses caption-pill variant; rest use dim+caption.
2. **Bundled child tickets.** When a parent ticket sync-propagates to children (`optimizedTicketActions.ts` L21242143), should the child rooms also receive a broadcast? Probably yes; deferred to Phase 1 implementation note.
3. **Conflict-banner persistence.** If the user dismisses a banner with **Keep yours** and *then* B saves the same field again, do we re-show the banner (probably yes) and is it cumulative? Decide during Phase 3 design.
4. **JWT secret.** Reuse Hocuspocus' existing shared secret or introduce a separate one? Defer to security review.
## Acceptance Criteria (Definition of Done)
**Phase 1 — Server publish + silent client refetch (no presence, no conflict UI yet):**
- Two browsers, two users, one ticket: B saves status → A sees status update without reload within ~500 ms; A's other unsaved fields are preserved.
- B saves while Hocuspocus is down: A does not get the live update; A's reads/writes still work; A sees offline indicator.
- Cross-tenant probe: a user from tenant X with valid token-for-X cannot join `ticket:Y:*` (verified via direct WS probe in test).
**Phase 2 — Presence + per-field editing indicator:**
- Two users open the same ticket: each sees the other in the presence bar within ~2 s.
- A focuses status; B sees "Alex is editing" indicator on the status field. A blurs; indicator clears on B's screen.
- A and B both focused on status simultaneously: each sees the other's indicator (no hard lock).
- One user open in two tabs: presence shows once, not twice.
**Phase 3 — Conflict banner:**
- A has unsaved status change pending. B saves a different status. A sees the banner with B's value + author + timestamp; the field is frozen until A clicks Keep yours or Take theirs.
- A has unsaved change on field X; B saves field Y. A sees a passing toast; A's pending change on X is preserved; Y is silently updated.
- Banner Keep yours: A's local pending value remains; A's next save sends it; banner clears.
- Banner Take theirs: A's local pending is dropped; field reflects B's value; banner clears.
**Always (regression guards):**
- Hocuspocus container killed: ticket page renders, all reads/writes via REST work, optimistic updates and unsaved-changes warning all behave as today; offline indicator shown.
- Hocuspocus restarts: client reconnects within ~30 s, indicator disappears, refetches once on reconnect.
- All existing ticket-related tests pass unchanged.