DB transaction guardrails and after-commit work
Rules and safety nets introduced by the SLA close/reopen deadlock fix
(.ai/sla_close_deadlock_proper_fix_plan.md has the full investigation).
Rules for transactional code
- One DB writer per ticket row per logical operation. SLA column
  mutations happen exactly once, in the caller's transaction. The SLA
  "backend" (ISlaBackend) schedules external side effects only; it
  never re-does a DB write. The CE PgBossSlaBackend mutation hooks are
  no-ops.
- No network or cross-connection work inside an open transaction. Event
  publishing and backend scheduling run after commit:
  - registerAfterCommit(trx, hook, label?) (@alga-psa/db) queues work
    that the transaction-owning withTransaction frame flushes after a
    successful commit, in registration order. Hooks are dropped on
    rollback. Nested withTransaction frames share the owner's trx, so
    their hooks flush once, at the outer commit. Pass a label (e.g.
    "TICKET_CLOSED ticket=<id>") so a failed hook is traceable in logs.
    Hook failures are logged and swallowed: events are at-most-once. A
    publish that fails after commit is lost (no outbox); the committed
    write stands.
  - SLA write functions return backendActions; callers dispatch them
    with dispatchSlaBackendActions() (@alga-psa/sla) after their
    transaction resolves.
- SLA writes are serialized per ticket. Every SLA write entry point
  takes pg_advisory_xact_lock(hashtext('sla:<tenant>:<ticket>')) first
  (acquireTicketSlaLock). The lock is transaction-scoped, so it is safe
  under pgbouncer transaction pooling and self-releases at
  commit/rollback.
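
The after-commit rules above can be illustrated with a small in-memory model. This is a sketch only, not the @alga-psa/db implementation: the Trx type, the pendingHooks map, the log array, and the explicit parent argument to withTransaction are all hypothetical, and everything is synchronous for brevity, whereas real hooks would presumably be async. It shows hooks flushing in registration order after commit, nested frames sharing the owner's queue, hooks being dropped on rollback, and a failing hook being logged and swallowed.

```typescript
// Hypothetical in-memory stand-ins; none of this is the real @alga-psa/db code.
interface Trx { id: number }
type Hook = () => void;
interface QueuedHook { hook: Hook; label: string }

const pendingHooks = new Map<number, QueuedHook[]>(); // keyed by owning trx id
const log: string[] = [];
let nextTrxId = 1;

function registerAfterCommit(trx: Trx, hook: Hook, label = "unlabeled"): void {
  const queue = pendingHooks.get(trx.id);
  if (!queue) throw new Error("no open transaction owns this trx");
  queue.push({ hook, label });
}

function withTransaction<T>(parent: Trx | undefined, body: (trx: Trx) => T): T {
  // Nested frame: share the owner's trx and let the owner flush.
  if (parent) return body(parent);

  const trx: Trx = { id: nextTrxId++ };
  pendingHooks.set(trx.id, []);
  let result: T;
  try {
    result = body(trx);
  } catch (err) {
    pendingHooks.delete(trx.id); // rollback: queued hooks are dropped
    log.push(`ROLLBACK trx=${trx.id}`);
    throw err;
  }
  log.push(`COMMIT trx=${trx.id}`);

  const queue = pendingHooks.get(trx.id) ?? [];
  pendingHooks.delete(trx.id);
  for (const { hook, label } of queue) {
    try {
      hook(); // runs after commit, in registration order
    } catch (err) {
      // Logged and swallowed: the committed write stands, the event is lost.
      log.push(`hook failed [${label}]: ${(err as Error).message}`);
    }
  }
  return result;
}

// Demo: one outer transaction with a nested frame registering a failing hook.
withTransaction(undefined, (trx) => {
  log.push(`UPDATE ticket SLA columns trx=${trx.id}`);
  registerAfterCommit(trx, () => { log.push("published TICKET_CLOSED"); }, "TICKET_CLOSED ticket=42");
  withTransaction(trx, (inner) => {
    registerAfterCommit(inner, () => { throw new Error("broker unreachable"); }, "SLA_SCHEDULE ticket=42");
  });
});
```

In this model the failing hook surfaces only as a log line carrying its label, which is why a descriptive label matters: nothing else records that the event was lost.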
Event bus poison resistance
- Handler success is tracked per (event, handler) (Redis set
  processed_event_handlers:<tenant>), so one failing handler's
  redelivery never re-runs co-subscribers that already succeeded (e.g.
  outbound webhooks on the shared default-channel streams). Subscribers
  that share a stream with same-named handler functions must pass a
  distinct subscriberId to subscribe().
- Messages delivered more than eventBus.maxDeliveries times (default
  10, env REDIS_STREAM_MAX_DELIVERIES) are moved to
  <stream>:dead-letter and acked. Dead-letter entries keep the original
  payload plus sourceStream / sourceMessageId / deliveries /
  deadLetteredAt for inspection and replay. The write is idempotent
  (marker set dead_lettered_messages:<stream>, 3-day TTL), so an
  xAdd-succeeded / xAck-failed retry does not duplicate the entry.
  Monitor dead-letter volume.
- A handler that throws gets a bounded retry (redelivery up to the
  cap), not an infinite storm.
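
The redelivery behaviour described above can be modelled without Redis. The sketch below is an assumption-laden illustration, not the event bus code: StreamConsumerSketch, its deliver method, and the key format are hypothetical, and plain Set objects stand in for the Redis sets named above (TTL handling is omitted). It shows a co-subscriber that succeeded once being skipped on redelivery, and a message past the delivery cap being dead-lettered exactly once.

```typescript
type Handler = (payload: unknown) => void;
interface Subscriber { subscriberId: string; handler: Handler }
interface DeadLetterEntry {
  payload: unknown;
  sourceStream: string;
  sourceMessageId: string;
  deliveries: number;
  deadLetteredAt: string;
}

// Hypothetical in-memory model of one consumer on one stream.
class StreamConsumerSketch {
  private readonly stream: string;
  private readonly subscribers: Subscriber[];
  private readonly maxDeliveries: number;
  private readonly processed = new Set<string>();         // stands in for processed_event_handlers:<tenant>
  private readonly deadLetterMarkers = new Set<string>(); // stands in for dead_lettered_messages:<stream>
  readonly deadLetter: DeadLetterEntry[] = [];

  constructor(stream: string, subscribers: Subscriber[], maxDeliveries = 10) {
    this.stream = stream;
    this.subscribers = subscribers;
    this.maxDeliveries = maxDeliveries;
  }

  // Returns true when the message should be acked.
  deliver(messageId: string, payload: unknown, deliveries: number): boolean {
    if (deliveries > this.maxDeliveries) {
      // Marker check keeps the dead-letter write idempotent across retries.
      if (!this.deadLetterMarkers.has(messageId)) {
        this.deadLetter.push({
          payload,
          sourceStream: this.stream,
          sourceMessageId: messageId,
          deliveries,
          deadLetteredAt: new Date().toISOString(),
        });
        this.deadLetterMarkers.add(messageId);
      }
      return true;
    }
    let allSucceeded = true;
    for (const { subscriberId, handler } of this.subscribers) {
      const key = `${messageId}:${subscriberId}`;
      if (this.processed.has(key)) continue; // already succeeded: do not re-run
      try {
        handler(payload);
        this.processed.add(key);
      } catch {
        allSucceeded = false; // leave unacked so the message is redelivered
      }
    }
    return allSucceeded;
  }
}

// Demo: a healthy webhook subscriber shares the stream with a handler that always throws.
let webhookCalls = 0;
let failingCalls = 0;
const consumer = new StreamConsumerSketch("events:default", [
  { subscriberId: "webhooks", handler: () => { webhookCalls++; } },
  { subscriberId: "sla", handler: () => { failingCalls++; throw new Error("always fails"); } },
], 3);

let acked = false;
for (let deliveries = 1; !acked; deliveries++) {
  acked = consumer.deliver("1-0", { type: "TICKET_CLOSED" }, deliveries);
}
```

With a cap of 3, the webhook handler runs once, the failing handler runs three times, and the fourth delivery moves the message to the dead-letter list instead of invoking any handler.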
Postgres timeouts (defense in depth)
Migration 20260609120000_set_app_role_db_guardrail_timeouts.cjs sets on
the app role (DB_USER_SERVER, default app_user):
- idle_in_transaction_session_timeout = 60s: a session idle
  mid-transaction is aborted and releases its locks. This fires on a
  single continuous 60s idle gap between statements, not on total
  transaction duration; steady statement loops are unaffected. 60s (not
  lower) leaves headroom for a slow external call awaited between
  statements. Waiters are already protected by lock_timeout regardless
  of how long the holder sits.
- lock_timeout = 8s: statements fail fast instead of queueing behind a
  stuck lock holder.
These are role-level GUCs (not pool afterCreate SETs) because pgbouncer
runs pool_mode = transaction: session-level SETs issued at connection
creation do not reliably follow a client across backend remapping, while
role GUCs resolve server-side at backend session start. The admin/migration
role is deliberately excluded so long-running DDL stays legal.
pgbouncer/pgbouncer.ini.template keeps idle_transaction_timeout = 120
as a last-resort reaper for whatever the role GUCs don't cover. It must
stay above the role GUC so the gentler server-side abort fires before
pgbouncer kills the connection.
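
The ordering constraint between the role GUCs and the pgbouncer reaper can be checked mechanically. The helper below is a hypothetical sketch, not the contents of the migration: GuardrailConfig, quoteIdent, and buildRoleGucStatements are invented names, and the ALTER ROLE ... SET form is an assumption about how the role-level settings might be applied. It rejects a configuration where pgbouncer's idle_transaction_timeout is not above the role-level idle timeout, and otherwise produces the statements.

```typescript
// Hypothetical config shape; field names are illustrative only.
interface GuardrailConfig {
  appRole: string;
  idleInTransactionSessionTimeoutSec: number;
  lockTimeoutSec: number;
  pgbouncerIdleTransactionTimeoutSec: number;
}

// Quote a Postgres identifier, doubling any embedded double quotes.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

function buildRoleGucStatements(cfg: GuardrailConfig): string[] {
  // The pgbouncer reaper must fire later than the server-side abort.
  if (cfg.pgbouncerIdleTransactionTimeoutSec <= cfg.idleInTransactionSessionTimeoutSec) {
    throw new Error(
      "pgbouncer idle_transaction_timeout must stay above idle_in_transaction_session_timeout",
    );
  }
  const role = quoteIdent(cfg.appRole);
  return [
    `ALTER ROLE ${role} SET idle_in_transaction_session_timeout = '${cfg.idleInTransactionSessionTimeoutSec}s'`,
    `ALTER ROLE ${role} SET lock_timeout = '${cfg.lockTimeoutSec}s'`,
  ];
}

// Values taken from this document: 60s, 8s, and pgbouncer at 120.
const statements = buildRoleGucStatements({
  appRole: "app_user",
  idleInTransactionSessionTimeoutSec: 60,
  lockTimeoutSec: 8,
  pgbouncerIdleTransactionTimeoutSec: 120,
});
```

Role-level settings take effect for sessions started after the change, so existing pooled backends would need to be recycled before the new values show up.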
Verify on a deployment:
-- as the app user, through pgbouncer
SHOW idle_in_transaction_session_timeout; -- 60s
SHOW lock_timeout; -- 8s