Hermes 284313f908
Some checks are pending
Bidi Control Character Guard / bidi-control-guard (push) Waiting to run
Circular Dependency Check / Check for new circular dependencies (push) Waiting to run
Citus Migration Smoke / Combined migrations on single-node Citus (push) Waiting to run
E2E Fresh Install Tests / fresh-install-e2e (push) Waiting to run
ext-v2 guardrails / Run ext-v2 guard and ESLint (push) Waiting to run
Integration Tests / Check for relevant changes (push) Waiting to run
Integration Tests / ${{ (github.event_name == 'schedule' || github.event.inputs.suite == 'full') && 'Full integration suite' || 'Tier-1 integration subset' }} (push) Blocked by required conditions
Mobile checks / Mobile lint + typecheck (push) Waiting to run
Mobile checks / Mobile unit tests (push) Waiting to run
Mobile checks / Mobile dependency audit (report) (push) Waiting to run
Mobile checks / Mobile reproducibility checks (push) Waiting to run
Secrets guard (env backups) / Ensure no tracked env backup files (push) Waiting to run
Temporal Readiness / fast-readiness (push) Waiting to run
Temporal Readiness / docker-parity (push) Waiting to run
TypeScript Type Check / Nx affected typecheck (push) Waiting to run
Unit Tests / Skipped-test budget (push) Waiting to run
Unit Tests / Nx affected unit tests (push) Waiting to run
Unit Tests / Server unit coverage (informational) (push) Waiting to run
Validate Tenant Management Schema / Check for relevant changes (push) Waiting to run
Validate Tenant Management Schema / Validate Tenant Management Schema (push) Blocked by required conditions
EE Workflows Build Guard / ee-workflows-build-guard (push) Waiting to run
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

47 KiB
Raw Blame History

Scratchpad — API Rate Limiting and Outbound Ticket Webhooks

  • Plan slug: api-rate-limiting-and-ticket-webhooks
  • Created: 2026-05-05
  • Source plans (kept for diff/history; this folder is canonical going forward):
    • /Users/natalliabukhtsik/Desktop/projects/alga-psa/.ai/api-rate-limiting-plan.md
    • /Users/natalliabukhtsik/Desktop/projects/alga-psa/.ai/ticket-webhooks-plan.md

What This Is

Rolling notes for the combined effort. Append decisions and discoveries as implementation progresses; update earlier entries when something changes.

Decisions

  • (2026-05-05) Combined into one plan. The two source plans share infrastructure (TokenBucketRateLimiter namespace work — features F001F005 — must land before either feature can use namespaced buckets). Splitting the features into one plan avoids re-stating the foundation.
  • (2026-05-05) Queue: Redis ZSET, not BullMQ or Temporal. BullMQ is not a current dependency; adding it would introduce a third queue paradigm. Temporal is in use for workflow-worker but webhook delivery is "POST + retry," not multi-step. The DelayedEmailQueue ZSET pattern (packages/email/src/DelayedEmailQueue.ts) is the closest analog and reuses the existing Redis client. User confirmed this on 2026-05-05.
  • (2026-05-05) Signing secret stored via secret provider, not hashed. HMAC requires the plaintext on every delivery — hashing breaks signing. Mirror the Stripe integration (webhook_secret_vault_path column, resolved through getSecretProviderInstance()). Fixed during plan review.
  • (2026-05-05) Reuse TooManyRequestsError, don't add a parallel RateLimitError. It already exists at apiMiddleware.ts:101-111 with the right shape. Plumb headers through ApiError.headers instead.
  • (2026-05-05) Subscribe to TICKET_STATUS_CHANGED directly. It's a first-class internal event (eventBusSchema.ts:170) — don't synthesize it from TICKET_UPDATED.changes.status_id.
  • (2026-05-05) Three auth surfaces, one helper. enforceApiRateLimit(req, ctx) is called from ApiBaseController.authenticate, withApiKeyAuth (both branches), and withAuth. NM Store path uses sentinel subjectId 'nm_store' since it has no apiKeyId.
  • (2026-05-05) Defer to v2 by removing routes, not by leaving 501s. Discovered 14+ TODO stubs in ApiWebhookController. The deferred ones (transformations, bulk ops, templates marketplace, etc.) get their route files deleted so OpenAPI doesn't advertise them.
  • (2026-05-05) Rate-limiter and webhooks share the TokenBucketRateLimiter namespace work. The webhook per-webhook outbound cap (namespace 'webhook-out') depends on F001F005 being merged first.
  • (2026-05-06) Place the v1 webhook admin UI under Security, next to API Keys. The open question remained unresolved, and the existing /msp/security-settings surface already hosts the external API admin controls. Reusing that location avoids inventing a second admin-only settings entry point during the MVP.

Discoveries / Constraints

  • (2026-05-05) TokenBucketRateLimiter is at packages/email/src/TokenBucketRateLimiter.ts. Bucket key prefix is alga-psa:ratelimit:bucket: and TTL is 3600s. The BucketConfigGetter signature is (tenantId) => BucketConfig — must widen to (tenantId, subjectId?) => BucketConfig for per-key/per-webhook overrides.
  • (2026-05-05) Existing email rate-limit defaults are maxTokens=60, refillRate=1. New API defaults are deliberately higher (120, 1) — API bursts are expected to be larger than email bursts.
  • (2026-05-05) WebhookService.checkRateLimit (line 1056) queries webhook_deliveries, which doesn't exist yet — it would throw if called. Latent bug: nothing currently calls into the delivery path.
  • (2026-05-05) WebhookService.performWebhookDelivery (line 950) is mocked — sleeps 100 ms and returns { success: true, status_code: 200 }. No real HTTP request happens today.
  • (2026-05-05) webhookEventTypeSchema lacks ticket.comment.added. F023 must extend the enum or webhook creation requests for that event type fail validation.
  • (2026-05-05) Existing distribution pattern for tenant-scoped tables: notification_settings is in 20250805000019_distribute_final_tables.cjs. Migration extension is .cjs, not .ts. Citus distribution lives in ee/server/migrations/citus/, separate from the create migration in server/migrations/.
  • (2026-05-05) PostgreSQL UNIQUE (tenant, api_key_id) would allow multiple (tenant, NULL) tenant-default rows. The migration needs a separate unique partial index on tenant WHERE api_key_id IS NULL to make the null fallback row actually unique.
  • (2026-05-05) The current secret-provider API resolves tenant secrets by (tenant, secretName), not by an arbitrary vault path. For webhook signing secrets, signing_secret_vault_path therefore acts as stored metadata; the DAL resolves the actual secret by taking the basename of the stored path and calling getTenantSecret(tenant, basename(path)).
  • (2026-05-05) undici is already available in the server runtime, so the real webhook transport can use undici.fetch + Agent for the verify_ssl=false path without introducing a new dependency.
  • (2026-05-05) Node's net.BlockList is sufficient for the required SSRF address classes. The helper now blocks RFC1918, loopback, link-local, and CGNAT IPv4 ranges plus ::1 and fe80::/10, and it short-circuits all of those checks when WEBHOOK_SSRF_ALLOW_PRIVATE=true.
  • (2026-05-05) The repo still had an older generic webhook validator that expected sha256=<hex>. F030 replaces that with the PRD-specific outbound format t=<unix>,v1=<hex> and routes the leftover schema helper through the new shared implementation so future controller work doesn't split the signature recipe again.
  • (2026-05-05) The ticket webhook surface now has a single canonical translation layer under eventBus/subscribers/webhook/; future subscriber fan-out code can map one internal event to one or more public webhook events without duplicating string switches.
  • (2026-05-05) The placeholder retry math in WebhookService was still using generic exponential/linear config fields. F039 replaces that with the PRD's fixed retry cadence and exposes it as a shared helper for the future Redis queue worker.
  • (2026-05-05) initializeApp.ts is a poor tsx smoke-import target in this repo because importing the full app graph pulls Next/UI assets like react-day-picker/src/style.css. For F031 validation, focused imports of the new rate-limit getter and the touched service file are the useful checks.
  • (2026-05-05) ApiBaseController.authenticate is not the universal hook point — withApiKeyAuth and withAuth in apiMiddleware.ts:144,201 are independent paths, and the NM Store branch in withApiKeyAuth produces a context with apiKeyId === undefined. Verified by reading service-types and test-auth routes.
  • (2026-05-05) /api/v1/test-auth does not use the same withApiKeyAuth helper as service-types; it goes through the older server/src/lib/api/middleware/apiAuthMiddleware.ts. Rate-limit wiring has to cover that legacy wrapper too or the planned cross-surface test would split buckets by middleware implementation.
  • (2026-05-05) Several /api/v1 route families still bypassed the three shared auth surfaces even after F018: asset routes and contract-line routes were calling controllers that expect req.context but never authenticated, and a handful of direct route handlers (tickets/priorities, tickets/statuses, ticket comment reactions, storage routes, and several mobile moderation/push/account routes) were validating API keys inline without invoking the limiter.
  • (2026-05-05) Internal event vocabulary is much larger than the v1 public surface. TICKET_REOPENED, TICKET_ESCALATED, TICKET_PRIORITY_CHANGED, TICKET_UNASSIGNED, TICKET_QUEUE_CHANGED, TICKET_TAGS_CHANGED, TICKET_RESPONSE_STATE_CHANGED, TICKET_ADDITIONAL_AGENT_ASSIGNED exist in EVENT_TYPES but are deferred to v2 (rolled into ticket.updated).
  • (2026-05-05) TICKET_COMMENT_ADDED currently reaches subscribers through the legacy TicketEventPayloadSchema shape from TicketService: it includes payload.comment.{content,author,isInternal} but not a persisted comment timestamp. The webhook payload builder therefore uses payload.occurredAt / event timestamp for comment.timestamp.
  • (2026-05-05) TICKET_STATUS_CHANGED payloads may arrive in either the new domain shape (previousStatusId) or an older changes.status_id.from style. The webhook payload builder now accepts both so subscriber output stays stable across publishers while the event vocabulary converges.
  • (2026-05-05) webhookSubscriber.ts needs a queue boundary before the full poller lands. I added WebhookDeliveryQueue.enqueue() as the initial Redis storage contract now, and the later F037 work will extend that same class with claim/process/retry behavior instead of swapping subscriber behavior.
  • (2026-05-05) Importing server/src/lib/eventBus/subscribers/index.ts through tsx drags a large app/UI graph and currently trips the same unrelated react-day-picker/src/style.css loader issue seen with broad initializeApp smoke imports. The narrower webhookSubscriber.ts module import remains the useful compile smoke for webhook subscriber changes.
  • (2026-05-05) ApiWebhookController.ts imports can hit that same broad .css loader issue under tsx. For controller TODO replacements, the narrower DAL/helper module smokes plus git diff --check are the reliable local validation path unless we run the full server test suite.
  • (2026-05-05) The webhook signature-verify route now supports both the plan's direct secret_vault_path input and a safer webhook_id lookup. Both paths resolve to the same tenant secret provider and use the shared verifyWebhookSignature() helper after normalizing the header format.
  • (2026-05-05) The remaining read-side webhook controller stubs can stay thin: delivery details come straight from webhook_deliveries, health derives from the webhook stats columns already maintained by the delivery processor, subscriptions are just webhook.event_types, and available events come from webhookEventTypeSchema.options.
  • (2026-05-05) Deferred webhook TODOs are now route-level cleanup, not controller cleanup. The implemented surface keeps nested delivery/health/ subscriptions reads plus create/list/test/verify, and drops the transform, filter, validate, bulk, search, export, trigger, and system-health routes so they naturally 404 instead of advertising dead handlers.
  • (2026-05-05) The nested webhook test route now diverges from the older generic /api/v1/webhooks/test helper: /[id]/test always uses the stored webhook URL + live signing secret, emits event_type='webhook.test', records is_test=true, and intentionally skips outbound bucket consumption.
  • (2026-05-05) Broad imports through server/src/lib/jobs/index.ts also hit the same unrelated react-day-picker CSS loader issue under tsx, so the cleanup-job service module is the reliable smoke target for scheduled-job additions in this environment.
  • (2026-05-05) I could not find a dedicated operational metrics client/facade in this repo. For the v1 observability items, the fallback is structured log emission with stable metric names/labels rather than a Prometheus/StatsD sink.
  • (2026-05-05) Webhook observability follows that same fallback pattern: queue depth is emitted from the Redis ZSET wrapper, delivery totals and durations from the delivery processor, and auto-disable counts from the state transition helper.
  • (2026-05-05) Public docs for this plan now live at docs/api/api-rate-limiting-and-ticket-webhooks.md and are linked from docs/api/api_overview.md.
  • (2026-05-05) WebhookDeliveryQueue now owns the retry loop contract: processors now return explicit delivered / retry / abandoned outcomes. The queue handles atomic zRem claims, caps active work at 50 in-process jobs, and re-enqueues attempts 2..5 with computeBackoff(attempt).
  • (2026-05-05) Auto-disable must follow a continuous failure streak, not just "some failures in the last day." maybeAutoDisable() therefore keys off the first non-delivered attempt since last_success_at and disables only once that streak has remained all-failure for 24 hours.
  • (2026-05-05) Added feature F052 after discovering a plan/code mismatch: webhookSchemas.ts already exposed event_filter.entity_ids, but the webhooks table migration and webhookModel never persisted event_filter at all. The subscriber-side entity filter needs that durable field first.
  • (2026-05-05) The v1 subscriber filter stops at event_filter.entity_ids. Generic conditions, tags, and entity_types remain schema-only for now per the PRD; the enqueue path simply treats an empty/missing entity_ids list as "match all."
  • (2026-05-06) The new webhook settings tab uses tenant-authenticated server actions instead of the standalone /api/v1/webhooks controller surface. That keeps the admin UI on the same auth model as the rest of settings while still reusing the shared DAL, delivery transport, signing helper, and queue.
  • (2026-05-06) tsx import-smoke of the new client component still trips the repo's unrelated react-day-picker/src/style.css loader issue. The focused validation path for F047 is therefore git diff --check plus a direct smoke import of the new server-action module, matching the earlier UI validation limitation already documented for this repo.
  • (2026-05-06) The first API rate-limit integration harness now uses a minimal ApiBaseController subclass plus mocked auth/RBAC/data-service edges. That keeps T007 focused on the shared authenticate/throttle/response path without having to pull the full tickets stack or a database-backed route into the fixture.
  • (2026-05-06) T016 exercises the per-key override path by spying on apiRateLimitSettingsReadOps.getForKey and wiring the bucket to the real apiRateLimitConfigGetter. Both the limit-header lookup and the bucket's internal lookup share the same in-process cache, so a single seeded row drives both consumption (tryConsume) and the X-RateLimit-Limit value emitted on every response — no additional fixture is required.
  • (2026-05-06) T017 covers the rate-limit server-action contract at the cache + DAL seam rather than through the withAuth wrapper. The session machinery used by setApiRateLimitForKey / clearApiRateLimitForKey (getCurrentUser, getUserRoles, assertApiKeyExists) is session-coupled and out of scope for vitest in this repo; the load-bearing assertion ("subsequent enforce call sees new limit immediately, not after 30s") lives in the invalidateApiRateLimitConfig step the actions perform after each upsert/clear, so the test simulates that exact write+invalidate sequence and verifies the bucket honours the new limit on the very next tryConsume.
  • (2026-05-06) Reusable webhook delivery test fixture: in-memory mock Redis implementing RedisClientLike with full ZSET semantics (zAdd/zRem/ zRangeByScore/zCard) plus an ephemeral node:http stub server keyed off WEBHOOK_SSRF_ALLOW_PRIVATE=true. The webhook model + autoDisable are mocked at the module boundary and the queue is given a 999_999 ms checkIntervalMs so the setInterval poller never races a manual queue.process() call. (WebhookDeliveryQueue as any).instance = null is required between tests to reset the singleton; the public API has no resetInstance.
  • (2026-05-06) T026 exercises tenant isolation at the subscriber/event-bus seam without standing up a real Redis-backed event bus: mocking @/lib/eventBus to a stub that records subscribe() callbacks lets the test invoke the captured handler directly with a forged TICKET_ASSIGNED event. webhookModel.listForEventType then proves the query is scoped to the publishing tenant and WebhookDeliveryQueue.enqueue is spied to verify only the matching-tenant webhook gets a job.
  • (2026-05-06) T027 skips vi.useFakeTimers in favour of fast-forwarding the mock-Redis ZSET scores between iterations (fastForwardAll()); the queue's claim/process cycle is what matters and it's already deterministic once checkIntervalMs is set to 999_999. A small waitFor polling helper drains in-flight deliveries between attempts. This keeps the retry-cadence assertion ("score equals now + computeBackoff(attempt)") honest without the cross-test contamination fake timers tend to introduce.
  • (2026-05-06) T030 simulates two pods racing on the same job by initializing two WebhookDeliveryQueue instances against the same shared mock Redis (clear (WebhookDeliveryQueue as any).instance = null between the two getInstance() calls). A custom processor spy passed to initialize() lets the test assert "exactly one of the two workers ran the processor" without spinning up the full delivery stack.
  • (2026-05-06) T031 mocks undici at the package boundary so assertSafeWebhookTarget can be exercised end-to-end through performWebhookDeliveryRequest. The blocked path proves the SSRF guard fires before fetch is reached (spy unused) and returns error_type='ssrf'; the bypassed path proves the override path lets the fetch through. The Agent constructor is mocked alongside fetch so verify_ssl=false paths don't touch the real undici Agent.
  • (2026-05-06) T036 cannot use vi.spyOn(ApiBaseController.prototype, ...) because ApiWebhookController declares its OWN private authenticate and checkPermission that shadow the base class — the spy must be on ApiWebhookController.prototype directly. URLs in controller tests must use a real UUID for the [id] segment because extractIdFromPath validates against ^[0-9a-f]{8}-...$.
  • (2026-05-06) T037 audits the migration source files instead of spinning up a Citus-aware test database, since the vitest harness here doesn't have Citus available. The audit verifies the table-creation + partial unique-index + distribute_table contracts that real migrations enforce; if/when a Citus test DB lands, this test should be replaced with a real migrate:up smoke + a pg_dist_partition query.

Commands / Runbooks

  • (2026-05-05) Run a single integration test: cd server && npx vitest run src/test/integration/apiRateLimit.headers.test.ts
  • (2026-05-05) Run all webhook integration tests: cd server && npx vitest run src/test/integration/webhook*
  • (2026-05-05) Run unit tests for the rate limiter package: cd packages/email && npx vitest run src/__tests__/TokenBucketRateLimiter*
  • (2026-05-05) Apply migrations against a local dev database — see existing migrate flow in server/package.json (knex CLI driven by migrations/ and ee/server/migrations/citus/).
  • (2026-05-05) Toggle observation mode locally: RATE_LIMIT_ENFORCE=false in server/.env. Toggle SSRF bypass for staging: WEBHOOK_SSRF_ALLOW_PRIVATE=true.
  • (2026-05-05) Tail Redis bucket state during integration tests: redis-cli --scan --pattern 'alga-psa:ratelimit:bucket:*' | xargs -L1 redis-cli get
  • (2026-05-05) Run the namespace foundation unit suite without coverage noise: cd server && npx vitest run --coverage.enabled=false src/test/unit/notifications/tokenBucketRateLimiter.test.ts ../packages/email/src/__tests__/TokenBucketRateLimiter.namespaces.test.ts ../packages/email/src/__tests__/TokenBucketRateLimiter.subjectId.test.ts ../packages/email/src/__tests__/TokenBucketRateLimiter.email-regression.test.ts
  • (2026-05-05) Run the API response-header unit test: cd server && npx vitest run --coverage.enabled=false src/test/unit/api/apiMiddleware.responseHeaders.test.ts
  • (2026-05-05) Run the API rate-limit config getter unit tests: cd server && npx vitest run --coverage.enabled=false src/lib/api/rateLimit/__tests__/configGetter.cache.test.ts src/lib/api/rateLimit/__tests__/configGetter.invalidate.test.ts src/lib/api/rateLimit/__tests__/configGetter.fallback.test.ts
  • (2026-05-05) Run the API rate-limit enforcement helper tests: cd server && npx vitest run --coverage.enabled=false src/lib/api/rateLimit/__tests__/enforce.test.ts src/test/unit/api/apiMiddleware.responseHeaders.test.ts
  • (2026-05-05) Smoke-load the webhook payload builder: cd server && npx tsx -e "import('./src/lib/eventBus/subscribers/webhook/webhookTicketPayload.ts').then(() => console.log('payload-ok'))"
  • (2026-05-05) Smoke-load the webhook subscriber + queue storage layer: cd server && npx tsx -e "import('./src/lib/webhooks/processWebhookDeliveryJob.ts').then(() => console.log('processor-ok'))" cd server && npx tsx -e "import('./src/lib/webhooks/autoDisable.ts').then(() => console.log('auto-disable-ok'))" cd server && npx tsx -e "import('./src/lib/webhooks/WebhookDeliveryQueue.ts').then(() => console.log('queue-ok'))" cd server && npx tsx -e "import('./src/lib/eventBus/subscribers/webhookSubscriber.ts').then(() => console.log('subscriber-ok'))"
  • (2026-05-05) cd server && npx tsc --noEmit --pretty false currently OOMs in this repo, and even targeted tsc entrypoint checks surface existing package-resolution / JSX-config errors unrelated to this feature slice, so compile verification here is limited to focused runtime/unit checks plus manual review.
  • (2026-05-05) Smoke-import the webhook DAL after edits: cd server && npx tsx -e "import('./src/lib/webhooks/webhookModel.ts').then(() => console.log('ok'))"
  • (2026-05-05) Smoke-import the webhook delivery transport after edits: cd server && npx tsx -e "import('./src/lib/webhooks/delivery.ts').then(() => console.log('delivery-ok'))"
  • (2026-05-06) Smoke-import the webhook admin server actions after edits: npx tsx -e "import('./packages/auth/src/actions/webhookActions.ts').then(() => console.log('webhook-actions-ok'))"
  • (2026-05-05) Quick SSRF helper smoke: cd server && npx tsx -e "import('./src/lib/webhooks/ssrf.ts').then(async ({ assertSafeWebhookTarget }) => { await assertSafeWebhookTarget('https://example.com'); console.log('public-ok'); try { await assertSafeWebhookTarget('http://127.0.0.1'); process.exit(1); } catch (error) { console.log((error && error.name) || 'error'); } })"
  • (2026-05-05) Quick signing helper smoke: cd server && npx tsx -e "import('./src/lib/webhooks/sign.ts').then(({ signRequest, verifyWebhookSignature }) => { const header = signRequest('shh', '{\\\"a\\\":1}', 1700000000); console.log(header); console.log(verifyWebhookSignature(header, '{\\\"a\\\":1}', 'shh')); })"
  • (2026-05-05) Quick event-map smoke: cd server && npx tsx -e "import('./src/lib/eventBus/subscribers/webhook/webhookEventMap.ts').then(({ publicEventsFor }) => { console.log(publicEventsFor('TICKET_ASSIGNED').join(',')); console.log(publicEventsFor('NOPE').length); })"
  • (2026-05-05) Quick backoff helper smoke: cd server && npx tsx -e "import('./src/lib/webhooks/backoff.ts').then(({ computeBackoff }) => { console.log([1,2,3,4,5].map(computeBackoff).join(',')); })"
  • (2026-05-05) Quick webhook rate-limit getter smoke: cd server && npx tsx -e "import('./src/lib/webhooks/rateLimitConfig.ts').then(({ DEFAULT_WEBHOOK_RATE_LIMIT_PER_MIN }) => console.log(DEFAULT_WEBHOOK_RATE_LIMIT_PER_MIN))"
  • Source plans:
    • .ai/api-rate-limiting-plan.md
    • .ai/ticket-webhooks-plan.md
  • Key files:
    • packages/email/src/TokenBucketRateLimiter.ts — bucket implementation.
    • packages/email/src/DelayedEmailQueue.ts — pattern for WebhookDeliveryQueue.
    • server/src/lib/initializeApp.ts:144-168 — singleton init site.
    • server/src/lib/api/controllers/ApiBaseController.ts:44-87 — auth surface 1.
    • server/src/lib/api/middleware/apiMiddleware.ts:101-111TooManyRequestsError; lines 144 & 201 — auth surfaces 2 & 3.
    • server/src/lib/api/services/WebhookService.ts:950, 1056 — mock + broken rate limit.
    • server/src/lib/api/controllers/ApiWebhookController.ts — 14+ TODOs.
    • packages/event-schemas/src/schemas/eventBusSchema.ts:157-184 — internal EVENT_TYPES.
    • server/src/lib/api/schemas/webhookSchemas.ts:21-60 — public enum to extend.
    • ee/server/migrations/20251014120000_create_stripe_integration_tables.cjs:28webhook_secret_vault_path precedent.
    • server/src/lib/webhooks/webhookModel.ts — tenant-scoped webhook DAL and signing-secret resolution helpers.
    • server/src/lib/webhooks/delivery.ts — shared outbound HTTP transport for webhook delivery with timeout/TLS/error classification.
    • server/src/lib/webhooks/ssrf.ts — outbound target validation for webhook delivery and test-send flows.
    • server/src/lib/webhooks/sign.ts — outbound request signing and signature verification helper for webhook deliveries.
    • server/src/lib/eventBus/subscribers/webhook/webhookEventMap.ts — canonical mapping from internal ticket events to public webhook events.
    • server/src/lib/webhooks/backoff.ts — shared retry schedule helper for the outbound webhook queue.
    • server/src/lib/webhooks/rateLimitConfig.ts — shared token-bucket config getter for the webhook-out namespace.

Open Questions

  • (2026-05-05) IA placement of the new admin UIs — Settings → Security or Settings → Integrations? Confirm with design before F022/F047 lands.
  • (2026-05-05) Per-tenant cap on top of per-key buckets? Defer until Stage 1 observation data justifies it.
  • (2026-05-05) Per-endpoint cost weights (search costs more than get)? Defer until observation data shows pressure differences.
  • (2026-05-05) Expose ticket.deleted in v1? Decision: defer unless the noisy poller specifically asks during migration.
  • (2026-05-05) Per-tenant webhook count cap — proposed 50; confirm before F047 lands.

Progress Log

  • (2026-05-05) F001 complete. TokenBucketRateLimiter now requires an explicit namespace on tryConsume, getState, getBucketKey, and getBucketConfig. Redis keys now include the namespace segment (alga-psa:ratelimit:bucket:{namespace}:{tenant}[:{subject}]) so future API/webhook buckets cannot collide with the existing email path.
  • (2026-05-05) F002 complete. BucketConfigGetter now receives (tenantId, subjectId?), which lets the limiter surface per-key and per-webhook configuration decisions without additional key parsing.
  • (2026-05-05) F003 complete. TokenBucketRateLimiter.initialize() now accepts a namespace-to-getter map, and lookup/fail-open behavior stays centralized inside the shared limiter instead of spreading per-namespace branching to callers.
  • (2026-05-05) F004 complete. initializeApp() now registers the existing email tenant-config getter under namespace email and a temporary hard-coded API getter under namespace api, so startup is already wired for the upcoming API limiter without altering email defaults.
  • (2026-05-05) F005 complete. TenantEmailService.checkRateLimits() now consumes tokens from namespace email, preserving the pre-existing per-tenant/per-user email semantics after the limiter API change.
  • (2026-05-05) T001 complete. Added packages/email/src/__tests__/TokenBucketRateLimiter.namespaces.test.ts to prove the same tenant/subject can exhaust email without consuming the api bucket.
  • (2026-05-05) T002 complete. Added packages/email/src/__tests__/TokenBucketRateLimiter.subjectId.test.ts to verify namespace getters receive subjectId and that API-key buckets are keyed as ...:api:{tenant}:{subject}.
  • (2026-05-05) T003 complete. Added packages/email/src/__tests__/TokenBucketRateLimiter.email-regression.test.ts with fake time pinned to confirm the email namespace preserves the legacy 60-token burst / 1-token-per-second refill behavior at calls 1, 30, 60, and 61.
  • (2026-05-05) F006 complete. ApiError now supports optional response headers and handleApiError() forwards them into NextResponse.json(), which lets later rate-limit errors attach Retry-After and X-RateLimit-* metadata without a parallel error class.
  • (2026-05-05) F007 complete. createSuccessResponse() and createPaginatedResponse() now accept optional extraHeaders as a final parameter, preserving existing controller call sites while opening a clean path for rate-limit headers on successful responses.
  • (2026-05-05) F008 complete. Added server/migrations/20260505123000_create_api_rate_limit_settings.cjs with tenant-scoped rate-limit columns plus separate unique indexes for per-key rows and the (tenant, NULL) tenant default row.
  • (2026-05-05) F009 complete. Added ee/server/migrations/citus/20260505123100_distribute_api_rate_limit_settings.cjs so the new settings table is distributed on tenant when Citus is present.
  • (2026-05-05) F010 complete. Added server/src/lib/api/rateLimit/apiRateLimitSettingsModel.ts with exact-row reads/writes plus a fallback resolver that checks (tenant, apiKeyId), then (tenant, NULL), then the hard defaults { maxTokens: 120, refillRate: 1 }.
  • (2026-05-05) F011 complete. Added server/src/lib/api/rateLimit/apiRateLimitConfigGetter.ts with a 1000-entry, 30-second TTL cache, exact-entry invalidation, tenant-prefix invalidation, and initializeApp() now uses it for the api namespace.
  • (2026-05-05) T004 complete. Added server/src/lib/api/rateLimit/__tests__/configGetter.cache.test.ts to verify identical cached lookups hit the settings resolver once.
  • (2026-05-05) T005 complete. Added server/src/lib/api/rateLimit/__tests__/configGetter.invalidate.test.ts to prove tenant-wide invalidation clears only that tenant and single-key invalidation clears only the targeted key.
  • (2026-05-05) T006 complete. Added server/src/lib/api/rateLimit/__tests__/configGetter.fallback.test.ts to verify the resolver order is per-key override, then tenant default, then the hard-coded API defaults.
  • (2026-05-05) F012 complete. Added server/src/lib/api/rateLimit/enforce.ts as the shared API limiter entry point. It resolves the api namespace bucket, skips configured bypass paths, computes rate-limit header values, and either throws TooManyRequestsError or returns a RateLimitDecision.
  • (2026-05-05) F013 complete. enforceApiRateLimit() now treats RATE_LIMIT_ENFORCE=false as observation mode: it logs the throttle with tenant/api-key/retry metadata and returns a decision instead of throwing.
  • (2026-05-05) F014 complete. The NM Store branch in apiMiddleware.withApiKeyAuth() now stamps rateLimitSubjectId='nm_store' before calling the limiter so all global-key traffic shares one tenant bucket instead of bypassing per-subject accounting.
  • (2026-05-05) F015 complete. shouldBypassRateLimit() now centralizes the bypass prefixes for health endpoints, mobile auth, and runner-internal endpoints so future auth wrappers reuse one rate-limit allowlist.
  • (2026-05-05) F016 complete. Rate-limit denials now throw the existing TooManyRequestsError with details.retry_after_ms, details.remaining, and the full header set attached on error.headers.
  • (2026-05-05) F017 complete. ApiBaseController.authenticate() now enforces the API bucket immediately after building request context and stores the resulting decision on apiRequest.context.rateLimit.
  • (2026-05-05) F018 complete. The middleware auth wrappers now call enforceApiRateLimit() as soon as context is available. I also wired the legacy apiAuthMiddleware.ts path so /api/v1/test-auth stays in the same bucket family as the newer wrappers.
  • (2026-05-05) F019 complete. createSuccessResponse() and createPaginatedResponse() now emit X-RateLimit-Limit and X-RateLimit-Remaining automatically when the passed request carries context.rateLimit, and the generic ApiBaseController create/update paths now pass apiRequest through to the helper.
  • (2026-05-05) F020 complete. Added reusable legacy auth helpers: authenticateApiKeyRequest() for inline API-key handlers, withApiKeyRouteAuth() for route files that need req.context, and appendRateLimitHeaders() for direct NextResponse routes. Wrapped the entire asset and contract-line /api/v1 route families so they now authenticate through the shared legacy middleware and emit rate-limit headers. I also migrated the remaining direct /api/v1 handlers that were doing inline API-key validation (ticket priorities/statuses/reactions, storage routes, and the non-mobile-auth mobile moderation/push/account routes) onto the shared helper so they consume the same api bucket.
  • (2026-05-05) F021 complete. Added tenant-admin server actions in packages/auth/src/actions/apiKeyRateLimitActions.ts: getApiRateLimitForKey, setApiRateLimitForKey, setTenantDefaultApiRateLimit, and clearApiRateLimitForKey. They verify admin access, scope API key IDs to the current tenant, use the api_rate_limit_settings model for reads/writes, and invalidate the in-process API rate-limit config cache immediately after every write so UI updates do not wait on the 30s TTL.
  • (2026-05-05) F022 complete. AdminApiKeysSetup now loads each key's effective API rate-limit settings plus live bucket state and renders a new "Rate Limit" column with inline override editing and reset. The column shows the effective burst / refill values, the config source (per-key override vs tenant default vs hard default), and the current remaining tokens from TokenBucketRateLimiter.getState('api', tenant, apiKeyId).
  • (2026-05-05) F023 complete. The public webhook event enum now includes ticket.comment.added, so webhook create/update validation no longer rejects the v1 ticket-comment subscription event.
  • (2026-05-05) T018 complete. Added server/src/lib/api/schemas/__tests__/webhookSchemas.test.ts to lock in acceptance of the new ticket.comment.added enum member.
  • (2026-05-05) F024 complete. Added server/migrations/20260505140000_create_webhook_tables.cjs with the base webhooks subscription table: tenant-scoped primary key, event list, signing-secret vault path, retry/rate-limit config, activation flag, rolling delivery stats, auto-disable timestamp, and creator/audit timestamps.
  • (2026-05-05) F025 complete. Expanded the same webhook migration to add webhook_deliveries with tenant/webhook foreign key wiring, request + response capture columns, retry scheduling fields, is_test, and the three queue-oriented indexes required by the PRD (webhook+attempted_at, event_id, and partial pending/retrying next_retry_at).
  • (2026-05-05) F026 complete. Added ee/server/migrations/citus/20260505140100_distribute_webhook_tables.cjs to distribute both webhooks and webhook_deliveries on tenant, with the same Citus-enabled / already-distributed guards used by the earlier rate-limit distribution migration.
  • (2026-05-05) F027 complete. Added server/src/lib/webhooks/webhookModel.ts as the first non-mock webhook foundation: public reads omit signing_secret_vault_path, inserts persist signing secrets via getSecretProviderInstance(), delivery attempts write to webhook_deliveries, stats updates increment the rolling counters on webhooks, and getSigningSecret() resolves the stored path-style reference back to the tenant secret name.
  • (2026-05-05) F028 complete. Added server/src/lib/webhooks/delivery.ts and rewired WebhookService.performWebhookDelivery() to use it. Deliveries now perform a real undici.fetch call with a 10s timeout, preserve response status and headers, truncate stored response bodies to 8 KB, classify DNS/connect/TLS/ timeout failures, and disable certificate verification only when verify_ssl=false.
  • (2026-05-05) F029 complete. Added server/src/lib/webhooks/ssrf.ts and enforced it in the shared delivery transport before any outbound fetch. Targets must now use http(s), reject localhost/loopback/private/link-local/CGNAT destinations after DNS resolution, and only bypass those checks when WEBHOOK_SSRF_ALLOW_PRIVATE=true.
  • (2026-05-05) F030 complete. Added server/src/lib/webhooks/sign.ts with the PRD's X-Alga-Signature contract: t=<timestamp>,v1=<sha256 hex> over ${timestamp}.${body}. webhookSchemas.validateWebhookSignature() now delegates to the same helper instead of preserving the old sha256=<hex> comparison logic.
  • (2026-05-05) F032 complete. Added server/src/lib/eventBus/subscribers/webhook/webhookEventMap.ts with the v1 ticket-event translation table and a publicEventsFor() helper that returns a fresh array for each lookup, making the mapping ready for the upcoming event-bus subscriber.
  • (2026-05-05) F039 complete. Added server/src/lib/webhooks/backoff.ts with the PRD retry schedule (1m, 5m, 30m, 2h, 12h) and pointed the scaffolded WebhookService.calculateNextRetryTime() method at that helper so old placeholder retry math no longer diverges from the intended queue behavior.
  • (2026-05-05) F031 complete. Added server/src/lib/webhooks/rateLimitConfig.ts, registered the new 'webhook-out' namespace in initializeApp(), and replaced the stale delivery-count query in WebhookService.checkRateLimit() with TokenBucketRateLimiter.tryConsume('webhook-out', tenant, webhookId). The delivery path now applies the shared per-webhook bucket instead of the mocked webhook.rate_limit.enabled branch.
  • (2026-05-05) F033 complete. Added server/src/lib/eventBus/subscribers/webhook/webhookTicketPayload.ts, which builds the PRD's curated ticket snapshot for webhook fan-out, normalizes ticket.updated change diffs, includes ticket.comment.added comment metadata without attachments, resolves tags from tag_mappings, and caches the base (tenant,ticket_id) snapshot for 60 seconds so a multi-subscriber fan-out does not repeat the same joins.
  • (2026-05-05) F034 complete. ticket.status_changed payloads from webhookTicketPayload.ts now include previous_status_id plus a tenant-scoped lookup of previous_status_name, using either payload.previousStatusId or the older payload.changes.status_id.from compatible shape when deriving the prior status.
  • (2026-05-05) F035 complete. Added server/src/lib/eventBus/subscribers/webhookSubscriber.ts, which subscribes to the six v1 ticket events, builds the curated webhook payload once per internal event, filters subscribers by (tenant, public event type), and enqueues one delivery job per matching active webhook. I also introduced the initial server/src/lib/webhooks/WebhookDeliveryQueue.ts storage contract so the subscriber already targets the eventual Redis ZSET queue instead of a temporary inline-delivery path.
  • (2026-05-05) F036 complete. Registered the webhook subscriber in server/src/lib/eventBus/subscribers/index.ts so the existing register-all / unregister-all lifecycle now includes webhook ticket events alongside the other subscriber families.
  • (2026-05-05) F037 complete. Expanded server/src/lib/webhooks/WebhookDeliveryQueue.ts from storage-only enqueue support into the actual Redis ZSET poller: initialize(getRedisClient, processFn) now starts a 2s processing loop, claims ready jobs via zRangeByScore + zRem, limits active processor promises to 50, retries failed jobs up to five total attempts using the shared backoff helper, and drains in-flight work for up to 30 seconds on shutdown / SIGTERM.
  • (2026-05-05) F038 complete. initializeApp() now boots the webhook delivery queue with getRedisClient plus a real processWebhookDeliveryJob() callback, and the existing SIGTERM/SIGINT cleanup path now shuts the queue down alongside the email retry queues.
  • (2026-05-05) F040 complete. Added server/src/lib/webhooks/autoDisable.ts and wired it into processWebhookDeliveryJob(). Failed deliveries now advance the webhook's rolling stats, and once the first failure since the last success has aged past 24 hours the webhook is auto-disabled exactly once and the owning user receives a direct notification email via the system email service.
  • (2026-05-05) F052 complete. Updated the base webhook migration plus server/src/lib/webhooks/webhookModel.ts so webhook rows now persist and return event_filter JSON. That closes the storage gap under event_filter.entity_ids before the subscriber starts enforcing it.
  • (2026-05-05) F041 complete. webhookSubscriber.ts now enforces event_filter.entity_ids before enqueueing jobs: when a webhook row carries a non-empty allowlist, only matching ticket IDs are queued. Missing/empty allowlists still receive all matching event types.
  • (2026-05-05) F042 complete. ApiWebhookController.rotateSecret() now performs a real secret rotation: it generates a 32-byte base64url secret, updates the webhook through webhookModel.update(..., { signingSecret }), and returns the plaintext once in the response instead of the old timestamp stub.
  • (2026-05-05) F043 complete. ApiWebhookController.verifySignature() now resolves the signing secret from either webhook_id or secret_vault_path, normalizes split signature inputs into the t=...,v1=... header format when needed, and returns the real HMAC match result instead of the old always-true stub.
  • (2026-05-05) F044 complete. Replaced four controller TODOs: getDelivery() now loads a concrete webhook_deliveries row via webhookModel.getDeliveryById(), getHealth() derives a stable health summary from the webhook stats columns, getSubscriptions() returns the stored event_types for the webhook, and listEvents() returns the public enum from webhookEventTypeSchema.
  • (2026-05-05) F045 complete. Deleted the deferred webhook route handlers for transform/filter validation, system health, global/nested subscription creation, bulk/search/export, and manual event triggering. The nested [id]/subscriptions route now exposes only GET, and the removed paths will 404 instead of surfacing TODO-backed handlers.
  • (2026-05-05) F046 complete. ApiWebhookController.testById() now sends a real signed webhook.test request to the configured webhook URL, records the attempt in webhook_deliveries with is_test=true, and returns the observed transport result. It reuses the live signing/header and SSRF-guard path but skips the outbound rate-limit bucket and does not mutate webhook delivery stats.
  • (2026-05-05) F048 complete. Added server/src/services/cleanupWebhookDeliveriesJob.ts plus scheduler wiring in server/src/lib/jobs/index.ts and server/src/lib/jobs/initializeScheduledJobs.ts. The new system-wide job runs every 15 minutes and deletes webhook_deliveries rows older than 30 days in batches of 10,000 until the backlog is gone.
  • (2026-05-05) F049 complete. enforceApiRateLimit() now emits structured fallback metric logs for api_rate_limit_consumed_total, api_rate_limit_remaining, and api_rate_limit_redis_unavailable_total, using stable label fields (tenant, api_key_id, outcome) alongside the existing throttle WARN.
  • (2026-05-05) F050 complete. Added server/src/lib/webhooks/metrics.ts and wired structured fallback metric logs into WebhookDeliveryQueue, processWebhookDeliveryJob(), and maybeAutoDisable(). That now emits webhook_queue_depth, webhook_deliveries_total, webhook_delivery_duration_ms, and webhook_auto_disabled_total.
  • (2026-05-05) F051 complete. Added docs/api/api-rate-limiting-and-ticket-webhooks.md with the public rate-limit contract, webhook event examples, HMAC verification recipes, idempotency/ordering notes, and retry schedule; linked it from docs/api/api_overview.md.
  • (2026-05-06) F047 complete. Added tenant-authenticated webhook admin actions in packages/auth/src/actions/webhookActions.ts, a new AdminWebhooksSetup settings component with create/edit, test-send, secret rotation, pause/resume, delete, delivery history, and manual retry enqueue, plus Security settings tab wiring in server/src/components/settings/security/SecuritySettingsPage.tsx. The DAL now also exposes tenant-scoped webhook listing and paginated delivery history helpers for the UI.
  • (2026-05-06) T007 complete. Added server/src/test/integration/apiRateLimit.headers.test.ts, which drives the real ApiBaseController.list() auth path 121 times under one tenant/API key and asserts the 121st response is a 429 with Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and the expected RATE_LIMITED error envelope details.
  • (2026-05-06) T008 complete. Extended server/src/test/integration/apiRateLimit.headers.test.ts with the success case assertion: an allowed authenticated request now proves X-RateLimit-Limit=120 and X-RateLimit-Remaining=119 are attached on the 200 response from the same controller path.
  • (2026-05-06) T009 complete. Extended the same apiRateLimit.headers.test.ts harness to swap API key identities within one tenant and prove bucket isolation: with a 5-token config, key A throttles on request 6 while key B still gets a 200 and its own remaining=4 header.
  • (2026-05-06) T010 complete. The same harness now also forces the tenant-scoped API-key auth branch via x-tenant-id and proves the bucket key includes tenant: exhausting tenant A with a shared api_key_id no longer affects tenant B, which still succeeds with its own remaining=4 header.
  • (2026-05-06) T011 complete. Extended the rate-limit integration harness to drive an exhausted bucket and then swap only the request pathname to a bypassed route (/api/v1/meta/health). Those calls stay 200 and the next ticket-path request is still 429, proving bypasses do not consume tokens.
  • (2026-05-06) T012 complete. Added observation-mode coverage to the same rate-limit integration file by mocking the shared logger: with RATE_LIMIT_ENFORCE=false, the 121st request now stays 200 with remaining=0, and the test asserts the structured throttle WARN still carries tenant, api_key_id, and retry_after_ms.
  • (2026-05-06) T013 complete. Added a broken-Redis branch to the same harness: 200 authenticated requests now stay 200 with X-RateLimit-Remaining=-1, and the mocked logger proves the api_rate_limit_redis_unavailable_total metric payload is emitted on the fail-open path.
  • (2026-05-06) T014 complete. The rate-limit harness now also imports the shared middleware wrappers and proves bucket sharing across all three auth surfaces: after five mixed requests through ApiBaseController, withApiKeyAuth, and withAuth, the next request on each surface returns 429 from the same (tenant, api_key_id) bucket.
  • (2026-05-06) T015 complete. Added NM Store coverage to the same integration file by mocking getAppSecret('nm_store_api_key'): the withApiKeyAuth({ allowNmStore: true }) branch now throttles the shared sentinel bucket after five requests, while a normal API key in the same tenant still succeeds with its own remaining=4 header.