Scratchpad — API Rate Limiting and Outbound Ticket Webhooks
- Plan slug: `api-rate-limiting-and-ticket-webhooks`
- Created: 2026-05-05
- Source plans (kept for diff/history; this folder is canonical going forward):
  - `/Users/natalliabukhtsik/Desktop/projects/alga-psa/.ai/api-rate-limiting-plan.md`
  - `/Users/natalliabukhtsik/Desktop/projects/alga-psa/.ai/ticket-webhooks-plan.md`
What This Is
Rolling notes for the combined effort. Append decisions and discoveries as implementation progresses; update earlier entries when something changes.
Decisions
- (2026-05-05) Combined into one plan. The two source plans share infrastructure (`TokenBucketRateLimiter` namespace work, features F001–F005, must land before either feature can use namespaced buckets). Combining the features into one plan avoids re-stating the foundation.
- (2026-05-05) Queue: Redis ZSET, not BullMQ or Temporal. BullMQ is not a current dependency; adding it would introduce a third queue paradigm. Temporal is in use for `workflow-worker` but webhook delivery is "POST + retry," not multi-step. The `DelayedEmailQueue` ZSET pattern (`packages/email/src/DelayedEmailQueue.ts`) is the closest analog and reuses the existing Redis client. User confirmed this on 2026-05-05.
- (2026-05-05) Signing secret stored via secret provider, not hashed. HMAC requires the plaintext on every delivery, so hashing breaks signing. Mirror the Stripe integration (`webhook_secret_vault_path` column, resolved through `getSecretProviderInstance()`). Fixed during plan review.
- (2026-05-05) Reuse `TooManyRequestsError`, don't add a parallel `RateLimitError`. It already exists at `apiMiddleware.ts:101-111` with the right shape. Plumb headers through `ApiError.headers` instead.
- (2026-05-05) Subscribe to `TICKET_STATUS_CHANGED` directly. It's a first-class internal event (`eventBusSchema.ts:170`); don't synthesize it from `TICKET_UPDATED.changes.status_id`.
- (2026-05-05) Three auth surfaces, one helper. `enforceApiRateLimit(req, ctx)` is called from `ApiBaseController.authenticate`, `withApiKeyAuth` (both branches), and `withAuth`. The NM Store path uses the sentinel subjectId `'nm_store'` since it has no `apiKeyId`.
- (2026-05-05) Defer to v2 by removing routes, not by leaving 501s. Discovered 14+ TODO stubs in `ApiWebhookController`. The deferred ones (transformations, bulk ops, templates marketplace, etc.) get their route files deleted so OpenAPI doesn't advertise them.
- (2026-05-05) Rate-limiter and webhooks share the `TokenBucketRateLimiter` namespace work. The webhook per-webhook outbound cap (namespace `'webhook-out'`) depends on F001–F005 being merged first.
- (2026-05-06) Place the v1 webhook admin UI under Security, next to API Keys. The open question remained unresolved, and the existing `/msp/security-settings` surface already hosts the external API admin controls. Reusing that location avoids inventing a second admin-only settings entry point during the MVP.
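
A minimal sketch of why the signing secret has to be retrievable in plaintext. It signs a body into the `t=<unix>,v1=<hex>` header format and shows that a hash of the secret cannot verify it. `signPayload`, `verifyPayload`, and `hmacHex` are hypothetical names for illustration, not the repo's helpers, and the signed string layout (`<unix>.<body>`) is an assumption rather than something the plan states.

```typescript
import { createHash, createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helpers for illustration only; these are not the repo's sign.ts.
// Assumption: the signed string is `${unixSeconds}.${body}`.
function hmacHex(secret: string, body: string, unixSeconds: number): string {
  return createHmac("sha256", secret).update(`${unixSeconds}.${body}`).digest("hex");
}

// Builds a header in the t=<unix>,v1=<hex> format.
function signPayload(secret: string, body: string, unixSeconds: number): string {
  return `t=${unixSeconds},v1=${hmacHex(secret, body, unixSeconds)}`;
}

// Recomputes the HMAC from the plaintext secret and compares in constant time.
function verifyPayload(header: string, body: string, secret: string): boolean {
  const match = /^t=(\d+),v1=([0-9a-f]{64})$/.exec(header);
  if (!match) return false;
  const expected = Buffer.from(hmacHex(secret, body, Number(match[1])), "hex");
  const received = Buffer.from(match[2], "hex");
  return timingSafeEqual(expected, received); // both buffers are 32 bytes
}

const body = JSON.stringify({ a: 1 });
const header = signPayload("shh", body, 1700000000);
const hashedSecret = createHash("sha256").update("shh").digest("hex");

console.log(verifyPayload(header, body, "shh")); // true
console.log(verifyPayload(header, body, hashedSecret)); // false: a stored hash cannot reproduce the MAC
```

The second call is the point of the decision: if only a hash of the secret were stored, neither side could recompute the signature.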
Discoveries / Constraints
- (2026-05-05) `TokenBucketRateLimiter` is at `packages/email/src/TokenBucketRateLimiter.ts`. Bucket key prefix is `alga-psa:ratelimit:bucket:` and TTL is 3600s. The `BucketConfigGetter` signature is `(tenantId) => BucketConfig`; it must widen to `(tenantId, subjectId?) => BucketConfig` for per-key/per-webhook overrides.
- (2026-05-05) Existing email rate-limit defaults are `maxTokens=60, refillRate=1`. New API defaults are deliberately higher (120, 1) because API bursts are expected to be larger than email bursts.
- (2026-05-05) `WebhookService.checkRateLimit` (line 1056) queries `webhook_deliveries`, which doesn't exist yet, so it would throw if called. Latent bug: nothing currently calls into the delivery path.
- (2026-05-05) `WebhookService.performWebhookDelivery` (line 950) is mocked: it sleeps 100 ms and returns `{ success: true, status_code: 200 }`. No real HTTP request happens today.
- (2026-05-05) `webhookEventTypeSchema` lacks `ticket.comment.added`. F023 must extend the enum or webhook creation requests for that event type fail validation.
- (2026-05-05) Existing distribution pattern for tenant-scoped tables: `notification_settings` is in `20250805000019_distribute_final_tables.cjs`. Migration extension is `.cjs`, not `.ts`. Citus distribution lives in `ee/server/migrations/citus/`, separate from the create migration in `server/migrations/`.
- (2026-05-05) PostgreSQL `UNIQUE (tenant, api_key_id)` would allow multiple `(tenant, NULL)` tenant-default rows. The migration needs a separate unique partial index on `tenant WHERE api_key_id IS NULL` to make the null fallback row actually unique.
- (2026-05-05) The current secret-provider API resolves tenant secrets by `(tenant, secretName)`, not by an arbitrary vault path. For webhook signing secrets, `signing_secret_vault_path` therefore acts as stored metadata; the DAL resolves the actual secret by taking the basename of the stored path and calling `getTenantSecret(tenant, basename(path))`.
- (2026-05-05) `undici` is already available in the server runtime, so the real webhook transport can use `undici.fetch` + `Agent` for the `verify_ssl=false` path without introducing a new dependency.
- (2026-05-05) Node's `net.BlockList` is sufficient for the required SSRF address classes. The helper now blocks RFC1918, loopback, link-local, and CGNAT IPv4 ranges plus `::1` and `fe80::/10`, and it short-circuits all of those checks when `WEBHOOK_SSRF_ALLOW_PRIVATE=true`.
- (2026-05-05) The repo still had an older generic webhook validator that expected `sha256=<hex>`. F030 replaces that with the PRD-specific outbound format `t=<unix>,v1=<hex>` and routes the leftover schema helper through the new shared implementation so future controller work doesn't split the signature recipe again.
- (2026-05-05) The ticket webhook surface now has a single canonical translation layer under `eventBus/subscribers/webhook/`; future subscriber fan-out code can map one internal event to one or more public webhook events without duplicating string switches.
- (2026-05-05) The placeholder retry math in `WebhookService` was still using generic exponential/linear config fields. F039 replaces that with the PRD's fixed retry cadence and exposes it as a shared helper for the future Redis queue worker.
- (2026-05-05) `initializeApp.ts` is a poor `tsx` smoke-import target in this repo because importing the full app graph pulls Next/UI assets like `react-day-picker/src/style.css`. For F031 validation, focused imports of the new rate-limit getter and the touched service file are the useful checks.
- (2026-05-05) `ApiBaseController.authenticate` is not the universal hook point: `withApiKeyAuth` and `withAuth` in `apiMiddleware.ts:144,201` are independent paths, and the NM Store branch in `withApiKeyAuth` produces a context with `apiKeyId === undefined`. Verified by reading service-types and test-auth routes.
- (2026-05-05) `/api/v1/test-auth` does not use the same `withApiKeyAuth` helper as `service-types`; it goes through the older `server/src/lib/api/middleware/apiAuthMiddleware.ts`. Rate-limit wiring has to cover that legacy wrapper too or the planned cross-surface test would split buckets by middleware implementation.
- (2026-05-05) Several `/api/v1` route families still bypassed the three shared auth surfaces even after F018: asset routes and contract-line routes were calling controllers that expect `req.context` but never authenticated, and a handful of direct route handlers (`tickets/priorities`, `tickets/statuses`, ticket comment reactions, storage routes, and several mobile moderation/push/account routes) were validating API keys inline without invoking the limiter.
- (2026-05-05) Internal event vocabulary is much larger than the v1 public surface. `TICKET_REOPENED`, `TICKET_ESCALATED`, `TICKET_PRIORITY_CHANGED`, `TICKET_UNASSIGNED`, `TICKET_QUEUE_CHANGED`, `TICKET_TAGS_CHANGED`, `TICKET_RESPONSE_STATE_CHANGED`, and `TICKET_ADDITIONAL_AGENT_ASSIGNED` exist in `EVENT_TYPES` but are deferred to v2 (rolled into `ticket.updated`).
- (2026-05-05) `TICKET_COMMENT_ADDED` currently reaches subscribers through the legacy `TicketEventPayloadSchema` shape from `TicketService`: it includes `payload.comment.{content,author,isInternal}` but not a persisted comment timestamp. The webhook payload builder therefore uses `payload.occurredAt` / event timestamp for `comment.timestamp`.
- (2026-05-05) `TICKET_STATUS_CHANGED` payloads may arrive in either the new domain shape (`previousStatusId`) or an older `changes.status_id.from` style. The webhook payload builder now accepts both so subscriber output stays stable across publishers while the event vocabulary converges.
- (2026-05-05) `webhookSubscriber.ts` needs a queue boundary before the full poller lands. I added `WebhookDeliveryQueue.enqueue()` as the initial Redis storage contract now, and the later F037 work will extend that same class with claim/process/retry behavior instead of swapping subscriber behavior.
- (2026-05-05) Importing `server/src/lib/eventBus/subscribers/index.ts` through `tsx` drags a large app/UI graph and currently trips the same unrelated `react-day-picker/src/style.css` loader issue seen with broad `initializeApp` smoke imports. The narrower `webhookSubscriber.ts` module import remains the useful compile smoke for webhook subscriber changes.
- (2026-05-05) `ApiWebhookController.ts` imports can hit that same broad `.css` loader issue under `tsx`. For controller TODO replacements, the narrower DAL/helper module smokes plus `git diff --check` are the reliable local validation path unless we run the full server test suite.
- (2026-05-05) The webhook signature-verify route now supports both the plan's direct `secret_vault_path` input and a safer `webhook_id` lookup. Both paths resolve to the same tenant secret provider and use the shared `verifyWebhookSignature()` helper after normalizing the header format.
- (2026-05-05) The remaining read-side webhook controller stubs can stay thin: delivery details come straight from `webhook_deliveries`, health derives from the webhook stats columns already maintained by the delivery processor, subscriptions are just `webhook.event_types`, and available events come from `webhookEventTypeSchema.options`.
- (2026-05-05) Deferred webhook TODOs are now route-level cleanup, not controller cleanup. The implemented surface keeps nested delivery/health/subscriptions reads plus create/list/test/verify, and drops the transform, filter, validate, bulk, search, export, trigger, and system-health routes so they naturally 404 instead of advertising dead handlers.
- (2026-05-05) The nested webhook test route now diverges from the older generic `/api/v1/webhooks/test` helper: `/[id]/test` always uses the stored webhook URL + live signing secret, emits `event_type='webhook.test'`, records `is_test=true`, and intentionally skips outbound bucket consumption.
- (2026-05-05) Broad imports through `server/src/lib/jobs/index.ts` also hit the same unrelated `react-day-picker` CSS loader issue under `tsx`, so the cleanup-job service module is the reliable smoke target for scheduled-job additions in this environment.
- (2026-05-05) I could not find a dedicated operational metrics client/facade in this repo. For the v1 observability items, the fallback is structured log emission with stable metric names/labels rather than a Prometheus/StatsD sink.
- (2026-05-05) Webhook observability follows that same fallback pattern: queue depth is emitted from the Redis ZSET wrapper, delivery totals and durations from the delivery processor, and auto-disable counts from the state transition helper.
- (2026-05-05) Public docs for this plan now live at `docs/api/api-rate-limiting-and-ticket-webhooks.md` and are linked from `docs/api/api_overview.md`.
- (2026-05-05) `WebhookDeliveryQueue` now owns the retry loop contract: processors now return explicit `delivered` / `retry` / `abandoned` outcomes. The queue handles atomic `zRem` claims, caps active work at 50 in-process jobs, and re-enqueues attempts 2..5 with `computeBackoff(attempt)`.
- (2026-05-05) Auto-disable must follow a continuous failure streak, not just "some failures in the last day." `maybeAutoDisable()` therefore keys off the first non-delivered attempt since `last_success_at` and disables only once that streak has remained all-failure for 24 hours.
- (2026-05-05) Added feature `F052` after discovering a plan/code mismatch: `webhookSchemas.ts` already exposed `event_filter.entity_ids`, but the `webhooks` table migration and `webhookModel` never persisted `event_filter` at all. The subscriber-side entity filter needs that durable field first.
- (2026-05-05) The v1 subscriber filter stops at `event_filter.entity_ids`. Generic `conditions`, `tags`, and `entity_types` remain schema-only for now per the PRD; the enqueue path simply treats an empty/missing `entity_ids` list as "match all."
- (2026-05-06) The new webhook settings tab uses tenant-authenticated server actions instead of the standalone `/api/v1/webhooks` controller surface. That keeps the admin UI on the same auth model as the rest of settings while still reusing the shared DAL, delivery transport, signing helper, and queue.
- (2026-05-06) `tsx` import-smoke of the new client component still trips the repo's unrelated `react-day-picker/src/style.css` loader issue. The focused validation path for F047 is therefore `git diff --check` plus a direct smoke import of the new server-action module, matching the earlier UI validation limitation already documented for this repo.
- (2026-05-06) The first API rate-limit integration harness now uses a minimal `ApiBaseController` subclass plus mocked auth/RBAC/data-service edges. That keeps `T007` focused on the shared authenticate/throttle/response path without having to pull the full tickets stack or a database-backed route into the fixture.
- (2026-05-06) `T016` exercises the per-key override path by spying on `apiRateLimitSettingsReadOps.getForKey` and wiring the bucket to the real `apiRateLimitConfigGetter`. Both the limit-header lookup and the bucket's internal lookup share the same in-process cache, so a single seeded row drives both consumption (`tryConsume`) and the `X-RateLimit-Limit` value emitted on every response; no additional fixture is required.
- (2026-05-06) `T017` covers the rate-limit server-action contract at the cache + DAL seam rather than through the `withAuth` wrapper. The session machinery used by `setApiRateLimitForKey` / `clearApiRateLimitForKey` (`getCurrentUser`, `getUserRoles`, `assertApiKeyExists`) is session-coupled and out of scope for vitest in this repo; the load-bearing assertion ("subsequent enforce call sees new limit immediately, not after 30s") lives in the `invalidateApiRateLimitConfig` step the actions perform after each upsert/clear, so the test simulates that exact write+invalidate sequence and verifies the bucket honours the new limit on the very next `tryConsume`.
- (2026-05-06) Reusable webhook delivery test fixture: in-memory mock Redis implementing `RedisClientLike` with full ZSET semantics (`zAdd` / `zRem` / `zRangeByScore` / `zCard`) plus an ephemeral `node:http` stub server keyed off `WEBHOOK_SSRF_ALLOW_PRIVATE=true`. The webhook model + autoDisable are mocked at the module boundary and the queue is given a 999_999 ms `checkIntervalMs` so the `setInterval` poller never races a manual `queue.process()` call. `(WebhookDeliveryQueue as any).instance = null` is required between tests to reset the singleton; the public API has no `resetInstance`.
- (2026-05-06) `T026` exercises tenant isolation at the subscriber/event-bus seam without standing up a real Redis-backed event bus: mocking `@/lib/eventBus` to a stub that records `subscribe()` callbacks lets the test invoke the captured handler directly with a forged `TICKET_ASSIGNED` event. `webhookModel.listForEventType` then proves the query is scoped to the publishing tenant and `WebhookDeliveryQueue.enqueue` is spied to verify only the matching-tenant webhook gets a job.
- (2026-05-06) `T027` skips `vi.useFakeTimers` in favour of fast-forwarding the mock-Redis ZSET scores between iterations (`fastForwardAll()`); the queue's claim/process cycle is what matters and it's already deterministic once `checkIntervalMs` is set to `999_999`. A small `waitFor` polling helper drains in-flight deliveries between attempts. This keeps the retry-cadence assertion ("score equals `now + computeBackoff(attempt)`") honest without the cross-test contamination fake timers tend to introduce.
- (2026-05-06) `T030` simulates two pods racing on the same job by initializing two `WebhookDeliveryQueue` instances against the same shared mock Redis (clear the singleton with `(WebhookDeliveryQueue as any).instance = null` between the two `getInstance()` calls). A custom processor spy passed to `initialize()` lets the test assert "exactly one of the two workers ran the processor" without spinning up the full delivery stack.
- (2026-05-06) `T031` mocks `undici` at the package boundary so `assertSafeWebhookTarget` can be exercised end-to-end through `performWebhookDeliveryRequest`. The blocked path proves the SSRF guard fires before `fetch` is reached (spy unused) and returns `error_type='ssrf'`; the bypassed path proves the override lets the fetch through. The `Agent` constructor is mocked alongside `fetch` so `verify_ssl=false` paths don't touch the real undici Agent.
- (2026-05-06) `T036` cannot use `vi.spyOn(ApiBaseController.prototype, ...)` because `ApiWebhookController` declares its OWN `private` `authenticate` and `checkPermission` that shadow the base class, so the spy must be on `ApiWebhookController.prototype` directly. URLs in controller tests must use a real UUID for the `[id]` segment because `extractIdFromPath` validates against `^[0-9a-f]{8}-...$`.
- (2026-05-06) `T037` audits the migration source files instead of spinning up a Citus-aware test database, since the vitest harness here doesn't have Citus available. The audit verifies the table-creation + partial unique-index + distribute_table contracts that real migrations enforce; if/when a Citus test DB lands, this test should be replaced with a real `migrate:up` smoke + a `pg_dist_partition` query.
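
A minimal sketch of the `net.BlockList` approach noted above, covering the address classes listed (RFC1918, loopback, link-local, CGNAT, `::1`, `fe80::/10`) and the `WEBHOOK_SSRF_ALLOW_PRIVATE=true` short-circuit. `isBlockedAddress` is a hypothetical name, not the repo's `assertSafeWebhookTarget`, and the sketch only checks IP literals. A real guard would presumably also resolve hostnames and check every resolved address.

```typescript
import { BlockList, isIP } from "node:net";

// Address classes named in the notes; the list itself is built with Node's BlockList.
const privateRanges = new BlockList();
privateRanges.addSubnet("10.0.0.0", 8, "ipv4"); // RFC1918
privateRanges.addSubnet("172.16.0.0", 12, "ipv4"); // RFC1918
privateRanges.addSubnet("192.168.0.0", 16, "ipv4"); // RFC1918
privateRanges.addSubnet("127.0.0.0", 8, "ipv4"); // loopback
privateRanges.addSubnet("169.254.0.0", 16, "ipv4"); // link-local
privateRanges.addSubnet("100.64.0.0", 10, "ipv4"); // CGNAT
privateRanges.addAddress("::1", "ipv6"); // IPv6 loopback
privateRanges.addSubnet("fe80::", 10, "ipv6"); // IPv6 link-local

// Hypothetical helper: returns true when an IP literal falls in a blocked range.
// The allowPrivate flag mirrors the WEBHOOK_SSRF_ALLOW_PRIVATE override.
function isBlockedAddress(
  address: string,
  allowPrivate: boolean = process.env.WEBHOOK_SSRF_ALLOW_PRIVATE === "true",
): boolean {
  if (allowPrivate) return false; // short-circuit every check
  const family = isIP(address); // 0 = not an IP literal, 4 or 6 otherwise
  if (family === 0) throw new TypeError(`not an IP literal: ${address}`);
  return privateRanges.check(address, family === 4 ? "ipv4" : "ipv6");
}

console.log(isBlockedAddress("127.0.0.1", false)); // true
console.log(isBlockedAddress("8.8.8.8", false)); // false
console.log(isBlockedAddress("127.0.0.1", true)); // false (override active)
```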
Commands / Runbooks
- (2026-05-05) Run a single integration test: `cd server && npx vitest run src/test/integration/apiRateLimit.headers.test.ts`
- (2026-05-05) Run all webhook integration tests: `cd server && npx vitest run src/test/integration/webhook*`
- (2026-05-05) Run unit tests for the rate limiter package: `cd packages/email && npx vitest run src/__tests__/TokenBucketRateLimiter*`
- (2026-05-05) Apply migrations against a local dev database: see the existing migrate flow in `server/package.json` (knex CLI driven by `migrations/` and `ee/server/migrations/citus/`).
- (2026-05-05) Toggle observation mode locally: `RATE_LIMIT_ENFORCE=false` in `server/.env`. Toggle SSRF bypass for staging: `WEBHOOK_SSRF_ALLOW_PRIVATE=true`.
- (2026-05-05) Tail Redis bucket state during integration tests: `redis-cli --scan --pattern 'alga-psa:ratelimit:bucket:*' | xargs -L1 redis-cli get`
- (2026-05-05) Run the namespace foundation unit suite without coverage noise: `cd server && npx vitest run --coverage.enabled=false src/test/unit/notifications/tokenBucketRateLimiter.test.ts ../packages/email/src/__tests__/TokenBucketRateLimiter.namespaces.test.ts ../packages/email/src/__tests__/TokenBucketRateLimiter.subjectId.test.ts ../packages/email/src/__tests__/TokenBucketRateLimiter.email-regression.test.ts`
- (2026-05-05) Run the API response-header unit test: `cd server && npx vitest run --coverage.enabled=false src/test/unit/api/apiMiddleware.responseHeaders.test.ts`
- (2026-05-05) Run the API rate-limit config getter unit tests: `cd server && npx vitest run --coverage.enabled=false src/lib/api/rateLimit/__tests__/configGetter.cache.test.ts src/lib/api/rateLimit/__tests__/configGetter.invalidate.test.ts src/lib/api/rateLimit/__tests__/configGetter.fallback.test.ts`
- (2026-05-05) Run the API rate-limit enforcement helper tests: `cd server && npx vitest run --coverage.enabled=false src/lib/api/rateLimit/__tests__/enforce.test.ts src/test/unit/api/apiMiddleware.responseHeaders.test.ts`
- (2026-05-05) Smoke-load the webhook payload builder: `cd server && npx tsx -e "import('./src/lib/eventBus/subscribers/webhook/webhookTicketPayload.ts').then(() => console.log('payload-ok'))"`
- (2026-05-05) Smoke-load the webhook subscriber + queue storage layer:
  - `cd server && npx tsx -e "import('./src/lib/webhooks/processWebhookDeliveryJob.ts').then(() => console.log('processor-ok'))"`
  - `cd server && npx tsx -e "import('./src/lib/webhooks/autoDisable.ts').then(() => console.log('auto-disable-ok'))"`
  - `cd server && npx tsx -e "import('./src/lib/webhooks/WebhookDeliveryQueue.ts').then(() => console.log('queue-ok'))"`
  - `cd server && npx tsx -e "import('./src/lib/eventBus/subscribers/webhookSubscriber.ts').then(() => console.log('subscriber-ok'))"`
- (2026-05-05) `cd server && npx tsc --noEmit --pretty false` currently OOMs in this repo, and even targeted `tsc` entrypoint checks surface existing package-resolution / JSX-config errors unrelated to this feature slice, so compile verification here is limited to focused runtime/unit checks plus manual review.
- (2026-05-05) Smoke-import the webhook DAL after edits: `cd server && npx tsx -e "import('./src/lib/webhooks/webhookModel.ts').then(() => console.log('ok'))"`
- (2026-05-05) Smoke-import the webhook delivery transport after edits: `cd server && npx tsx -e "import('./src/lib/webhooks/delivery.ts').then(() => console.log('delivery-ok'))"`
- (2026-05-06) Smoke-import the webhook admin server actions after edits: `npx tsx -e "import('./packages/auth/src/actions/webhookActions.ts').then(() => console.log('webhook-actions-ok'))"`
- (2026-05-05) Quick SSRF helper smoke: `cd server && npx tsx -e "import('./src/lib/webhooks/ssrf.ts').then(async ({ assertSafeWebhookTarget }) => { await assertSafeWebhookTarget('https://example.com'); console.log('public-ok'); try { await assertSafeWebhookTarget('http://127.0.0.1'); process.exit(1); } catch (error) { console.log((error && error.name) || 'error'); } })"`
- (2026-05-05) Quick signing helper smoke: `cd server && npx tsx -e "import('./src/lib/webhooks/sign.ts').then(({ signRequest, verifyWebhookSignature }) => { const header = signRequest('shh', '{\\\"a\\\":1}', 1700000000); console.log(header); console.log(verifyWebhookSignature(header, '{\\\"a\\\":1}', 'shh')); })"`
- (2026-05-05) Quick event-map smoke: `cd server && npx tsx -e "import('./src/lib/eventBus/subscribers/webhook/webhookEventMap.ts').then(({ publicEventsFor }) => { console.log(publicEventsFor('TICKET_ASSIGNED').join(',')); console.log(publicEventsFor('NOPE').length); })"`
- (2026-05-05) Quick backoff helper smoke: `cd server && npx tsx -e "import('./src/lib/webhooks/backoff.ts').then(({ computeBackoff }) => { console.log([1,2,3,4,5].map(computeBackoff).join(',')); })"`
- (2026-05-05) Quick webhook rate-limit getter smoke: `cd server && npx tsx -e "import('./src/lib/webhooks/rateLimitConfig.ts').then(({ DEFAULT_WEBHOOK_RATE_LIMIT_PER_MIN }) => console.log(DEFAULT_WEBHOOK_RATE_LIMIT_PER_MIN))"`
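
For poking at queue behaviour without Redis, here is a rough in-memory sketch of the ZSET claim/retry cycle the queue smokes exercise: due jobs are read by score, a job is owned only by the worker whose `zRem` returns 1, and a `retry` outcome re-enqueues the next attempt up to attempt 5. `MemoryZset` and `processDue` are hypothetical names, and the backoff function in the demo is a placeholder. The PRD's actual retry cadence is not reproduced here.

```typescript
type Outcome = "delivered" | "retry" | "abandoned";

interface Job {
  id: string;
  attempt: number;
}

// Hypothetical in-memory stand-in for the subset of ZSET commands used below.
class MemoryZset {
  private scores = new Map<string, number>();

  zAdd(member: string, score: number): void {
    this.scores.set(member, score);
  }
  // Returns 1 if the member existed and was removed, 0 otherwise (like ZREM).
  zRem(member: string): number {
    return this.scores.delete(member) ? 1 : 0;
  }
  zRangeByScore(min: number, max: number): string[] {
    return [...this.scores.entries()]
      .filter(([, score]) => score >= min && score <= max)
      .sort((a, b) => a[1] - b[1])
      .map(([member]) => member);
  }
  zScore(member: string): number | undefined {
    return this.scores.get(member);
  }
  zCard(): number {
    return this.scores.size;
  }
}

const MAX_ATTEMPTS = 5; // attempts 2..5 are retries, per the notes

// Runs every due job this worker manages to claim; returns how many it ran.
function processDue(
  zset: MemoryZset,
  now: number,
  processor: (job: Job) => Outcome,
  backoffMs: (attempt: number) => number,
): number {
  let ran = 0;
  for (const member of zset.zRangeByScore(-Infinity, now)) {
    if (zset.zRem(member) !== 1) continue; // another worker claimed it first
    const job = JSON.parse(member) as Job;
    ran += 1;
    if (processor(job) === "retry" && job.attempt < MAX_ATTEMPTS) {
      const next: Job = { ...job, attempt: job.attempt + 1 };
      zset.zAdd(JSON.stringify(next), now + backoffMs(next.attempt));
    }
  }
  return ran;
}

// Two workers see the same due member; only the first zRem wins the claim.
const shared = new MemoryZset();
shared.zAdd(JSON.stringify({ id: "job-1", attempt: 1 }), 0);
const due = shared.zRangeByScore(-Infinity, 0);
const claims = [shared.zRem(due[0]), shared.zRem(due[0])];
console.log(claims); // [ 1, 0 ]
```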
Links / References
- Source plans:
  - `.ai/api-rate-limiting-plan.md`
  - `.ai/ticket-webhooks-plan.md`
- Key files:
  - `packages/email/src/TokenBucketRateLimiter.ts`: bucket implementation.
  - `packages/email/src/DelayedEmailQueue.ts`: pattern for `WebhookDeliveryQueue`.
  - `server/src/lib/initializeApp.ts:144-168`: singleton init site.
  - `server/src/lib/api/controllers/ApiBaseController.ts:44-87`: auth surface 1.
  - `server/src/lib/api/middleware/apiMiddleware.ts:101-111`: `TooManyRequestsError`; lines 144 & 201 are auth surfaces 2 & 3.
  - `server/src/lib/api/services/WebhookService.ts:950, 1056`: mock + broken rate limit.
  - `server/src/lib/api/controllers/ApiWebhookController.ts`: 14+ TODOs.
  - `packages/event-schemas/src/schemas/eventBusSchema.ts:157-184`: internal `EVENT_TYPES`.
  - `server/src/lib/api/schemas/webhookSchemas.ts:21-60`: public enum to extend.
  - `ee/server/migrations/20251014120000_create_stripe_integration_tables.cjs:28`: `webhook_secret_vault_path` precedent.
  - `server/src/lib/webhooks/webhookModel.ts`: tenant-scoped webhook DAL and signing-secret resolution helpers.
  - `server/src/lib/webhooks/delivery.ts`: shared outbound HTTP transport for webhook delivery with timeout/TLS/error classification.
  - `server/src/lib/webhooks/ssrf.ts`: outbound target validation for webhook delivery and test-send flows.
  - `server/src/lib/webhooks/sign.ts`: outbound request signing and signature verification helper for webhook deliveries.
  - `server/src/lib/eventBus/subscribers/webhook/webhookEventMap.ts`: canonical mapping from internal ticket events to public webhook events.
  - `server/src/lib/webhooks/backoff.ts`: shared retry schedule helper for the outbound webhook queue.
  - `server/src/lib/webhooks/rateLimitConfig.ts`: shared token-bucket config getter for the `webhook-out` namespace.
Open Questions
- (2026-05-05) IA placement of the new admin UIs — Settings → Security or Settings → Integrations? Confirm with design before F022/F047 lands.
- (2026-05-05) Per-tenant cap on top of per-key buckets? Defer until Stage 1 observation data justifies it.
- (2026-05-05) Per-endpoint cost weights (search costs more than get)? Defer until observation data shows pressure differences.
- (2026-05-05) Expose `ticket.deleted` in v1? Decision: defer unless the noisy poller specifically asks during migration.
- (2026-05-05) Per-tenant webhook count cap: proposed 50; confirm before F047 lands.
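
As a thinking aid for the bucket questions above, here is a small in-memory token-bucket model using the documented API defaults (`maxTokens: 120`, `refillRate: 1` per second) and the `alga-psa:ratelimit:bucket:` key prefix. Everything here is a sketch, not the `TokenBucketRateLimiter` implementation: `InMemoryTokenBucket` and `bucketKey` are hypothetical, the namespace/tenant/subject key layout is assumed, and the `cost` parameter only illustrates what per-endpoint weights might look like if that question is ever picked up.

```typescript
interface BucketConfig {
  maxTokens: number; // burst size
  refillRate: number; // tokens added per second
}

interface BucketState {
  tokens: number;
  lastRefillMs: number;
}

interface ConsumeResult {
  allowed: boolean;
  remaining: number;
  retryAfterMs: number;
}

// Assumed key layout: prefix + namespace + tenant, with an optional subject segment.
function bucketKey(namespace: string, tenant: string, subject?: string): string {
  const base = `alga-psa:ratelimit:bucket:${namespace}:${tenant}`;
  return subject ? `${base}:${subject}` : base;
}

// Hypothetical in-memory model; the real limiter keeps this state in Redis.
class InMemoryTokenBucket {
  private readonly config: BucketConfig;
  private readonly buckets = new Map<string, BucketState>();

  constructor(config: BucketConfig) {
    this.config = config;
  }

  // `cost` is an illustration of per-endpoint weights, not an existing parameter.
  tryConsume(key: string, nowMs: number, cost = 1): ConsumeResult {
    const state = this.buckets.get(key) ?? { tokens: this.config.maxTokens, lastRefillMs: nowMs };
    const elapsedSec = Math.max(0, nowMs - state.lastRefillMs) / 1000;
    const tokens = Math.min(this.config.maxTokens, state.tokens + elapsedSec * this.config.refillRate);

    if (tokens >= cost) {
      this.buckets.set(key, { tokens: tokens - cost, lastRefillMs: nowMs });
      return { allowed: true, remaining: Math.floor(tokens - cost), retryAfterMs: 0 };
    }
    this.buckets.set(key, { tokens, lastRefillMs: nowMs });
    const retryAfterMs = Math.ceil(((cost - tokens) / this.config.refillRate) * 1000);
    return { allowed: false, remaining: Math.floor(tokens), retryAfterMs };
  }
}

const bucket = new InMemoryTokenBucket({ maxTokens: 120, refillRate: 1 });
const key = bucketKey("api", "tenant-1", "key-1");
let allowed = 0;
for (let i = 0; i < 121; i++) {
  if (bucket.tryConsume(key, 0).allowed) allowed += 1;
}
console.log(key, allowed); // alga-psa:ratelimit:bucket:api:tenant-1:key-1 120
```

A per-tenant cap would amount to consuming from a second bucket keyed without the subject segment before the per-key one.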
Progress Log
- (2026-05-05) F001 complete.
TokenBucketRateLimiternow requires an explicitnamespaceontryConsume,getState,getBucketKey, andgetBucketConfig. Redis keys now include the namespace segment (alga-psa:ratelimit:bucket:{namespace}:{tenant}[:{subject}]) so future API/webhook buckets cannot collide with the existing email path. - (2026-05-05) F002 complete.
BucketConfigGetternow receives(tenantId, subjectId?), which lets the limiter surface per-key and per-webhook configuration decisions without additional key parsing. - (2026-05-05) F003 complete.
TokenBucketRateLimiter.initialize()now accepts a namespace-to-getter map, and lookup/fail-open behavior stays centralized inside the shared limiter instead of spreading per-namespace branching to callers. - (2026-05-05) F004 complete.
initializeApp()now registers the existing email tenant-config getter under namespaceemailand a temporary hard-coded API getter under namespaceapi, so startup is already wired for the upcoming API limiter without altering email defaults. - (2026-05-05) F005 complete.
TenantEmailService.checkRateLimits()now consumes tokens from namespaceemail, preserving the pre-existing per-tenant/per-user email semantics after the limiter API change. - (2026-05-05) T001 complete. Added
packages/email/src/__tests__/TokenBucketRateLimiter.namespaces.test.tsto prove the same tenant/subject can exhaustemailwithout consuming theapibucket. - (2026-05-05) T002 complete. Added
packages/email/src/__tests__/TokenBucketRateLimiter.subjectId.test.tsto verify namespace getters receivesubjectIdand that API-key buckets are keyed as...:api:{tenant}:{subject}. - (2026-05-05) T003 complete. Added
packages/email/src/__tests__/TokenBucketRateLimiter.email-regression.test.tswith fake time pinned to confirm the email namespace preserves the legacy 60-token burst / 1-token-per-second refill behavior at calls 1, 30, 60, and 61. - (2026-05-05) F006 complete.
ApiErrornow supports optional response headers andhandleApiError()forwards them intoNextResponse.json(), which lets later rate-limit errors attachRetry-AfterandX-RateLimit-*metadata without a parallel error class. - (2026-05-05) F007 complete.
createSuccessResponse()andcreatePaginatedResponse()now accept optionalextraHeadersas a final parameter, preserving existing controller call sites while opening a clean path for rate-limit headers on successful responses. - (2026-05-05) F008 complete. Added
server/migrations/20260505123000_create_api_rate_limit_settings.cjswith tenant-scoped rate-limit columns plus separate unique indexes for per-key rows and the(tenant, NULL)tenant default row. - (2026-05-05) F009 complete. Added
ee/server/migrations/citus/20260505123100_distribute_api_rate_limit_settings.cjsso the new settings table is distributed ontenantwhen Citus is present. - (2026-05-05) F010 complete. Added
server/src/lib/api/rateLimit/apiRateLimitSettingsModel.tswith exact-row reads/writes plus a fallback resolver that checks(tenant, apiKeyId), then(tenant, NULL), then the hard defaults{ maxTokens: 120, refillRate: 1 }. - (2026-05-05) F011 complete. Added
server/src/lib/api/rateLimit/apiRateLimitConfigGetter.tswith a 1000-entry, 30-second TTL cache, exact-entry invalidation, tenant-prefix invalidation, andinitializeApp()now uses it for theapinamespace. - (2026-05-05) T004 complete. Added
server/src/lib/api/rateLimit/__tests__/configGetter.cache.test.tsto verify identical cached lookups hit the settings resolver once. - (2026-05-05) T005 complete. Added
server/src/lib/api/rateLimit/__tests__/configGetter.invalidate.test.tsto prove tenant-wide invalidation clears only that tenant and single-key invalidation clears only the targeted key. - (2026-05-05) T006 complete. Added
server/src/lib/api/rateLimit/__tests__/configGetter.fallback.test.tsto verify the resolver order is per-key override, then tenant default, then the hard-coded API defaults. - (2026-05-05) F012 complete. Added
server/src/lib/api/rateLimit/enforce.tsas the shared API limiter entry point. It resolves theapinamespace bucket, skips configured bypass paths, computes rate-limit header values, and either throwsTooManyRequestsErroror returns aRateLimitDecision. - (2026-05-05) F013 complete.
enforceApiRateLimit()now treatsRATE_LIMIT_ENFORCE=falseas observation mode: it logs the throttle with tenant/api-key/retry metadata and returns a decision instead of throwing. - (2026-05-05) F014 complete. The NM Store branch in
apiMiddleware.withApiKeyAuth()now stampsrateLimitSubjectId='nm_store'before calling the limiter so all global-key traffic shares one tenant bucket instead of bypassing per-subject accounting. - (2026-05-05) F015 complete.
shouldBypassRateLimit()now centralizes the bypass prefixes for health endpoints, mobile auth, and runner-internal endpoints so future auth wrappers reuse one rate-limit allowlist. - (2026-05-05) F016 complete. Rate-limit denials now throw the existing
TooManyRequestsErrorwithdetails.retry_after_ms,details.remaining, and the full header set attached onerror.headers. - (2026-05-05) F017 complete.
ApiBaseController.authenticate()now enforces the API bucket immediately after building request context and stores the resulting decision onapiRequest.context.rateLimit. - (2026-05-05) F018 complete. The middleware auth wrappers now call
enforceApiRateLimit()as soon as context is available. I also wired the legacyapiAuthMiddleware.tspath so/api/v1/test-authstays in the same bucket family as the newer wrappers. - (2026-05-05) F019 complete.
createSuccessResponse()andcreatePaginatedResponse()now emitX-RateLimit-LimitandX-RateLimit-Remainingautomatically when the passed request carriescontext.rateLimit, and the genericApiBaseControllercreate/update paths now passapiRequestthrough to the helper. - (2026-05-05) F020 complete. Added reusable legacy auth helpers:
authenticateApiKeyRequest() for inline API-key handlers, withApiKeyRouteAuth() for route files that need req.context, and appendRateLimitHeaders() for direct NextResponse routes. Wrapped the entire asset and contract-line /api/v1 route families so they now authenticate through the shared legacy middleware and emit rate-limit headers. I also migrated the remaining direct /api/v1 handlers that were doing inline API-key validation (ticket priorities/statuses/reactions, storage routes, and the non-mobile-auth mobile moderation/push/account routes) onto the shared helper so they consume the same api bucket. - (2026-05-05) F021 complete. Added tenant-admin server actions in
packages/auth/src/actions/apiKeyRateLimitActions.ts: getApiRateLimitForKey, setApiRateLimitForKey, setTenantDefaultApiRateLimit, and clearApiRateLimitForKey. They verify admin access, scope API key IDs to the current tenant, use the api_rate_limit_settings model for reads/writes, and invalidate the in-process API rate-limit config cache immediately after every write so UI updates do not wait on the 30s TTL. - (2026-05-05) F022 complete.
AdminApiKeysSetup now loads each key's effective API rate-limit settings plus live bucket state and renders a new "Rate Limit" column with inline override editing and reset. The column shows the effective burst / refill values, the config source (per-key override vs tenant default vs hard default), and the current remaining tokens from TokenBucketRateLimiter.getState('api', tenant, apiKeyId). - (2026-05-05) F023 complete. The public webhook event enum now includes
ticket.comment.added, so webhook create/update validation no longer rejects the v1 ticket-comment subscription event. - (2026-05-05) T018 complete. Added
server/src/lib/api/schemas/__tests__/webhookSchemas.test.ts to lock in acceptance of the new ticket.comment.added enum member. - (2026-05-05) F024 complete. Added
server/migrations/20260505140000_create_webhook_tables.cjs with the base webhooks subscription table: tenant-scoped primary key, event list, signing-secret vault path, retry/rate-limit config, activation flag, rolling delivery stats, auto-disable timestamp, and creator/audit timestamps. - (2026-05-05) F025 complete. Expanded the same webhook migration to add
webhook_deliveries with tenant/webhook foreign key wiring, request + response capture columns, retry scheduling fields, is_test, and the three queue-oriented indexes required by the PRD (webhook+attempted_at, event_id, and partial pending/retrying next_retry_at). - (2026-05-05) F026 complete. Added
ee/server/migrations/citus/20260505140100_distribute_webhook_tables.cjs to distribute both webhooks and webhook_deliveries on tenant, with the same Citus-enabled / already-distributed guards used by the earlier rate-limit distribution migration. - (2026-05-05) F027 complete. Added
server/src/lib/webhooks/webhookModel.ts as the first non-mock webhook foundation: public reads omit signing_secret_vault_path, inserts persist signing secrets via getSecretProviderInstance(), delivery attempts write to webhook_deliveries, stats updates increment the rolling counters on webhooks, and getSigningSecret() resolves the stored path-style reference back to the tenant secret name. - (2026-05-05) F028 complete. Added
server/src/lib/webhooks/delivery.ts and rewired WebhookService.performWebhookDelivery() to use it. Deliveries now perform a real undici.fetch call with a 10s timeout, preserve response status and headers, truncate stored response bodies to 8 KB, classify DNS/connect/TLS/timeout failures, and disable certificate verification only when verify_ssl=false. - (2026-05-05) F029 complete. Added
server/src/lib/webhooks/ssrf.ts and enforced it in the shared delivery transport before any outbound fetch. Targets must now use http(s); localhost/loopback/private/link-local/CGNAT destinations are rejected after DNS resolution, and those checks are bypassed only when WEBHOOK_SSRF_ALLOW_PRIVATE=true. - (2026-05-05) F030 complete. Added
server/src/lib/webhooks/sign.ts with the PRD's X-Alga-Signature contract: t=<timestamp>,v1=<sha256 hex> over ${timestamp}.${body}. webhookSchemas.validateWebhookSignature() now delegates to the same helper instead of preserving the old sha256=<hex> comparison logic. - (2026-05-05) F032 complete. Added
server/src/lib/eventBus/subscribers/webhook/webhookEventMap.ts with the v1 ticket-event translation table and a publicEventsFor() helper that returns a fresh array for each lookup, making the mapping ready for the upcoming event-bus subscriber. - (2026-05-05) F039 complete. Added
server/src/lib/webhooks/backoff.ts with the PRD retry schedule (1m, 5m, 30m, 2h, 12h) and pointed the scaffolded WebhookService.calculateNextRetryTime() method at that helper so old placeholder retry math no longer diverges from the intended queue behavior. - (2026-05-05) F031 complete. Added
server/src/lib/webhooks/rateLimitConfig.ts, registered the new 'webhook-out' namespace in initializeApp(), and replaced the stale delivery-count query in WebhookService.checkRateLimit() with TokenBucketRateLimiter.tryConsume('webhook-out', tenant, webhookId). The delivery path now applies the shared per-webhook bucket instead of the mocked webhook.rate_limit.enabled branch. - (2026-05-05) F033 complete. Added
server/src/lib/eventBus/subscribers/webhook/webhookTicketPayload.ts, which builds the PRD's curated ticket snapshot for webhook fan-out, normalizes ticket.updated change diffs, includes ticket.comment.added comment metadata without attachments, resolves tags from tag_mappings, and caches the base (tenant, ticket_id) snapshot for 60 seconds so a multi-subscriber fan-out does not repeat the same joins. - (2026-05-05) F034 complete.
ticket.status_changed payloads from webhookTicketPayload.ts now include previous_status_id plus a tenant-scoped lookup of previous_status_name, using either payload.previousStatusId or the older payload.changes.status_id.from compatible shape when deriving the prior status. - (2026-05-05) F035 complete. Added
server/src/lib/eventBus/subscribers/webhookSubscriber.ts, which subscribes to the six v1 ticket events, builds the curated webhook payload once per internal event, filters subscribers by (tenant, public event type), and enqueues one delivery job per matching active webhook. I also introduced the initial server/src/lib/webhooks/WebhookDeliveryQueue.ts storage contract so the subscriber already targets the eventual Redis ZSET queue instead of a temporary inline-delivery path. - (2026-05-05) F036 complete. Registered the webhook subscriber in
server/src/lib/eventBus/subscribers/index.ts so the existing register-all / unregister-all lifecycle now includes webhook ticket events alongside the other subscriber families. - (2026-05-05) F037 complete. Expanded
server/src/lib/webhooks/WebhookDeliveryQueue.ts from storage-only enqueue support into the actual Redis ZSET poller: initialize(getRedisClient, processFn) now starts a 2s processing loop, claims ready jobs via zRangeByScore + zRem, limits active processor promises to 50, retries failed jobs up to five total attempts using the shared backoff helper, and drains in-flight work for up to 30 seconds on shutdown / SIGTERM. - (2026-05-05) F038 complete.
initializeApp() now boots the webhook delivery queue with getRedisClient plus a real processWebhookDeliveryJob() callback, and the existing SIGTERM/SIGINT cleanup path now shuts the queue down alongside the email retry queues. - (2026-05-05) F040 complete. Added
server/src/lib/webhooks/autoDisable.ts and wired it into processWebhookDeliveryJob(). Failed deliveries now advance the webhook's rolling stats, and once the first failure since the last success has aged past 24 hours the webhook is auto-disabled exactly once and the owning user receives a direct notification email via the system email service. - (2026-05-05) F052 complete. Updated the base webhook migration plus
server/src/lib/webhooks/webhookModel.ts so webhook rows now persist and return event_filter JSON. That closes the storage gap under event_filter.entity_ids before the subscriber starts enforcing it. - (2026-05-05) F041 complete.
webhookSubscriber.ts now enforces event_filter.entity_ids before enqueueing jobs: when a webhook row carries a non-empty allowlist, only matching ticket IDs are queued. Missing/empty allowlists still receive all matching event types. - (2026-05-05) F042 complete.
ApiWebhookController.rotateSecret() now performs a real secret rotation: it generates a 32-byte base64url secret, updates the webhook through webhookModel.update(..., { signingSecret }), and returns the plaintext once in the response instead of the old timestamp stub. - (2026-05-05) F043 complete.
ApiWebhookController.verifySignature() now resolves the signing secret from either webhook_id or secret_vault_path, normalizes split signature inputs into the t=...,v1=... header format when needed, and returns the real HMAC match result instead of the old always-true stub. - (2026-05-05) F044 complete. Replaced four controller TODOs:
getDelivery() now loads a concrete webhook_deliveries row via webhookModel.getDeliveryById(), getHealth() derives a stable health summary from the webhook stats columns, getSubscriptions() returns the stored event_types for the webhook, and listEvents() returns the public enum from webhookEventTypeSchema. - (2026-05-05) F045 complete. Deleted the deferred webhook route handlers
for transform/filter validation, system health, global/nested subscription
creation, bulk/search/export, and manual event triggering. The nested
[id]/subscriptions route now exposes only GET, and the removed paths will 404 instead of surfacing TODO-backed handlers. - (2026-05-05) F046 complete.
ApiWebhookController.testById() now sends a real signed webhook.test request to the configured webhook URL, records the attempt in webhook_deliveries with is_test=true, and returns the observed transport result. It reuses the live signing/header and SSRF-guard path but skips the outbound rate-limit bucket and does not mutate webhook delivery stats. - (2026-05-05) F048 complete. Added
server/src/services/cleanupWebhookDeliveriesJob.ts plus scheduler wiring in server/src/lib/jobs/index.ts and server/src/lib/jobs/initializeScheduledJobs.ts. The new system-wide job runs every 15 minutes and deletes webhook_deliveries rows older than 30 days in batches of 10,000 until the backlog is gone. - (2026-05-05) F049 complete.
enforceApiRateLimit() now emits structured fallback metric logs for api_rate_limit_consumed_total, api_rate_limit_remaining, and api_rate_limit_redis_unavailable_total, using stable label fields (tenant, api_key_id, outcome) alongside the existing throttle WARN. - (2026-05-05) F050 complete. Added
server/src/lib/webhooks/metrics.ts and wired structured fallback metric logs into WebhookDeliveryQueue, processWebhookDeliveryJob(), and maybeAutoDisable(). That now emits webhook_queue_depth, webhook_deliveries_total, webhook_delivery_duration_ms, and webhook_auto_disabled_total. - (2026-05-05) F051 complete. Added
docs/api/api-rate-limiting-and-ticket-webhooks.md with the public rate-limit contract, webhook event examples, HMAC verification recipes, idempotency/ordering notes, and retry schedule; linked it from docs/api/api_overview.md. - (2026-05-06) F047 complete. Added tenant-authenticated webhook admin
actions in
packages/auth/src/actions/webhookActions.ts, a new AdminWebhooksSetup settings component with create/edit, test-send, secret rotation, pause/resume, delete, delivery history, and manual retry enqueue, plus Security settings tab wiring in server/src/components/settings/security/SecuritySettingsPage.tsx. The DAL now also exposes tenant-scoped webhook listing and paginated delivery history helpers for the UI. - (2026-05-06) T007 complete. Added
server/src/test/integration/apiRateLimit.headers.test.ts, which drives the real ApiBaseController.list() auth path 121 times under one tenant/API key and asserts the 121st response is a 429 with Retry-After, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and the expected RATE_LIMITED error envelope details. - (2026-05-06) T008 complete. Extended
server/src/test/integration/apiRateLimit.headers.test.ts with the success case assertion: an allowed authenticated request now proves X-RateLimit-Limit=120 and X-RateLimit-Remaining=119 are attached on the 200 response from the same controller path. - (2026-05-06) T009 complete. Extended the same
apiRateLimit.headers.test.ts harness to swap API key identities within one tenant and prove bucket isolation: with a 5-token config, key A throttles on request 6 while key B still gets a 200 and its own remaining=4 header. - (2026-05-06) T010 complete. The same harness now also forces the
tenant-scoped API-key auth branch via
x-tenant-id and proves the bucket key includes tenant: exhausting tenant A with a shared api_key_id no longer affects tenant B, which still succeeds with its own remaining=4 header. - (2026-05-06) T011 complete. Extended the rate-limit integration harness
to drive an exhausted bucket and then swap only the request pathname to a
bypassed route (
/api/v1/meta/health). Those calls stay 200 and the next ticket-path request is still 429, proving bypasses do not consume tokens. - (2026-05-06) T012 complete. Added observation-mode coverage to the same
rate-limit integration file by mocking the shared logger: with
RATE_LIMIT_ENFORCE=false, the 121st request now stays 200 with remaining=0, and the test asserts the structured throttle WARN still carries tenant, api_key_id, and retry_after_ms. - (2026-05-06) T013 complete. Added a broken-Redis branch to the same
harness: 200 authenticated requests now stay 200 with
X-RateLimit-Remaining=-1, and the mocked logger proves the api_rate_limit_redis_unavailable_total metric payload is emitted on the fail-open path. - (2026-05-06) T014 complete. The rate-limit harness now also imports the
shared middleware wrappers and proves bucket sharing across all three auth
surfaces: after five mixed requests through
ApiBaseController, withApiKeyAuth, and withAuth, the next request on each surface returns 429 from the same (tenant, api_key_id) bucket. - (2026-05-06) T015 complete. Added NM Store coverage to the same
integration file by mocking
getAppSecret('nm_store_api_key'): the withApiKeyAuth({ allowNmStore: true }) branch now throttles the shared sentinel bucket after five requests, while a normal API key in the same tenant still succeeds with its own remaining=4 header.
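
The bucket-isolation behavior that T009, T010, and T015 assert can be illustrated with a small in-memory token bucket. This is only a sketch under stated assumptions: the limiter described in this log is Redis-backed, and the class name `InMemoryTokenBucket`, the `BucketConfig` / `Decision` shapes, and the `refillPerSecond` parameter are hypothetical names introduced here, not the project's API. Only the `(namespace, tenant, subject)` keying, the 5-token test config, and the `nm_store` sentinel subject come from the entries above.

```typescript
// Hypothetical shapes for illustration; not the project's real types.
interface BucketConfig {
  maxTokens: number; // burst capacity
  refillPerSecond: number; // tokens restored per second (assumed parameter)
}

interface Decision {
  allowed: boolean;
  remaining: number;
  retryAfterMs: number;
}

class InMemoryTokenBucket {
  private readonly buckets = new Map<string, { tokens: number; updatedAt: number }>();
  private readonly config: BucketConfig;
  private readonly now: () => number;

  constructor(config: BucketConfig, now: () => number = Date.now) {
    this.config = config;
    this.now = now;
  }

  tryConsume(namespace: string, tenant: string, subjectId: string): Decision {
    // One bucket per (namespace, tenant, subject), so keys and tenants never share tokens.
    const key = `${namespace}:${tenant}:${subjectId}`;
    const nowMs = this.now();
    const state = this.buckets.get(key) ?? { tokens: this.config.maxTokens, updatedAt: nowMs };

    // Refill lazily from elapsed time, capped at the burst capacity.
    const elapsedSeconds = (nowMs - state.updatedAt) / 1000;
    state.tokens = Math.min(
      this.config.maxTokens,
      state.tokens + elapsedSeconds * this.config.refillPerSecond,
    );
    state.updatedAt = nowMs;
    this.buckets.set(key, state);

    if (state.tokens >= 1) {
      state.tokens -= 1;
      return { allowed: true, remaining: Math.floor(state.tokens), retryAfterMs: 0 };
    }
    // Not enough tokens: report how long until one full token is available.
    const missing = 1 - state.tokens;
    const retryAfterMs = Math.ceil((missing / this.config.refillPerSecond) * 1000);
    return { allowed: false, remaining: 0, retryAfterMs };
  }
}

// Usage mirroring the 5-token test config: key A is throttled on its sixth request,
// while key B and the shared NM Store sentinel subject keep their own buckets.
let clockMs = 0;
const limiter = new InMemoryTokenBucket({ maxTokens: 5, refillPerSecond: 1 }, () => clockMs);
const keyA = Array.from({ length: 6 }, () => limiter.tryConsume("api", "tenant-a", "key-a"));
const keyB = limiter.tryConsume("api", "tenant-a", "key-b");
const nmStore = limiter.tryConsume("api", "tenant-a", "nm_store");
console.log(keyA[5].allowed, keyB.remaining, nmStore.remaining);
```

Injecting the clock keeps the sketch deterministic, which is the same reason a test harness would want control over time when asserting `Retry-After` style values.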