Hermes 284313f908
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

# Scratchpad — Unified Inbound Email Queue with Pointer Jobs
- Plan slug: `unified-inbound-email-pointer-queue`
- Created: `2026-03-01`
## What This Is
Working notes for moving Microsoft, Google, and IMAP inbound email ingress to one pointer-based Redis queue with consume-time idempotency.
## Decisions
- (2026-03-01) Use one queue ingestion model for all inbound providers: Microsoft callback, Google callback, and IMAP listener enqueue pointer jobs only.
- (2026-03-01) Use consume-time idempotency instead of ingress-time idempotency.
- (2026-03-01) Queue payloads stay pointer-only (no raw MIME/attachment bytes).
- (2026-03-01) Source-content drift for IMAP between ingest and consume is an accepted risk; an unavailable source should produce a deterministic skipped outcome.
- (2026-03-01) Ingress success must mean durable enqueue success.
- (2026-03-01) F001 implemented by defining `UnifiedInboundEmailQueueJob` as a discriminated union (`provider`) with provider-specific pointer objects (`microsoft`, `google`, `imap`), while keeping legacy `EmailQueueJob` for compatibility during migration.
- (2026-03-01) Added a dedicated unified queue feature flag gate (`UNIFIED_INBOUND_EMAIL_POINTER_QUEUE_*`) so provider webhooks can move to enqueue-only behavior without forcing immediate cutover.
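A minimal sketch of the discriminated-union contract from the F001 decision. Field names are taken from these notes, but the exact types, the `pointer` property name, and the `describePointer` helper are assumptions for illustration, not the shipped interface.

```typescript
// Hypothetical shapes: names come from the notes, types are assumed.
interface QueueJobBase {
  jobId: string;
  schemaVersion: number;
  tenantId: string;
  providerId: string;
  attempt: number;
  maxAttempts: number;
  enqueuedAt: string; // assumed to be an ISO-8601 timestamp
}

interface MicrosoftPointer {
  messageId: string;
}

interface GooglePointer {
  historyId: string;
  emailAddress: string;
  pubsubMessageId: string;
}

interface ImapPointer {
  mailbox: string;
  uid: number;
  uidValidity: number; // assumed numeric
  messageId?: string;
}

// `provider` is the discriminant; `pointer` is an assumed property name.
type UnifiedInboundEmailQueueJob =
  | (QueueJobBase & { provider: 'microsoft'; pointer: MicrosoftPointer })
  | (QueueJobBase & { provider: 'google'; pointer: GooglePointer })
  | (QueueJobBase & { provider: 'imap'; pointer: ImapPointer });

// Hypothetical helper: switching on `provider` narrows `pointer` per branch.
function describePointer(job: UnifiedInboundEmailQueueJob): string {
  switch (job.provider) {
    case 'microsoft':
      return `microsoft message ${job.pointer.messageId}`;
    case 'google':
      return `google history ${job.pointer.historyId}`;
    case 'imap':
      return `imap ${job.pointer.mailbox} uid ${job.pointer.uid}`;
    default: {
      // Compile-time exhaustiveness check: a new provider breaks this line.
      const unreachable: never = job;
      throw new Error(`unknown provider: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

The same `switch` shape would fit consumer-side routing, since each branch only sees the pointer fields that belong to its provider.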
## Discoveries / Constraints
- IMAP service already retries webhook dispatch on non-2xx responses.
- Existing IMAP in-app async queue implementation is in-memory and returns success after enqueue, which is not durable acceptance.
- Microsoft and Google callback handlers currently fetch and process in callback path; this plan changes them to enqueue-only ingress.
- Inbound email interface definitions are duplicated across `shared/interfaces`, `server/src/interfaces`, and `packages/types/src/interfaces`; all three must be kept in sync for type consumers.
- Microsoft webhook handler is transaction-scoped per notification; queue-mode enqueue can be inserted before legacy fetch/process logic and short-circuit the callback path cleanly.
- Google webhook flow can enqueue immediately after provider resolution + JWT verification, before any `gmail_processed_history` writes or Gmail API fetches.
- IMAP listener now has enough metadata at fetch time (`mailbox`, `uid`, `uidValidity`, `messageId`) to emit pointer-only webhook payloads; no raw body is required for unified queue ingress.
- Unified queue internals now track ready/processing/inflight/DLQ keys with lease metadata, enabling explicit claim and completion lifecycle management.
- Queue enqueue now enforces a runtime pointer-only payload guard that rejects forbidden MIME/body/attachment keys at both top-level and nested pointer metadata.
- Legacy IMAP in-memory async queue now rejects enqueue attempts when unified pointer queue mode is enabled for the same tenant/provider, preventing accidental production regressions to in-memory processing.
- Security checks are still enforced before enqueue-only handoff: Microsoft validation/clientState checks, Google Pub/Sub JWT verification, and IMAP secret header verification all execute before unified-queue enqueue paths.
- IMAP async-mode gating is now provider-aware and supports explicit legacy-path disablement via `IMAP_INBOUND_EMAIL_IN_APP_ASYNC_DISABLED`, while also auto-disabling async mode whenever unified pointer queue mode is enabled for a provider.
- Unified queue now emits structured event logs for `enqueue`, `consume_start`, `ack`, `retry`, `dlq`, `reclaim`, and consumer `skip` with job/pointer identifiers and attempt metadata.
- Microsoft webhook response contract now reports handoff mode (`unified_pointer_queue`/`mixed`/`inline_processing`) plus queue vs inline counts, aligning callback semantics with Google/IMAP queue-mode responses.
- Queue consumer provider routing is implemented in `processUnifiedInboundEmailQueueJob` via provider-specific pointer resolution paths: Microsoft (`messageId`), Google (`historyId` plus discovered message IDs), and IMAP (`uid` mailbox fetch).
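The pointer-only guard noted above (rejecting forbidden keys at top level and inside nested pointer metadata) could look roughly like the sketch below. The function name `assertPointerOnlySketch` and the deny-list are hypothetical; the notes name only `rawMime`, `attachments`, and `body` explicitly, so the real key list may be longer.

```typescript
// Hypothetical deny-list; only these three keys are named in the notes.
const FORBIDDEN_KEYS = new Set<string>(['rawMime', 'attachments', 'body']);

// Walks the payload recursively and throws on the first forbidden key.
function assertPointerOnlySketch(value: unknown, path: string = 'job'): void {
  if (Array.isArray(value)) {
    value.forEach((item, index) => assertPointerOnlySketch(item, `${path}[${index}]`));
    return;
  }
  if (value === null || typeof value !== 'object') {
    return; // primitives cannot carry forbidden keys
  }
  const record = value as Record<string, unknown>;
  for (const key of Object.keys(record)) {
    if (FORBIDDEN_KEYS.has(key)) {
      throw new Error(`pointer-only payload violation at ${path}.${key}`);
    }
    assertPointerOnlySketch(record[key], `${path}.${key}`);
  }
}
```

Running the check inside enqueue, rather than in each webhook, keeps a single enforcement point for all three providers.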
## Commands / Runbooks
- `python3 /Users/roberisaacs/.codex/skills/alga-plan/scripts/scaffold_plan.py "Unified Inbound Email Queue with Pointer Jobs" --slug unified-inbound-email-pointer-queue`
- `python3 /Users/roberisaacs/.codex/skills/alga-plan/scripts/validate_plan.py ee/docs/plans/2026-03-01-unified-inbound-email-pointer-queue`
- `npm -w shared run typecheck`
- `npm -w @alga-psa/types run build`
- `npm -w server run typecheck`
- `npm -w shared run typecheck` (after Microsoft queue-mode changes)
- `npm -w server run typecheck` (after Microsoft queue-mode changes)
- `npm -w email-service run build`
- `npm -w @alga-psa/integrations run typecheck`
- `npm -w server run test -- src/test/integration/microsoftWebhookUnifiedQueue.integration.test.ts`
- `npm -w server run test -- src/test/integration/googleWebhookUnifiedQueue.integration.test.ts --coverage.enabled=false`
- `npm -w server run test -- src/test/integration/imapWebhookHandoff.integration.test.ts --coverage.enabled=false`
- `npm -w server run test -- src/test/integration/microsoftWebhookUnifiedQueue.integration.test.ts src/test/integration/googleWebhookUnifiedQueue.integration.test.ts src/test/integration/imapWebhookHandoff.integration.test.ts --coverage.enabled=false`
- `npx vitest --config shared/vitest.config.ts services/email-service/src/emailService.webhookRetry.test.ts`
- `npx vitest --config shared/vitest.config.ts shared/services/email/__tests__/unifiedInboundEmailQueueConsumer.test.ts`
- `npm -w server run test -- src/test/unit/unifiedInboundEmailQueueJobProcessor.fetch.test.ts --coverage.enabled=false`
- `npx vitest --config shared/vitest.config.ts shared/services/email/__tests__/unifiedInboundEmailQueue.test.ts`
## Links / References
- IMAP webhook route: `packages/integrations/src/webhooks/email/imap.ts`
- IMAP in-memory queue: `packages/integrations/src/webhooks/email/imapInAppQueue.ts`
- Microsoft webhook route: `packages/integrations/src/webhooks/email/microsoft.ts`
- Google webhook route: `packages/integrations/src/webhooks/email/google.ts`
- IMAP listener dispatch path: `services/email-service/src/emailService.ts`
- Existing related plan: `ee/docs/plans/2026-02-27-inbound-email-inapp-artifact-persistence-remaining-work/`
- Unified job contract files:
  - `shared/interfaces/inbound-email.interfaces.ts`
  - `server/src/interfaces/email.interfaces.ts`
  - `packages/types/src/interfaces/email.interfaces.ts`
- Unified queue helper: `shared/services/email/unifiedInboundEmailQueue.ts`
- Unified queue flag gate helper: `shared/services/email/inboundEmailInAppFeatureFlag.ts`
- Unified queue consumer loop: `shared/services/email/unifiedInboundEmailQueueConsumer.ts`
- Server queue job processor: `server/src/services/email/unifiedInboundEmailQueueJobProcessor.ts`
- Server consumer entrypoint: `server/src/bin/unifiedInboundEmailQueueConsumer.ts`
- Unified queue runbook: `ee/docs/plans/2026-03-01-unified-inbound-email-pointer-queue/RUNBOOK.md`
- Microsoft unified ingress contract tests: `server/src/test/integration/microsoftWebhookUnifiedQueue.integration.test.ts`
- Google unified ingress contract tests: `server/src/test/integration/googleWebhookUnifiedQueue.integration.test.ts`
- IMAP webhook retry test: `services/email-service/src/emailService.webhookRetry.test.ts`
- Unified queue consumer tests: `shared/services/email/__tests__/unifiedInboundEmailQueueConsumer.test.ts`
- Unified queue job processor fetch tests: `server/src/test/unit/unifiedInboundEmailQueueJobProcessor.fetch.test.ts`
- Unified queue primitives tests: `shared/services/email/__tests__/unifiedInboundEmailQueue.test.ts`
## Progress Log
- (2026-03-01) Completed `F001`: Added unified pointer job contract types with provider-specific pointer metadata and queue lifecycle fields (`attempt`, `maxAttempts`, `enqueuedAt`, `jobId`, `schemaVersion`).
- (2026-03-01) Completed `F002`: Microsoft webhook now supports enqueue-only pointer handoff in unified-queue mode, using `shared/services/email/unifiedInboundEmailQueue.ts` and no longer requiring inline full-email fetch/processing when that mode is enabled.
- (2026-03-01) Completed `F003`: Google webhook now supports enqueue-only pointer handoff in unified-queue mode (`historyId`, `emailAddress`, `pubsubMessageId`) and returns `503` when durable enqueue fails.
- (2026-03-01) Completed `F004`: IMAP listener/webhook handoff now supports pointer-only ingress (`mailbox`, `uid`, `uidValidity`, optional `messageId`) and enqueues IMAP pointer jobs when unified queue mode is enabled.
- (2026-03-01) Completed `F005`: Unified pointer ingress is now persisted in Redis list storage via `shared/services/email/unifiedInboundEmailQueue.ts` (`RPUSH` on a configurable queue key).
- (2026-03-01) Completed `F006`: Unified queue mode ingress responses now acknowledge only after enqueue returns success; enqueue errors return non-success responses so callers can retry.
- (2026-03-01) Completed `F007`: Microsoft, Google, and IMAP unified-queue paths now return `503` when enqueue fails, preserving upstream retry behavior.
- (2026-03-01) Completed `F008`: Added a reusable consumer loop (`UnifiedInboundEmailQueueConsumer`) plus queue claim/ack/fail/reclaim primitives for processing unified inbound pointer jobs.
- (2026-03-01) Completed `F009`: Added provider-specific consume-time pointer resolution in `unifiedInboundEmailQueueJobProcessor` for Microsoft (`messageId`), Google (`historyId` -> message IDs), and IMAP (`uid` fetch) before downstream processing.
- (2026-03-01) Completed `F010`: Added consume-time idempotency insert/check against `email_processed_messages` with duplicate short-circuit when a normalized external identity already exists.
- (2026-03-01) Completed `F011`: Queue job processor now calls `processInboundEmailInApp` for fetched provider messages and records final processing status back to `email_processed_messages`.
- (2026-03-01) Completed `F012`: Consumer loop now ACKs only after `handleJob` completes successfully; failed jobs are not ACKed and are routed through retry/DLQ handling.
- (2026-03-01) Completed `F013`: Added lease-based reclaim (`reclaimExpiredUnifiedInboundEmailQueueJobs`) so stale in-flight jobs are resurfaced back to the ready queue.
- (2026-03-01) Completed `F014`: Failed jobs now increment `attempt` in queue payload state and only requeue while below configured `maxAttempts`.
- (2026-03-01) Completed `F015`: Once `attempt` reaches `maxAttempts`, failed jobs are routed to the dedicated unified inbound pointer DLQ key.
- (2026-03-01) Completed `F016`: Source-unavailable fetch failures now resolve as deterministic `skipped` outcomes (`source_unavailable:*`) recorded in `email_processed_messages` and do not rethrow for retry.
- (2026-03-01) Completed `F017`: Consumer idempotency now uses a normalized external identity format (`<provider>:<messageId>`) prior to persistence checks.
- (2026-03-01) Completed `F018`: Added `assertPointerOnlyPayload` validation in enqueue to reject raw content-like keys (`rawMime`, `attachments`, `body`, etc.) and enforce pointer-only queue contracts at runtime.
- (2026-03-01) Completed `F019`: Added a defensive runtime guard in `imapInAppQueue` that throws when unified pointer queue mode is enabled for the tenant/provider, ensuring legacy in-memory queue path is bypassed/retired for production unified-mode processing.
- (2026-03-01) Completed `F020`: Verified webhook auth/verification behavior is preserved in enqueue-only mode across Microsoft, Google, and IMAP paths (no auth bypass introduced by unified queue branching).
- (2026-03-01) Completed `F021`: Aligned queue migration flags by extending IMAP async mode evaluation to accept provider context, auto-disable on unified mode, and honor `IMAP_INBOUND_EMAIL_IN_APP_ASYNC_DISABLED` for explicit legacy disablement.
- (2026-03-01) Completed `F022`: Added structured observability events across queue lifecycle and consumer skip outcomes, including tenant/provider/pointer identifiers, attempts, and terminal reasons for retry/DLQ paths.
- (2026-03-01) Completed `F023`: Updated provider callback contracts so unified mode explicitly reports queue handoff metadata and avoids inline-processing ambiguity in webhook responses.
- (2026-03-01) Completed `F024`: Confirmed unified consumer routing dispatches per provider type and fetches provider-specific source payloads before shared in-app processing.
- (2026-03-01) Completed `F025`: Added a dedicated runbook covering architecture, queue keys, feature flags, consumer startup, and local validation/failure-path checks.
- (2026-03-01) Completed `T001`: Added Microsoft unified ingress contract test validating pointer-only enqueue payload shape (`tenantId`, `providerId`, provider pointer identifiers) and absence of raw content fields.
- (2026-03-01) Completed `T002`: Added Google unified ingress contract test validating pointer-only enqueue payload shape (`tenantId`, `providerId`, `historyId`, `pubsubMessageId`) behind successful JWT/provider verification.
- (2026-03-01) Completed `T003`: Extended IMAP webhook integration coverage with unified-mode pointer enqueue assertions (`mailbox`, `uid`, `uidValidity`, `messageId`) and pointer-only payload guards.
- (2026-03-01) Completed `T004`: Added deferred-enqueue Microsoft webhook test proving `200` success is not returned until unified queue enqueue promise resolves.
- (2026-03-01) Completed `T005`: Added deferred-enqueue Google webhook test proving callback success response is blocked until unified queue enqueue completion.
- (2026-03-01) Completed `T006`: Added deferred-enqueue IMAP webhook test proving unified-mode success response is blocked until pointer job enqueue completion.
- (2026-03-01) Completed `T007`: Added enqueue-failure assertions for Microsoft, Google, and IMAP unified ingress paths, each returning `503` to preserve upstream retry semantics.
- (2026-03-01) Completed `T008`: Extracted and tested IMAP webhook retry helper to verify non-2xx ingress responses trigger retry attempts before eventual success.
- (2026-03-01) Completed `T009`: Added consumer unit coverage confirming Microsoft pointer claims invoke handler and ACK path through unified consumer loop.
- (2026-03-01) Completed `T010`: Validated Google pointer claims execute through the same unified consumer claim/handle/ACK lifecycle.
- (2026-03-01) Completed `T011`: Validated IMAP pointer claims execute through the same unified consumer claim/handle/ACK lifecycle.
- (2026-03-01) Completed `T012`: Added processor fetch test proving Microsoft pointer jobs resolve full provider payloads before shared in-app processing execution.
- (2026-03-01) Completed `T013`: Added processor fetch test proving Google pointer jobs resolve message payloads (history cursor -> message IDs -> full payloads) before processing.
- (2026-03-01) Completed `T014`: Added processor fetch test proving IMAP pointer jobs resolve mailbox UID content into normalized email payloads before processing.
- (2026-03-01) Completed `T015`: Added idempotency happy-path test validating first consume writes normalized identity (`provider:messageId`) processing marker and executes downstream processing.
- (2026-03-01) Completed `T016`: Added idempotency duplicate-path test validating unique-constraint collision (`23505`) short-circuits downstream processing with deduped skip outcome.
- (2026-03-01) Completed `T017`: Processor fetch suite now asserts `processInboundEmailInApp` receives fully resolved provider payloads on successful consume-time fetch paths.
- (2026-03-01) Completed `T018`: Added queue ACK primitive test validating successful consume removes payload from processing list and clears inflight hash/lease entries.
- (2026-03-01) Completed `T019`: Added consumer failure-path test validating processing exceptions skip ACK and invoke retry-handling path (`failUnifiedInboundEmailQueueJob`).
- (2026-03-01) Completed `T020`: Added reclaim-path queue test validating expired inflight claims are removed from processing structures and requeued to ready state.
- (2026-03-01) Completed `T021`: Added retry-path queue test validating failed consume increments job attempt prior to requeue.
- (2026-03-01) Completed `T022`: Added DLQ-path queue test validating jobs are moved to dead-letter storage once max attempts are reached.
- (2026-03-01) Completed `T023`: Added IMAP source-unavailable processor test validating deterministic `source_unavailable:*` skip reason and consume-marker persistence.
- (2026-03-01) Completed `T024`: Added consumer skipped-outcome test validating source-unavailable paths ACK and avoid retry-loop behavior.
- (2026-03-01) Completed `T025`: Added queue payload-guard test validating enqueue rejects raw-content fields and enforces pointer-only contract.
- (2026-03-01) Completed `T026`: Added IMAP regression test validating unified queue mode bypasses legacy in-memory async queue path even when legacy async flag is enabled.
- (2026-03-01) Completed `T027`: Added Microsoft security regression test validating `clientState` mismatch blocks enqueue in unified mode.
- (2026-03-01) Completed `T028`: Added Google security regression test validating JWT auth header remains required in enqueue-only mode.
- (2026-03-01) Completed `T029`: Added IMAP security regression test validating webhook secret mismatch still rejects enqueue-only requests.
- (2026-03-01) Completed `T030`: Existing unified-mode ingress contract tests confirm flag enablement routes Microsoft/Google/IMAP into enqueue-only handoff paths.
- (2026-03-01) Completed `T031`: Added rollback-path IMAP test validating unified flag disablement preserves legacy `in_app_async` handoff behavior.
- (2026-03-01) Completed `T032`: Queue logging tests now assert enqueue success/failure events include provider, tenant, and pointer identifiers.
- (2026-03-01) Completed `T033`: Queue + consumer logging tests now assert retry/DLQ/skip event payloads include attempt counts and terminal reasons.
- (2026-03-01) Completed `T034`: Idempotency persistence test coverage confirms first consume writes `email_processed_messages` marker for newly processed identities.
- (2026-03-01) Completed `T035`: Duplicate-guard test coverage confirms unique-constraint collision blocks second consume and prevents downstream processing.
- (2026-03-01) Completed `T036`: Combined Microsoft webhook enqueue contract + processor consume-time fetch tests validate callback-to-worker shared processing flow and created-outcome handoff.
- (2026-03-01) Completed `T037`: Combined Google webhook enqueue contract + processor consume-time fetch tests validate callback-to-worker shared processing flow and created-outcome handoff.
- (2026-03-01) Completed `T038`: Combined IMAP listener/webhook enqueue contract + processor consume-time fetch tests validate callback-to-worker shared processing flow and created-outcome handoff.
- (2026-03-01) Completed `T039`: Idempotency duplicate-consume coverage validates repeated provider deliveries result in a single processed outcome with deduped no-op on repeats.
- (2026-03-01) Completed `T040`: Plan docs validation complete (`SCRATCHPAD.md` + `RUNBOOK.md`) with unified architecture, flags, queue lifecycle, and local verification runbook steps.
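The consume-time idempotency entries (F010, F017, T015, T016) can be sketched with an in-memory stand-in for the `email_processed_messages` unique constraint. `ProcessedMessageStore`, `normalizeIdentity`, and `consumeOnce` are hypothetical names, and the processing callback is synchronous here only to keep the example short; the real processor is presumably async and database-backed.

```typescript
type ConsumeOutcome = 'processed' | 'deduped';

// In-memory stand-in for the unique constraint on email_processed_messages.
class ProcessedMessageStore {
  private readonly seen = new Set<string>();

  // Returns false when the identity already exists, which mirrors a
  // unique-constraint collision (Postgres error code 23505).
  insertIfAbsent(identity: string): boolean {
    if (this.seen.has(identity)) {
      return false;
    }
    this.seen.add(identity);
    return true;
  }
}

// Normalized external identity in the `<provider>:<messageId>` format.
function normalizeIdentity(provider: string, messageId: string): string {
  return `${provider}:${messageId}`;
}

// Claims the identity first, then runs downstream processing only on a fresh insert.
function consumeOnce(
  store: ProcessedMessageStore,
  provider: string,
  messageId: string,
  process: () => void,
): ConsumeOutcome {
  const identity = normalizeIdentity(provider, messageId);
  if (!store.insertIfAbsent(identity)) {
    return 'deduped'; // duplicate delivery: short-circuit without processing
  }
  process();
  return 'processed';
}
```

Inserting the marker before processing means a repeated provider delivery becomes a no-op, at the cost of needing a status update on the marker if processing later fails.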
## Open Questions
- Choose Redis queue primitive for implementation phase: Streams with consumer groups vs list-based queue with explicit inflight tracking.
- Decide whether DLQ re-drive tooling is required in this scope or deferred.
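For the list-based option in the first question, the explicit inflight tracking can be modeled in memory to reason about claim, ack, fail, and reclaim transitions. This is a sketch only: `InMemoryPointerQueue` is a hypothetical class, not the Redis implementation, and it assumes `attempt` starts at 0, that reclaim does not increment `attempt`, and that a job moves to the DLQ once the incremented `attempt` reaches `maxAttempts`.

```typescript
interface SketchJob {
  jobId: string;
  attempt: number;
  maxAttempts: number;
}

interface SketchClaim {
  job: SketchJob;
  leaseExpiresAt: number; // epoch milliseconds
}

class InMemoryPointerQueue {
  readonly ready: SketchJob[] = [];
  readonly inflight = new Map<string, SketchClaim>();
  readonly dlq: SketchJob[] = [];
  private readonly leaseMs: number;

  constructor(leaseMs: number) {
    this.leaseMs = leaseMs;
  }

  enqueue(job: SketchJob): void {
    this.ready.push(job);
  }

  // Moves the oldest ready job to inflight and records a lease deadline.
  claim(now: number): SketchJob | undefined {
    const job = this.ready.shift();
    if (job) {
      this.inflight.set(job.jobId, { job, leaseExpiresAt: now + this.leaseMs });
    }
    return job;
  }

  // Successful processing: drop the inflight entry.
  ack(jobId: string): void {
    this.inflight.delete(jobId);
  }

  // Failed processing: increment attempt, then requeue or dead-letter.
  fail(jobId: string): void {
    const claim = this.inflight.get(jobId);
    if (!claim) {
      return;
    }
    this.inflight.delete(jobId);
    const next: SketchJob = { ...claim.job, attempt: claim.job.attempt + 1 };
    if (next.attempt >= next.maxAttempts) {
      this.dlq.push(next);
    } else {
      this.ready.push(next);
    }
  }

  // Returns expired claims to the ready list; reports how many were moved.
  reclaimExpired(now: number): number {
    let reclaimed = 0;
    this.inflight.forEach((claim, jobId) => {
      if (claim.leaseExpiresAt <= now) {
        this.inflight.delete(jobId);
        this.ready.push(claim.job);
        reclaimed += 1;
      }
    });
    return reclaimed;
  }
}
```

A Streams-based design would get pending-entry tracking from consumer groups instead of maintaining these structures by hand, which is the trade-off the question is weighing.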