# Scratchpad — Inbound Email Embedded Images + Original EML as Ticket Documents
- Plan slug: `2026-02-27-inbound-email-embedded-images-and-original-eml`
- Created: `2026-02-27`
## What This Is
Rolling notes for embedded inbound-email image extraction + source `.eml` persistence plan.
## Decisions
- (2026-02-27) Scope includes both:
  - embedded image payload extraction (`data:image/*` + HTML-referenced `cid:` inline images)
  - original source email `.eml` persistence.
- (2026-02-27) Behavior applies to both new-ticket and reply-to-ticket inbound email flows.
- (2026-02-27) Keep failures non-blocking for core ticket/comment creation paths.
- (2026-02-27) Reuse existing idempotency model (`email_processed_attachments`) with synthetic attachment IDs for embedded images and source `.eml`.
- (2026-02-27) Implemented embedded-image extraction as a dedicated workflow action (`extract_embedded_email_attachments`) so parsing/validation/id generation are testable and deterministic outside the JS-only workflow file.
- (2026-02-27) Implemented original-source `.eml` persistence as dedicated workflow action (`process_original_email_attachment`) with reserved idempotency key `__original_email_source__`.
- (2026-02-27) For MailHog/IMAP/test inputs, source MIME resolution order is:
  - direct raw MIME fields on `emailData` (`rawMime`, `rawMimeBase64`, `sourceMimeBase64`, `rawSourceBase64`)
  - provider retrieval for Gmail/Microsoft
  - deterministic RFC822 fallback assembly.
- (2026-02-27) Scope refinement approved for current implementation pass:
  - in scope: lightweight webhook handoff, ingress size caps, payload augmentation for bytes, bounded async per-message artifact processing
  - out of scope: queue/global backpressure orchestration and new observability/metrics initiatives
- (2026-02-27) IMAP webhook route now uses async event handoff (`INBOUND_EMAIL_RECEIVED`) and no longer performs inline ticket/comment/document persistence in the request path.
- (2026-02-27) IMAP service now enforces ingress hard caps before webhook dispatch:
  - `IMAP_MAX_ATTACHMENT_BYTES` (per attachment)
  - `IMAP_MAX_TOTAL_ATTACHMENT_BYTES` (sum across attachments)
  - `IMAP_MAX_ATTACHMENT_COUNT` (attachment count)
  - `IMAP_MAX_RAW_MIME_BYTES` (raw source `.eml` payload)
  - skipped artifacts are logged with structured reason objects via `imap_ingress_artifacts_skipped`.
- (2026-02-27) IMAP payload shaping now includes byte-carrying fields required for worker persistence:
  - `emailData.rawMimeBase64` (within cap)
  - `emailData.attachments[].content` (base64)
  - `emailData.attachments[].isInline`, `contentId`, `id`, `name`, `contentType`, `size`
- (2026-02-27) Worker `process_email_attachment` now consumes provided `attachmentData.content` base64 payloads directly when present (not test-only), allowing IMAP ingress bytes to flow through the existing storage-backed + idempotent document persistence path.
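
The source MIME resolution order recorded above can be sketched roughly as follows. This is a minimal illustration, not the actual `process_original_email_attachment` code: the `EmailDataLike` shape, `ProviderSourceFetcher`, `resolveSourceMime`, and `buildFallbackRfc822` are hypothetical names, and only the four raw-MIME field names are taken from these notes. It also assumes that a provider error propagates to the caller instead of silently dropping to the fallback.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical payload shape: only the four raw-MIME field names come from
// the notes; every other field is an assumption for illustration.
interface EmailDataLike {
  id: string;
  from: string;
  to: string[];
  subject: string;
  receivedAt: string; // ISO-8601 timestamp
  bodyText: string;
  rawMime?: string;
  rawMimeBase64?: string;
  sourceMimeBase64?: string;
  rawSourceBase64?: string;
}

// Hypothetical stand-in for a Gmail/Microsoft adapter call.
type ProviderSourceFetcher = (messageId: string) => Promise<Uint8Array | null>;

interface ResolvedSource {
  origin: "payload" | "provider" | "fallback";
  bytes: Uint8Array;
}

// Builds a minimal RFC822 message from already-known fields only, so the
// same input always yields the same bytes.
function buildFallbackRfc822(email: EmailDataLike): string {
  return [
    `From: ${email.from}`,
    `To: ${email.to.join(", ")}`,
    `Subject: ${email.subject}`,
    `Date: ${new Date(email.receivedAt).toUTCString()}`,
    `Message-ID: <${email.id}>`,
    "MIME-Version: 1.0",
    'Content-Type: text/plain; charset="utf-8"',
    "",
    email.bodyText,
  ].join("\r\n");
}

async function resolveSourceMime(
  email: EmailDataLike,
  fetchFromProvider?: ProviderSourceFetcher,
): Promise<ResolvedSource> {
  // 1. Raw MIME carried directly on the payload (plain text first, then base64).
  if (email.rawMime) {
    return { origin: "payload", bytes: Buffer.from(email.rawMime, "utf8") };
  }
  const base64 =
    email.rawMimeBase64 || email.sourceMimeBase64 || email.rawSourceBase64;
  if (base64) {
    return { origin: "payload", bytes: Buffer.from(base64, "base64") };
  }

  // 2. Provider retrieval; errors are NOT caught here (assumption).
  if (fetchFromProvider) {
    const bytes = await fetchFromProvider(email.id);
    if (bytes && bytes.length > 0) {
      return { origin: "provider", bytes };
    }
  }

  // 3. Deterministic fallback assembled from known fields.
  return {
    origin: "fallback",
    bytes: Buffer.from(buildFallbackRfc822(email), "utf8"),
  };
}
```

Returning the `origin` alongside the bytes is a design choice of this sketch: it lets a caller log which branch produced the `.eml` without re-deriving it.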
## Discoveries / Constraints
- (2026-02-27) Existing inbound attachment action already writes storage-backed `external_files` + `documents` + `document_associations` and tracks idempotency in `email_processed_attachments`.
  - File: `services/workflow-worker/src/actions/registerEmailAttachmentActions.ts`
- (2026-02-27) Existing action currently skips inline/CID attachments by default (`contentId || isInline` -> skipped).
- (2026-02-27) Workflow invokes attachment processing in both paths:
  - reply path helper (`handleEmailReply`)
  - new ticket path attachment loop
  - File: `services/workflow-worker/src/workflows/system-email-processing-workflow.ts`
- (2026-02-27) Gmail adapter already exposes attachment metadata with `isInline` and `contentId`.
  - File: `server/src/services/email/providers/GmailAdapter.ts`
- (2026-02-27) Microsoft adapter supports file-attachment byte download but does not yet have a source-message `.eml` retrieval method.
  - File: `shared/services/email/providers/MicrosoftGraphAdapter.ts`
- (2026-02-27) Event/type schemas currently model attachment metadata but need review for inline/content fields used in processing paths.
  - Files:
    - `packages/types/src/interfaces/email.interfaces.ts`
    - `packages/event-schemas/src/schemas/domain/emailWorkflowSchemas.ts`
    - `packages/event-schemas/src/schemas/eventBusSchema.ts`
- (2026-02-27) Related prior plan exists and can be referenced for baseline attachment ingestion behavior:
  - `ee/docs/plans/2026-01-11-email-attachments-to-tickets/`
- (2026-02-27) `process_email_attachment` now supports synthetic embedded payloads by honoring:
  - `allowInlineProcessing: true`
  - optional `providerAttachmentId` for CID-backed downloads
  - image-only enforcement for embedded extraction paths.
- (2026-02-27) Workflow now invokes document processing helper in both paths:
  - extract embedded images (best effort)
  - process base + synthetic attachments (best effort)
  - persist original `.eml` once (best effort).
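
To make the embedded-image rules above concrete, here is a minimal sketch of the extraction step: it pulls `data:image/*` payloads out of `<img>` tags and maps `cid:` references to matching inline image parts, skipping unreferenced CID parts, non-image data URLs, malformed base64, and oversized payloads. It is not the code in `emailAttachmentHelpers.ts`; the types, the `extractEmbeddedImages` name, the 5 MiB default, and the `embedded-<kind>-<hash>` ID format are all assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration; the real action's types may differ.
interface InlinePart {
  id: string;
  contentId?: string;
  contentType: string;
}

interface EmbeddedImage {
  syntheticId: string;
  source: "data-url" | "cid";
  contentType: string;
  base64?: string; // present for data URLs
  providerAttachmentId?: string; // present for CID-backed parts
}

const DATA_IMAGE = /^data:(image\/[a-z0-9.+-]+);base64,([\s\S]+)$/i;
const BASE64_BODY = /^[A-Za-z0-9+/]+={0,2}$/;

// Content-IDs are often wrapped in angle brackets in MIME headers.
const normalizeCid = (value: string): string =>
  value.trim().replace(/^<|>$/g, "");

// Assumed ID format: a stable hash of message ID + kind + key, so a retry of
// the same message yields the same synthetic attachment ID.
function syntheticId(messageId: string, kind: string, key: string): string {
  const digest = createHash("sha256")
    .update(`${messageId}:${kind}:${key}`)
    .digest("hex");
  return `embedded-${kind}-${digest.slice(0, 16)}`;
}

function extractEmbeddedImages(
  messageId: string,
  html: string,
  parts: InlinePart[],
  maxBytes: number = 5 * 1024 * 1024, // assumed default cap
): EmbeddedImage[] {
  // A fresh regex per call avoids shared lastIndex state.
  const imgSrc = /<img\b[^>]*?\bsrc\s*=\s*(["'])([\s\S]*?)\1/gi;
  const found = new Map<string, EmbeddedImage>(); // insertion order = HTML order
  let match: RegExpExecArray | null;

  while ((match = imgSrc.exec(html)) !== null) {
    const src = match[2].trim();

    if (/^cid:/i.test(src)) {
      const cid = normalizeCid(src.slice(4));
      const part = parts.find(
        (p) =>
          p.contentId !== undefined &&
          normalizeCid(p.contentId) === cid &&
          p.contentType.toLowerCase().startsWith("image/"),
      );
      if (!part) continue; // no matching inline image part
      const id = syntheticId(messageId, "cid", cid);
      if (!found.has(id)) {
        found.set(id, {
          syntheticId: id,
          source: "cid",
          contentType: part.contentType,
          providerAttachmentId: part.id,
        });
      }
      continue;
    }

    const data = DATA_IMAGE.exec(src);
    if (!data) continue; // not a base64 image data URL
    const base64 = data[2].replace(/\s+/g, "");
    if (base64.length % 4 !== 0 || !BASE64_BODY.test(base64)) continue; // malformed
    const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
    const decodedBytes = (base64.length / 4) * 3 - padding;
    if (decodedBytes > maxBytes) continue; // over the size policy
    const contentHash = createHash("sha256").update(base64).digest("hex");
    const id = syntheticId(messageId, "data", contentHash);
    if (!found.has(id)) {
      found.set(id, {
        syntheticId: id,
        source: "data-url",
        contentType: data[1].toLowerCase(),
        base64,
      });
    }
  }
  return Array.from(found.values());
}
```

Regex scanning keeps the sketch dependency-free, but it is only an approximation of HTML parsing (for example, it would also match a `data-src` attribute). Unreferenced inline parts never appear in the result because the loop is driven by the HTML references, not by the part list.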
## Commands / Runbooks
- (2026-02-27) Search inbound email + attachment processing paths:
  - `rg -n "process_email_attachment|INBOUND_EMAIL_RECEIVED|attachments|inline|cid|eml|rfc822" services/workflow-worker/src server/src packages`
- (2026-02-27) Inspect workflow + action implementation:
  - `sed -n '1,620p' services/workflow-worker/src/workflows/system-email-processing-workflow.ts`
  - `sed -n '1,760p' services/workflow-worker/src/actions/registerEmailAttachmentActions.ts`
- (2026-02-27) Inspect provider adapters:
  - `sed -n '520,760p' server/src/services/email/providers/GmailAdapter.ts`
  - `sed -n '430,700p' shared/services/email/providers/MicrosoftGraphAdapter.ts`
- (2026-02-27) Added helper module + tests:
  - `services/workflow-worker/src/actions/emailAttachmentHelpers.ts`
  - `server/src/test/unit/email/emailAttachmentHelpers.test.ts`
- (2026-02-27) Attempted workflow codegen refresh:
  - `node scripts/generate-system-email-workflow.cjs`
  - blocked in current workspace due to missing local `typescript` package resolution.
- (2026-02-27) Attempted targeted vitest execution (blocked by missing dependencies in this workspace):
  - `npm run test:local -- ...` -> dotenv CLI arg parsing failure
  - `npx vitest run ...` -> missing `dotenv` / `vitest` package resolution at runtime.
- (2026-02-27) IMAP webhook handoff refactor:
  - `nl -ba packages/integrations/src/webhooks/email/imap.ts | sed -n '1,320p'`
  - removed inline `processInboundEmailInApp` path, replaced with event publish handoff.
- (2026-02-27) IMAP ingress caps implementation:
  - `nl -ba services/email-service/src/emailService.ts | sed -n '700,840p'`
  - switched parsing to `simpleParser(rawMimeBuffer)` and applied cap checks before base64 encoding attachment/raw MIME payload bytes.
- (2026-02-27) IMAP webhook handoff integration tests:
  - `cd server && npx vitest run src/test/integration/imapWebhookHandoff.integration.test.ts --config vitest.config.ts`
  - validates queued handoff-only behavior and unauthorized short-circuit.
- (2026-02-27) IMAP ingress cap tests:
  - `cd services/email-service && npx vitest run src/emailService.ingressCaps.test.ts`
  - covers per-attachment, total-bytes, count, and raw-MIME cap behavior with structured skip reasons.
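
The cap behavior those ingress tests exercise can be pictured with a sketch like the one below. It is an illustration under assumptions, not the `emailService.ts` implementation: `applyIngressCaps`, the `IngressCaps` and `SkipReason` shapes, and the order in which the checks run are hypothetical. Only the four reason strings and the idea of structured skip reasons come from these notes, and the caps would presumably be read from the `IMAP_MAX_*` settings.

```typescript
// Hypothetical shapes; only the reason strings are taken from the notes.
interface IngressCaps {
  maxAttachmentBytes: number; // e.g. from IMAP_MAX_ATTACHMENT_BYTES
  maxTotalAttachmentBytes: number; // e.g. from IMAP_MAX_TOTAL_ATTACHMENT_BYTES
  maxAttachmentCount: number; // e.g. from IMAP_MAX_ATTACHMENT_COUNT
  maxRawMimeBytes: number; // e.g. from IMAP_MAX_RAW_MIME_BYTES
}

interface ParsedAttachment {
  id: string;
  name: string;
  size: number; // decoded byte length
}

interface SkipReason {
  type: "attachment" | "raw_mime";
  reason:
    | "attachment_over_max_bytes"
    | "attachment_total_bytes_exceeded"
    | "attachment_count_exceeded"
    | "raw_mime_over_max_bytes";
  attachmentId?: string;
  size: number;
  cap: number;
}

interface IngressResult {
  kept: ParsedAttachment[];
  includeRawMime: boolean;
  skipped: SkipReason[];
}

function applyIngressCaps(
  attachments: ParsedAttachment[],
  rawMimeBytes: number,
  caps: IngressCaps,
): IngressResult {
  const kept: ParsedAttachment[] = [];
  const skipped: SkipReason[] = [];
  let totalBytes = 0;

  for (const att of attachments) {
    // Assumed check order: count, then per-attachment size, then running total.
    if (kept.length >= caps.maxAttachmentCount) {
      skipped.push({
        type: "attachment",
        reason: "attachment_count_exceeded",
        attachmentId: att.id,
        size: att.size,
        cap: caps.maxAttachmentCount,
      });
      continue;
    }
    if (att.size > caps.maxAttachmentBytes) {
      skipped.push({
        type: "attachment",
        reason: "attachment_over_max_bytes",
        attachmentId: att.id,
        size: att.size,
        cap: caps.maxAttachmentBytes,
      });
      continue;
    }
    if (totalBytes + att.size > caps.maxTotalAttachmentBytes) {
      skipped.push({
        type: "attachment",
        reason: "attachment_total_bytes_exceeded",
        attachmentId: att.id,
        size: att.size,
        cap: caps.maxTotalAttachmentBytes,
      });
      continue;
    }
    kept.push(att);
    totalBytes += att.size;
  }

  // The raw source is all-or-nothing: either it fits the cap or it is omitted.
  const includeRawMime = rawMimeBytes <= caps.maxRawMimeBytes;
  if (!includeRawMime) {
    skipped.push({
      type: "raw_mime",
      reason: "raw_mime_over_max_bytes",
      size: rawMimeBytes,
      cap: caps.maxRawMimeBytes,
    });
  }
  return { kept, includeRawMime, skipped };
}
```

Running the checks on decoded sizes before any base64 encoding matches the note above about applying caps prior to encoding, so oversized bytes never reach the webhook payload.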
## Links / References
- Existing ticket-doc attachment integration tests:
  - `server/src/test/integration/emailAttachmentIngestion.integration.test.ts`
  - `server/src/test/integration/systemEmailProcessingWorkflowAttachments.integration.test.ts`
  - `ee/server/src/__tests__/integration/email-attachments-to-ticket-documents.playwright.test.ts`
- Existing inbound-email attachment plan baseline:
  - `ee/docs/plans/2026-01-11-email-attachments-to-tickets/PRD.md`
## Open Questions
- Persist only HTML-referenced CID images, or all inline CID parts?
  - Draft assumption in PRD: only HTML-referenced CID images.
- Final `.eml` filename format preference.
- (2026-02-27) Completed F181 — Define embedded-image extraction scope to include HTML data URLs and HTML-referenced CID inline images.
- (2026-02-27) Completed T001 — Covered by emailAttachmentHelpers.test.ts: extracts data:image payload from a single <img> tag.
- (2026-02-27) Completed T002 — Covered by emailAttachmentHelpers.test.ts: extracts multiple data:image payloads in deterministic order.
- (2026-02-27) Completed T003 — Covered by emailAttachmentHelpers.test.ts: skips malformed data:image payload without throwing.
- (2026-02-27) Completed T004 — Covered by emailAttachmentHelpers.test.ts: rejects non-image data URLs.
- (2026-02-27) Completed T005 — Covered by emailAttachmentHelpers.test.ts: skips oversized embedded data URL payloads by max-size policy.
- (2026-02-27) Completed T006 — Covered by emailAttachmentHelpers.test.ts: maps cid references only to matching inline image MIME parts.
- (2026-02-27) Completed T007 — Covered by emailAttachmentHelpers.test.ts: skips unreferenced inline CID MIME parts.
- (2026-02-27) Completed T008 — Covered by emailAttachmentHelpers.test.ts: deterministic embedded IDs are stable across retries.
- (2026-02-27) Completed T009 — Covered by emailAttachmentHelpers.test.ts: deterministic embedded filenames are extension-appropriate and sanitized.
- (2026-02-27) Completed T010 — Covered by systemEmailProcessingWorkflowAttachments.integration.test.ts: new-ticket path invokes embedded extraction/processing.
- (2026-02-27) Completed T011 — Covered by systemEmailProcessingWorkflowAttachments.integration.test.ts: reply path invokes embedded extraction/processing.
- (2026-02-27) Completed T012 — Covered by emailAttachmentIngestion.integration.test.ts: synthetic embedded image creates external_files with expected mime/size.
- (2026-02-27) Completed T013 — Covered by emailAttachmentIngestion.integration.test.ts: synthetic embedded image creates documents metadata row.
- (2026-02-27) Completed T014 — Covered by emailAttachmentIngestion.integration.test.ts: synthetic embedded image creates ticket document_associations row.
- (2026-02-27) Completed T015 — Covered by emailAttachmentIngestion.integration.test.ts: duplicate synthetic embedded processing remains idempotent.
- (2026-02-27) Completed T016 — Covered by combined tests: emailAttachmentIngestion.integration.test.ts records failed processing; workflow integration keeps ticket/comment flow successful.
- (2026-02-27) Completed T017 — Covered by GmailAdapter.listMessagesSince.test.ts: downloadMessageSource returns raw MIME bytes.
- (2026-02-27) Completed T018 — Covered by MicrosoftGraphAdapter.diagnostics.test.ts: downloadMessageSource returns raw MIME bytes.
- (2026-02-27) Completed T019 — Covered by emailAttachmentHelpers.test.ts: raw MIME extraction returns bytes when MailHog/test source content is present.
- (2026-02-27) Completed T020 — Covered by emailAttachmentHelpers.test.ts: deterministic RFC822 fallback is generated when raw source is absent.
- (2026-02-27) Completed T021 — Covered by emailAttachmentIngestion.integration.test.ts: process_original_email_attachment uploads .eml and creates file/document rows.
- (2026-02-27) Completed T022 — Covered by emailAttachmentIngestion.integration.test.ts: process_original_email_attachment associates .eml document to ticket.
- (2026-02-27) Completed T023 — Covered by emailAttachmentIngestion.integration.test.ts: duplicate process_original_email_attachment is idempotent.
- (2026-02-27) Completed T024 — Covered by emailAttachmentIngestion.integration.test.ts: source-message retrieval failure records failed status.
- (2026-02-27) Completed T025 — Covered by systemEmailProcessingWorkflowAttachments.integration.test.ts: new-ticket path invokes process_original_email_attachment exactly once.
- (2026-02-27) Completed T026 — Covered by systemEmailProcessingWorkflowAttachments.integration.test.ts: reply path invokes process_original_email_attachment exactly once.
- (2026-02-27) Completed T027 — Covered by systemEmailProcessingWorkflowAttachments.integration.test.ts: .eml persistence failure does not block new-ticket flow.
- (2026-02-27) Completed T028 — Covered by systemEmailProcessingWorkflowAttachments.integration.test.ts: .eml persistence failure does not block reply flow.
- (2026-02-27) Completed T029 — Covered by emailWorkflowSchemas.contract.test.ts: schema accepts isInline/content fields for inline processing.
- (2026-02-27) Completed T030 — Covered by emailWorkflowSchemas.contract.test.ts: schema changes remain backward compatible with legacy provider payloads.
- (2026-02-27) Completed T031 — Added Playwright scenario in ee/server/src/__tests__/integration/email-attachments-to-ticket-documents.playwright.test.ts that validates embedded data:image attachment filenames are visible in Ticket Documents.
- (2026-02-27) Completed T032 — Added Playwright CID-inline scenario that validates CID-derived image filenames appear in Ticket Documents.
- (2026-02-27) Completed T033 — Added Playwright .eml visibility scenario covering both new-ticket and reply ticket document views.
- (2026-02-27) Completed T034 — Added Playwright duplicate-guard scenario that verifies single embedded/.eml document rows and visibility on the ticket.
- (2026-02-27) Completed T035 — Added IMAP webhook integration test asserting auth/validation + event handoff response with no inline persistence table access.
- (2026-02-27) Completed T036 — Added IMAP webhook auth-guard integration coverage for invalid secret rejection before DB lookup/event publish.
- (2026-02-27) Completed T037 — Added IMAP ingress cap test for per-attachment byte limit with structured `attachment_over_max_bytes` skip reason.
- (2026-02-27) Completed T038 — Added IMAP ingress cap test asserting total-byte cap skips overflow attachments with `attachment_total_bytes_exceeded`.
- (2026-02-27) Completed T039 — Added IMAP ingress cap test for attachment-count limits with deterministic `attachment_count_exceeded` reasons.
- (2026-02-27) Completed T040 — Added action integration coverage proving `raw_mime_over_max_bytes` ingress reason causes `.eml` persistence skip (no document rows/uploads) with non-failing result.
- (2026-02-27) Completed T041 — Expanded `emailWorkflowSchemas.contract.test.ts` with explicit IMAP payload contract coverage for `rawMimeBase64`, attachment `content/isInline/contentId/id/name/contentType/size`, and `ingressSkipReasons` parsing across workflow/event schemas.
- (2026-02-27) Completed T042 — Added DB integration coverage in `emailAttachmentIngestion.integration.test.ts` proving IMAP payload attachment bytes (`attachmentData.content`) persist through storage-backed `process_email_attachment` into `external_files`/`documents`/`document_associations`.
- (2026-02-27) Completed T043 — Added integration coverage for IMAP embedded extraction + persistence: HTML `data:image` plus HTML-referenced CID inline image are persisted, while unreferenced CID inline artifacts are not persisted.
- (2026-02-27) Completed T044 — Added integration coverage proving IMAP `rawMimeBase64` persists exactly one deterministic `original-email-<message-id>.eml` document associated to the ticket.
- (2026-02-27) Completed T045 — Added workflow integration assertion that per-message attachment artifact processing remains sequential (`maxInFlight=1`) rather than unbounded parallel fan-out.
- (2026-02-27) Completed T046 — Added workflow integration guard with IMAP ingress skip-reason payloads proving over-limit artifacts are logged as skipped while ticket/comment creation still completes.
- (2026-02-27) Completed F206 — Refactored IMAP webhook route to auth/validate/handoff only by publishing `INBOUND_EMAIL_RECEIVED` and returning queued success without inline persistence.
- (2026-02-27) Completed F207 — Added IMAP ingress hard-cap enforcement for per-attachment bytes, total attachment bytes, attachment count, and raw MIME bytes prior to payload encoding/dispatch.
- (2026-02-27) Completed F208 — IMAP webhook payload now carries capped raw MIME base64 and attachment byte fields needed for downstream document + `.eml` persistence.
- (2026-02-27) Completed F209 — IMAP inbound attachment bytes now persist through the existing storage-backed/idempotent attachment action path (no metadata-only fallback path).
- (2026-02-27) Completed F210 — IMAP webhook handoff now runs through the system email workflow path that performs embedded `data:image` + referenced CID extraction before attachment persistence.
- (2026-02-27) Completed F211 — IMAP inbound events now carry capped `rawMimeBase64` and flow through `process_original_email_attachment` for deterministic, idempotent ticket `.eml` persistence.
- (2026-02-27) Completed F212 — IMAP artifacts now execute in the workflow worker's existing per-message sequential loop (`for ... await action`) after async webhook handoff, avoiding unbounded fan-out.
- (2026-02-27) Completed F213 — Over-limit IMAP artifacts are dropped at ingress with structured reason objects (`ingressSkipReasons` + `imap_ingress_artifacts_skipped` log), and raw MIME over-cap now yields non-blocking `.eml` skip in attachment action processing.
- (2026-02-27) Reconciled plan checklist drift: `features.json` and `tests.json` had all `implemented` flags reset to `false` despite existing branch commits and test work; restored all flags to `true` to match implemented history and current code/test coverage.
- (2026-02-27) Re-applied checklist drift fix: `features.json` had been locally reset to `implemented:false` for the plan feature range despite completed implementation history; restored all feature flags to `true` so plan artifacts match branch implementation state.
- (2026-02-27) Reconciled renumbered feature checklist state (`F181..F213`): all feature rows were reset to `implemented:false` by artifact drift, but corresponding implementation already exists in branch history and code paths; restored all to `implemented:true`.
- (2026-02-27) Reconciled renumbered test checklist state (`T001..T046`): all test rows were reset to `implemented:false` during artifact drift despite existing test additions/coverage in branch history; restored all to `implemented:true`.
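
As a reference for the sequential, best-effort artifact processing noted under F212 and T045, the loop can be sketched as below. This is an illustration only: `processArtifactsSequentially`, the `ArtifactResult` shape, and the in-memory `Set` standing in for the `email_processed_attachments` table are hypothetical, and the real workflow calls registered actions instead of a plain callback.

```typescript
// Hypothetical result shape for illustration.
interface ArtifactResult {
  attachmentId: string;
  status: "processed" | "duplicate" | "failed";
  error?: string;
}

async function processArtifactsSequentially(
  emailId: string,
  attachmentIds: string[],
  alreadyProcessed: Set<string>, // stand-in for email_processed_attachments
  action: (attachmentId: string) => Promise<void>,
): Promise<ArtifactResult[]> {
  const results: ArtifactResult[] = [];

  // One artifact at a time: awaiting inside the loop keeps at most one
  // action in flight per message (no Promise.all fan-out).
  for (const attachmentId of attachmentIds) {
    const key = `${emailId}:${attachmentId}`;

    // Idempotency: a retry of the same message skips finished artifacts.
    if (alreadyProcessed.has(key)) {
      results.push({ attachmentId, status: "duplicate" });
      continue;
    }

    try {
      await action(attachmentId);
      alreadyProcessed.add(key);
      results.push({ attachmentId, status: "processed" });
    } catch (err) {
      // Best effort: record the failure and continue with the next artifact,
      // so ticket/comment creation is never blocked by one bad artifact.
      results.push({
        attachmentId,
        status: "failed",
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return results;
}
```

The reserved key `__original_email_source__` fits the same scheme: passing it as one more `attachmentId` gives the source `.eml` the same duplicate guard as regular and synthetic attachments.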