Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

# PRD: Teams Admin Diagnostics and Proactive Test Message
**Plan slug:** `2026-05-29-teams-diagnostics-test-message`
**Owning area:** EE / Microsoft Teams addon (`ee/packages/microsoft-teams`) + `packages/integrations` settings UI
**Related plans:**
- `.ai/teams_improvements/microsoft-teams-addon-competitive-parity-plan.md` (Phase 1: "Setup Wizard, Diagnostics, and Test Message")
- `ee/docs/plans/2026-05-24-teams-observability-loop/` (completed, PR #2562, F001-F050): built the `teams_notification_deliveries`, `teams_audit_events`, and `teams_conversation_references` tables this plan consumes.
## Problem Statement
The Teams addon is a functional v0: bot, message extension, quick actions, meetings, package generation, entitlement gating, and (since the observability loop) persisted delivery/audit records. But an admin still has **no way to confirm a tenant is correctly wired up** except by reading server logs. There is no diagnostics surface and no test-message path. This is the gating blocker for charging: a non-developer admin cannot self-verify that the profile, Graph token, bot credentials, user linkage, and delivery all work end-to-end.
Separately, the observability loop added `teams_conversation_references` (written on every inbound bot activity) and the repo already has `sendBotActivity()` — the full Bot Framework proactive-send primitive. **Neither is consumed by anything yet.** Building the test message on the proactive path lights up both, and validates the exact mechanism Phase 2 channel delivery will depend on.
## User Value
- **MSP admins** can run one-click diagnostics and a test message to confirm Teams is working, without engineering.
- **Diagnostics distinguish failure classes** (missing addon, inactive integration, unready profile, missing package/base URL, missing bot credentials, missing user linkage, missing conversation reference, recent Graph delivery failure) so the fix is obvious.
- **Engineers** get a proven proactive-send path (the building block for Phase 2 channel notifications) exercised in production.
- **Sales/demo risk drops** — the addon becomes self-verifiable, which is the precondition for packaging it as paid.
## Goals
1. A `runTeamsDiagnostics()` server action returns a structured, step-based report (pass/warn/fail/skip per check + aggregated recommendations), modeled on `runMicrosoft365Diagnostics()`.
2. A `sendTeamsTestMessage()` server action delivers a synthetic message to the calling admin **via the proactive bot path** (`sendBotActivity` against the admin's stored `teams_conversation_references` row) and records a `teams_notification_deliveries` row.
3. A diagnostics + test-message panel is added to `TeamsIntegrationSettings.tsx` (incremental — not a full wizard rebuild).
4. A read helper for `teams_conversation_references` exists (the table's first consumer).
5. All new DB reads are tenant-scoped; both actions are wrapped in `withAuth` and gated on the same permission as `saveTeamsIntegrationSettings`.
6. No new migration — reuse `teams_notification_deliveries` (test row) and `teams_conversation_references` (lookup).
## Non-Goals (Explicit)
- No `teams_channel_mappings`, channel routing, or channel delivery. (Phase 2.)
- No expanded notification categories (`ticket_created`, etc.). (Phase 2.)
- No `teams_user_preferences` / quiet hours / mention-only. (Phase 2.)
- No full 4-step setup wizard restructure — only add a diagnostics/test panel to the existing settings page.
- No 402/403 entitlement-response standardization. (Open Decision, separate PR.)
- No configurable channel tab, no trial flow, no metering.
- No LLM/fuzzy bot intent, no SSO token exchange.
- No new metrics export / Prometheus / log shipping.
- No change to existing notification delivery or bot reply behavior — diagnostics/test are additive read + a new send path.
## Target Users
- **MSP admins** in Settings → Integrations → Teams.
- **Engineers** debugging a tenant's Teams setup in staging/production.
## Primary Flows
### Flow A: Run diagnostics
1. Admin clicks "Run diagnostics" in Teams settings.
2. `runTeamsDiagnostics()` executes ordered checks, each producing `{status: pass|warn|fail|skip, detail, data?, error?}`:
   - addon entitlement active (`getTeamsAvailability`)
   - integration row exists + `install_status = 'active'`
   - capabilities include `personal_bot` + `activity_notifications`
   - selected Microsoft profile exists, not archived, has client secret ref
   - package metadata present + base URL resolvable
   - bot connector credentials configured (`isBotConnectorConfigured`)
   - calling admin's Microsoft user linkage present
   - conversation reference present for that admin
   - recent delivery health: most recent success + most recent failure (tenant-scoped read of `teams_notification_deliveries`)
3. Report aggregates `overallStatus` (fail if any fail; warn if any warn; else pass) and a deduped recommendations list.
4. UI renders the step list with status badges + recommendations.
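A minimal TypeScript sketch of the Flow A runner and its aggregation rule, assuming a step shape loosely mirroring `Microsoft365DiagnosticsReport`. The per-step `recommendation` field, the `DiagnosticCheck` wiring, and the `classifyLinkage` helper are hypothetical. `classifyLinkage` shows one way the linkage check could surface the email-backfill edge: AAD object IDs are GUIDs, so a `provider_account_id` that is not a GUID (such as a lowercased email) is reported as a warning rather than a pass.

```typescript
// Hypothetical step/report shapes, loosely mirroring Microsoft365DiagnosticsReport.
type StepStatus = "pass" | "warn" | "fail" | "skip";

interface TeamsDiagnosticsStep {
  id: string;
  title: string;
  status: StepStatus;
  durationMs: number;
  detail?: string;
  data?: Record<string, unknown>;
  error?: string;
  recommendation?: string; // hypothetical per-step fix hint
}

interface TeamsDiagnosticsReport {
  overallStatus: "pass" | "warn" | "fail";
  steps: TeamsDiagnosticsStep[];
  recommendations: string[];
}

type CheckResult = Omit<TeamsDiagnosticsStep, "id" | "title" | "durationMs">;

interface DiagnosticCheck {
  id: string;
  title: string;
  run: () => Promise<CheckResult>;
}

// fail if any step fails; warn if any step warns; otherwise pass.
// Skipped steps never change the overall status on their own.
function aggregateReport(steps: TeamsDiagnosticsStep[]): TeamsDiagnosticsReport {
  const overallStatus = steps.some((s) => s.status === "fail")
    ? "fail"
    : steps.some((s) => s.status === "warn")
      ? "warn"
      : "pass";
  const hints = steps
    .map((s) => s.recommendation)
    .filter((r): r is string => typeof r === "string" && r.length > 0);
  // Set keeps first-insertion order, so the deduped list follows check order.
  const recommendations = Array.from(new Set(hints));
  return { overallStatus, steps, recommendations };
}

// Runs checks in order and times each one; a thrown error becomes a failed
// step instead of aborting the remaining checks.
async function runChecks(checks: DiagnosticCheck[]): Promise<TeamsDiagnosticsReport> {
  const steps: TeamsDiagnosticsStep[] = [];
  for (const check of checks) {
    const started = Date.now();
    try {
      const result = await check.run();
      steps.push({ id: check.id, title: check.title, durationMs: Date.now() - started, ...result });
    } catch (err) {
      steps.push({
        id: check.id,
        title: check.title,
        status: "fail",
        durationMs: Date.now() - started,
        error: err instanceof Error ? err.message : String(err),
      });
    }
  }
  return aggregateReport(steps);
}

// AAD object IDs are GUIDs; email-backfilled links (lowercased emails) are not.
const GUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function classifyLinkage(providerAccountId: string | null): CheckResult {
  if (!providerAccountId) {
    return {
      status: "fail",
      detail: "No Microsoft link for the calling admin.",
      recommendation: "Link your Alga PSA account to Microsoft (hypothetical wording).",
    };
  }
  if (!GUID_RE.test(providerAccountId)) {
    return {
      status: "warn",
      detail: "Microsoft link is not an AAD object ID (likely the email backfill path).",
      recommendation: "Re-link this user via Microsoft sign-in (hypothetical wording).",
    };
  }
  return { status: "pass", detail: "Microsoft link uses the AAD object ID." };
}
```

Isolating `aggregateReport` as a pure function keeps the `overallStatus` rule unit-testable without any DB or Graph access.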
### Flow B: Send test message (proactive)
1. Admin clicks "Send test message".
2. `sendTeamsTestMessage()` resolves availability → admin Microsoft link → latest personal conversation reference.
3. If addon inactive / integration inactive / bot not configured / no linkage / no conversation reference → returns a **skipped** result with an actionable reason (e.g. "message the bot once first") and records a `skipped` delivery row.
4. Otherwise builds a test activity and calls `sendBotActivity({serviceUrl, conversationId, activity})`.
5. Records a `teams_notification_deliveries` row (`status` sent/failed, `destination_type = 'bot_test'`, `category = 'test'`, actor metadata, idempotency key with attempt nonce).
6. UI shows success or the mapped skip/failure reason.
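The precondition ordering in Flow B can be sketched as a pure function plus a thin send wrapper. Everything here is illustrative: the skip-reason codes other than `missing_conversation_reference`, the `reason` field, the message text, and the injected `send` function (standing in for `sendBotActivity`, with the argument shape from step 4) are assumptions. The sketch returns the delivery row it would record rather than writing to the database.

```typescript
// Hypothetical skip-reason codes, checked in the order Flow B lists them.
type SkipReason =
  | "addon_inactive"
  | "integration_inactive"
  | "bot_not_configured"
  | "missing_user_linkage"
  | "missing_conversation_reference";

interface ConversationRef {
  serviceUrl: string;
  conversationId: string;
}

interface TestMessageContext {
  addonActive: boolean;
  integrationActive: boolean;
  botConfigured: boolean;
  microsoftUserId: string | null; // AAD oid from the admin's Microsoft link
  conversationRef: ConversationRef | null; // newest personal reference, if any
}

type Precheck = { ok: true; ref: ConversationRef } | { ok: false; reason: SkipReason };

// Returns the first failing precondition, or the reference to send to.
function precheck(ctx: TestMessageContext): Precheck {
  if (!ctx.addonActive) return { ok: false, reason: "addon_inactive" };
  if (!ctx.integrationActive) return { ok: false, reason: "integration_inactive" };
  if (!ctx.botConfigured) return { ok: false, reason: "bot_not_configured" };
  if (!ctx.microsoftUserId) return { ok: false, reason: "missing_user_linkage" };
  if (!ctx.conversationRef) return { ok: false, reason: "missing_conversation_reference" };
  return { ok: true, ref: ctx.conversationRef };
}

interface TestDeliveryRow {
  status: "skipped" | "sent" | "failed";
  destination_type: "bot_test";
  category: "test";
  reason?: string; // hypothetical: skip reason or transport error message
}

// Stand-in for sendBotActivity; only the argument shape is taken from the PRD.
type SendFn = (args: {
  serviceUrl: string;
  conversationId: string;
  activity: { type: "message"; text: string };
}) => Promise<unknown>;

async function sendTestMessage(ctx: TestMessageContext, send: SendFn): Promise<TestDeliveryRow> {
  const base = { destination_type: "bot_test", category: "test" } as const;
  const check = precheck(ctx);
  if (!check.ok) {
    return { ...base, status: "skipped", reason: check.reason };
  }
  try {
    await send({
      serviceUrl: check.ref.serviceUrl,
      conversationId: check.ref.conversationId,
      activity: { type: "message", text: "Alga PSA test message: Teams delivery is working." },
    });
    return { ...base, status: "sent" };
  } catch (err) {
    return { ...base, status: "failed", reason: err instanceof Error ? err.message : String(err) };
  }
}
```

Injecting `send` lets unit tests cover the sent, failed, and every skipped branch without Bot Framework credentials.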
## Data Model / Integration Notes
- **No schema migration.** Reuse:
  - `teams_notification_deliveries`: `category` is nullable, no CHECK; `status` CHECK ∈ {skipped,sent,delivered,failed}; `destination_type` is free text NOT NULL. Test rows use `category='test'`, `destination_type='bot_test'`, `status ∈ {skipped,sent,failed}`.
  - `teams_conversation_references`: PK `(tenant, microsoft_user_id, conversation_id)`; columns include `service_url`, `conversation_type`, `last_activity_at`. The reader selects the newest `personal` row per `(tenant, microsoft_user_id)`.
- **Transport already exists:** `teamsBotConnector.ts::sendBotActivity()` (token via client-credentials → `https://api.botframework.com/.default`; trusted-serviceUrl suffix check; POST to `/v3/conversations/{id}/activities`). Credentials from env `TEAMS_BOT_APP_ID` / `TEAMS_BOT_APP_TENANT_ID` / `TEAMS_BOT_APP_PASSWORD` (already in `helm/templates/{deployment,secret}.yaml`).
- **PSA-user → Microsoft-user mapping (VERIFIED):** `resolveTeamsRecipientLink(tenant, userId)` returns `{providerAccountId}`, and `providerAccountId` is the AAD **oid**. Conversation references are keyed by `microsoft_user_id = activity.from.aadObjectId` (`teamsConversationReferences.ts:48-50`). `nextAuthOptions.ts:881-882` deliberately stores `claims.oid` as the Microsoft link's `provider_account_id` *because* it equals `aadObjectId` (explicit comment at 878-883). The bot's existing `resolveTeamsLinkedUser` already relies on this. So the test message can feed `providerAccountId` straight into the conversation-reference reader — no normalization helper needed. **Known edge (pre-existing, not introduced here):** the admin bulk backfill flow (`ssoActions.ts:308`) stores `provider_account_id = lowerEmail`; Microsoft links created that way won't match a bot conversation reference (the bot already can't resolve them). Diagnostics should *surface* this (F021/F022 warnings), not fix it.
- **Diagnostics report shape:** mirror `Microsoft365DiagnosticsReport` / `...Step` (`shared/services/email/providers/MicrosoftGraphAdapter.ts:745+`) — step `id/title/status/durationMs/data/error`, `recommendations: string[]`, `overallStatus`.
- **Action wiring:** mirror `teamsObservabilityActions.ts` (`export const x = withAuth(impl)`). Rebuild the package (tsup → dist) so the server picks up new exports (prior loop F045).
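The reader's selection rule (newest `personal` row per `(tenant, microsoft_user_id)`) can be illustrated with an in-memory TypeScript version. The row interface, the treatment of `last_activity_at` as nullable, and the function name are assumptions; the SQL in the comment is an illustrative Postgres equivalent, not the actual query.

```typescript
// Assumed row shape; only the column names come from the Data Model notes.
interface ConversationReferenceRow {
  tenant: string;
  microsoft_user_id: string;
  conversation_id: string;
  service_url: string;
  conversation_type: string | null;
  last_activity_at: Date | null; // nullability is an assumption
}

// Illustrative Postgres equivalent (Postgres sorts NULLs first under DESC,
// hence the explicit NULLS LAST):
//   SELECT * FROM teams_conversation_references
//   WHERE tenant = $1 AND microsoft_user_id = $2 AND conversation_type = 'personal'
//   ORDER BY last_activity_at DESC NULLS LAST
//   LIMIT 1;
function latestPersonalReference(
  rows: readonly ConversationReferenceRow[],
  tenant: string,
  microsoftUserId: string | null | undefined,
): ConversationReferenceRow | null {
  if (!tenant || !microsoftUserId) return null; // null-safe: no link, no lookup
  let best: ConversationReferenceRow | null = null;
  let bestTime = -Infinity;
  for (const row of rows) {
    // Tenant scoping is applied before anything else.
    if (row.tenant !== tenant || row.microsoft_user_id !== microsoftUserId) continue;
    if (row.conversation_type !== "personal") continue;
    const time = row.last_activity_at ? row.last_activity_at.getTime() : -Infinity;
    if (best === null || time > bestTime) {
      best = row;
      bestTime = time;
    }
  }
  return best;
}
```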
## UX / UI Notes
- New Card in `TeamsIntegrationSettings.tsx` below the existing config/package cards: title "Diagnostics & Test Message".
- "Run diagnostics" button → step list, each row: status badge (pass=green / warn=amber / fail=red / skip=grey) + title + detail; recommendations rendered as a bullet list below.
- "Send test message" button → success or friendly skip/error message; the `missing_conversation_reference` skip maps to "Open the Alga PSA bot in Teams and send it any message first, then retry."
- Both buttons disabled when addon missing or integration not active (reuse existing `canPersist`-style gating).
- All strings i18n'd with `defaultValue` fallbacks, mirroring existing `integrations.teams.settings.*` keys.
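A sketch of the badge and skip-message mapping for the panel. The translation function is assumed to accept `{ defaultValue }` in the i18next style; the specific keys under `integrations.teams.settings.*`, the `bot_not_configured` wording, the fallback wording, and `testPanelDisabled` are hypothetical. Only the `missing_conversation_reference` text comes from this PRD.

```typescript
type StepStatus = "pass" | "warn" | "fail" | "skip";

// Assumed i18next-style signature: t(key, { defaultValue }).
type Translate = (key: string, options: { defaultValue: string }) => string;

// Badge colors as specified in the UX notes.
const STATUS_BADGE_COLOR: Record<StepStatus, "green" | "amber" | "red" | "grey"> = {
  pass: "green",
  warn: "amber",
  fail: "red",
  skip: "grey",
};

interface MessageSpec {
  key: string;
  defaultValue: string;
}

// Key names are hypothetical; only the first defaultValue comes from the PRD.
const SKIP_MESSAGES: Partial<Record<string, MessageSpec>> = {
  missing_conversation_reference: {
    key: "integrations.teams.settings.testMessage.missingConversationReference",
    defaultValue: "Open the Alga PSA bot in Teams and send it any message first, then retry.",
  },
  bot_not_configured: {
    key: "integrations.teams.settings.testMessage.botNotConfigured",
    defaultValue: "Bot credentials are not configured for this environment.",
  },
};

const FALLBACK_SKIP: MessageSpec = {
  key: "integrations.teams.settings.testMessage.skipped",
  defaultValue: "The test message was skipped.",
};

// Unknown reasons fall back to a generic message instead of rendering a raw code.
function skipMessage(reason: string, t: Translate): string {
  const spec = SKIP_MESSAGES[reason] ?? FALLBACK_SKIP;
  return t(spec.key, { defaultValue: spec.defaultValue });
}

// canPersist-style gating: both buttons need the addon and an active integration.
function testPanelDisabled(addonActive: boolean, integrationActive: boolean): boolean {
  return !addonActive || !integrationActive;
}
```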
## Risks / Open Questions
- **Mapping risk — RESOLVED (verified in code):** `provider_account_id` (Microsoft) = AAD `oid` = `microsoft_user_id` by design (`nextAuthOptions.ts:881-882` + `teamsConversationReferences.ts:48-50`). No normalization needed. Residual edge is the email-backfill linking path, which diagnostics surfaces rather than fixes.
- **Bot credentials in dev:** `isBotConnectorConfigured()` returns false without the env credentials, so locally the test message always resolves to a `skipped` result. Diagnostics must report the missing bot credentials clearly so the feature does not appear broken.
- **Permission gate:** confirm the exact permission `saveTeamsIntegrationSettingsImpl` enforces and reuse it.
- **Idempotency for repeated tests:** include an attempt nonce in the test delivery idempotency key so each click records a distinct row (the table has a UNIQUE on `(tenant, idempotency_key)`).
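A sketch of the attempt-nonce idempotency key, assuming Node's `crypto.randomUUID` for the nonce. The key format and `DeliveryLedger` (an in-memory stand-in for the UNIQUE `(tenant, idempotency_key)` constraint) are hypothetical.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical key format; the per-attempt nonce is what makes each click unique.
function testDeliveryIdempotencyKey(userId: string, nonce: string = randomUUID()): string {
  return `teams:test:${userId}:${nonce}`;
}

// In-memory stand-in for the UNIQUE (tenant, idempotency_key) constraint.
class DeliveryLedger {
  private readonly seen = new Set<string>();

  // Returns false where the real insert would violate the constraint.
  insert(tenant: string, idempotencyKey: string): boolean {
    const composite = `${tenant}\u0000${idempotencyKey}`;
    if (this.seen.has(composite)) return false;
    this.seen.add(composite);
    return true;
  }
}
```

Without the nonce, a second click by the same admin would produce the same key and the insert would collide; with it, each attempt records its own row, while the same key in a different tenant never conflicts.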
## Acceptance Criteria / Definition of Done
- `runTeamsDiagnostics()` returns all listed checks with correct pass/warn/fail/skip classification and an accurate `overallStatus` + recommendations; tenant-scoped; permission-gated; covered by unit tests.
- `sendTeamsTestMessage()`:
  - on a healthy tenant, sends via `sendBotActivity` and records a `sent` delivery row;
  - on each unhealthy precondition, returns the correct skip reason and records a `skipped` row;
  - on transport failure, records a `failed` row;
  - all writes tenant-scoped; covered by unit/integration tests.
- The reader returns the newest personal conversation reference per `(tenant, microsoft_user_id)`, tenant-scoped, null-safe.
- Settings panel renders diagnostics steps + recommendations and the test-message result, with correct disabled states.
- No new migration; no change to existing delivery/bot-reply behavior.
- `@alga-psa/microsoft-teams` rebuilt so the server resolves the new exports.