PSA/ee/docs/plans/2025-11-03-multi-provider-sso-integration-plan.md
Hermes 284313f908
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00


# Multi-Provider SSO Integration Plan
## Overview
- Deliver Google Workspace and Microsoft 365 SSO that fits the existing multi-tenant NextAuth.js stack without disrupting current credential flows.
- Normalize identity data across providers so tenant routing, OTT vanity handoff, and user_type logic continue to function uniformly.
- Provide a guided migration path that lets password-based users link SSO identities safely and allows ops to monitor and enforce adoption by tenant.
## Phased Technical Plan
### Phase 0: Foundations and Provider Enablement
- [ ] Provision Azure AD enterprise app, confirm Google OAuth project, and capture client IDs, secrets, tenant IDs, and redirect URIs.
- [x] Update secret loader so Microsoft OAuth values (client ID, secret, authority/tenant) are available alongside Google via `getNextAuthSecret`.
- [x] Add Microsoft provider configuration (e.g., `AzureADProvider`) and refresh Google provider options to share the common claim extractor.
- [x] Refresh `.env.example`, devbox/dev compose templates, and developer docs to include both providers for local testing.
- [x] Document OAuth secrets and redirect URIs in EE configuration guides while leaving CE `.env.example` entries absent or commented.
- [x] Wrap provider registration in `server/src/app/api/auth/[...nextauth]/options.ts` with the existing `isEnterprise` guard so CE builds resolve to stubs.
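
The edition guard and secret-dependent registration described in this phase can be sketched as follows. This is a minimal illustration, not the contents of `options.ts`: `OAuthSecrets`, `ProviderDescriptor`, and `buildProviders` are hypothetical names, the descriptors stand in for real NextAuth provider objects, and defaulting the Microsoft tenant to `common` is an assumption.

```typescript
// Hypothetical shapes for illustration; real NextAuth provider objects are richer.
interface OAuthSecrets {
  googleClientId?: string;
  googleClientSecret?: string;
  microsoftClientId?: string;
  microsoftClientSecret?: string;
  microsoftTenantId?: string;
}

interface ProviderDescriptor {
  id: "credentials" | "google" | "microsoft";
  clientId?: string;
  tenantId?: string;
}

// Credentials are always registered. OAuth providers are added only for
// enterprise builds, and only when both the client ID and secret are present.
function buildProviders(isEnterprise: boolean, secrets: OAuthSecrets): ProviderDescriptor[] {
  const providers: ProviderDescriptor[] = [{ id: "credentials" }];
  if (!isEnterprise) {
    return providers; // CE builds resolve to the credentials-only set
  }
  if (secrets.googleClientId && secrets.googleClientSecret) {
    providers.push({ id: "google", clientId: secrets.googleClientId });
  }
  if (secrets.microsoftClientId && secrets.microsoftClientSecret) {
    providers.push({
      id: "microsoft",
      clientId: secrets.microsoftClientId,
      // Assumption: fall back to the multi-tenant authority when no tenant ID is configured.
      tenantId: secrets.microsoftTenantId ?? "common",
    });
  }
  return providers;
}
```

Keeping the guard in one function means a CE build and an EE build with missing secrets both degrade to the same credentials-only behavior.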
### Phase 1: Provider Integration and Claim Normalization
- [x] Implement a shared profile mapper that converts Google and Microsoft payloads into the `ExtendedUser` schema.
- [x] Extend `signIn` and `jwt` callbacks to apply tenant resolution from query `tenant_hint`, vanity-domain headers, or email-domain heuristics when provider data is incomplete.
- [x] Invoke existing user validation (active status, tenant membership, user_type) within the OAuth callback before token issuance.
- [x] Confirm `session` and `redirect` callbacks read normalized claims so OTT and redirect flows behave consistently across providers.
- [x] Place provider adapters, claim mappers, and account-link helpers in `ee/server/src/lib/auth/ssoProviders.ts` with matching stubs in `server/src/empty/lib/auth/ssoProviders.ts`.
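
A shared profile mapper along the lines of the first item might look like the sketch below. `NormalizedProfile` is a hypothetical stand-in for the relevant subset of `ExtendedUser`, whose actual fields are not listed in this plan. The claim names (`sub`, `hd`, `oid`, `tid`, `preferred_username`) are standard Google and Microsoft ID-token claims; preferring `oid` over `sub` as the Microsoft account ID is an assumption.

```typescript
type SsoProvider = "google" | "microsoft";

interface GoogleProfile {
  sub: string;
  email: string;
  name?: string;
  hd?: string; // hosted Workspace domain, when present
}

interface MicrosoftProfile {
  sub: string;
  oid?: string; // directory object ID
  tid?: string; // directory (tenant) ID
  email?: string;
  preferred_username?: string;
  name?: string;
}

// Hypothetical normalized shape; not the actual ExtendedUser definition.
interface NormalizedProfile {
  provider: SsoProvider;
  providerAccountId: string;
  email: string;
  name: string;
  directoryHint: string | null;
}

function normalizeEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

function mapGoogleProfile(p: GoogleProfile): NormalizedProfile {
  const email = normalizeEmail(p.email);
  return {
    provider: "google",
    providerAccountId: p.sub,
    email,
    name: p.name ?? email,
    directoryHint: p.hd ?? null,
  };
}

function mapMicrosoftProfile(p: MicrosoftProfile): NormalizedProfile {
  // Microsoft tokens may omit `email`; fall back to `preferred_username`.
  const rawEmail = p.email ?? p.preferred_username;
  if (!rawEmail) {
    throw new Error("Microsoft profile has neither email nor preferred_username");
  }
  const email = normalizeEmail(rawEmail);
  return {
    provider: "microsoft",
    providerAccountId: p.oid ?? p.sub, // assumption: prefer the directory object ID
    email,
    name: p.name ?? email,
    directoryHint: p.tid ?? null,
  };
}
```

Because both mappers return the same shape, the `jwt`, `session`, and `redirect` callbacks can read one set of fields regardless of provider.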
### Phase 2: Account Linking and Migration Path
- [x] Create or extend a `user_auth_accounts` table keyed by user ID and provider (google|microsoft) with provider subject IDs and metadata.
- [x] Deliver an authenticated “Connect SSO” flow that revalidates password and TOTP before capturing OAuth provider details.
- [x] Update credential login surfaces to detect linked providers, display SSO prompts, and steer users toward OAuth flows (local 2FA suppression to follow policy controls).
- [ ] Build a batch backfill script for federated email domains and log unresolved accounts for manual review.
- [x] Store schema migrations for the new linking table under `ee/server/migrations` and supply CE no-op stubs.
- [x] Serve SSO buttons and the “Connect SSO” settings page from `@ee` components/pages with CE stubs in `server/src/empty`.
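
The linking table's key shape (one row per user and provider, carrying the provider subject ID and metadata) can be illustrated with an in-memory stand-in. This sketch assumes nothing about the real migration or query layer; `UserAuthAccount` and `AuthAccountStore` are hypothetical names, and the real table will have additional columns such as tenant.

```typescript
type SsoProvider = "google" | "microsoft";

// Hypothetical row shape mirroring the described table.
interface UserAuthAccount {
  userId: string;
  provider: SsoProvider;
  providerAccountId: string;
  metadata: Record<string, string>;
}

// In-memory stand-in for `user_auth_accounts`, unique on (userId, provider).
class AuthAccountStore {
  private rows: UserAuthAccount[] = [];

  // Upsert: a second link for the same user and provider replaces the first.
  link(row: UserAuthAccount): "created" | "updated" {
    const copy: UserAuthAccount = { ...row, metadata: { ...row.metadata } };
    for (let i = 0; i < this.rows.length; i++) {
      const existing = this.rows[i];
      if (existing !== undefined && existing.userId === row.userId && existing.provider === row.provider) {
        this.rows[i] = copy;
        return "updated";
      }
    }
    this.rows.push(copy);
    return "created";
  }

  // Used by an OAuth callback to find the user behind a provider subject.
  findBySubject(provider: SsoProvider, providerAccountId: string): UserAuthAccount | null {
    const matches = this.rows.filter(
      (r) => r.provider === provider && r.providerAccountId === providerAccountId
    );
    return matches[0] ?? null;
  }

  // Used by credential login surfaces to decide whether to show an SSO prompt.
  providersFor(userId: string): SsoProvider[] {
    return this.rows.filter((r) => r.userId === userId).map((r) => r.provider).sort();
  }
}
```

Keying on user ID and provider is what lets the upsert stay idempotent when a user reconnects the same provider.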
### Phase 3: Built-in SSO Bulk Assignment UI
- [ ] Extract bulk assignment preview/execute helpers into `ee/server/src/lib/actions/ssoActions.ts` (with CE stubs in `server/src/empty`) so the existing backfill script shares the same logic, and ensure only admins with `settings.update` can run them.
- [ ] Build `SsoBulkAssignment.tsx` container plus `SsoBulkAssignmentForm.tsx` to render provider toggles, multi-domain input, and an internal/client selector with preview and execute buttons wired to the new actions.
- [ ] Surface preview results inline (counts for linked, skipped, already linked) and trigger toast/audit events on assignment; ensure preview falls back gracefully when no domains or providers are selected.
- [ ] Add a "Single Sign-On" tab to `SecuritySettingsPage.tsx` navigation, mount the bulk assignment UI, and keep Roles/Permissions/Policies unaffected.
- [ ] Extend acceptance coverage to include UI-driven dry-run/assign flows and confirm CE/enterprise builds resolve the new components and actions correctly.
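
The preview counts (linked, skipped, already linked) and the graceful fallback for empty selections could be computed roughly as below. This is a sketch under stated assumptions: `StaffUser`, `PreviewRequest`, and `previewBulkAssignment` are hypothetical names, and counting inactive users as "skipped" mirrors the backfill script's behavior rather than a confirmed UI rule.

```typescript
type SsoProvider = "google" | "microsoft";
type UserType = "internal" | "client";

interface StaffUser {
  email: string;
  userType: UserType;
  isActive: boolean;
  linkedProviders: SsoProvider[];
}

interface PreviewRequest {
  providers: SsoProvider[];
  domains: string[];
  userType: UserType;
}

interface PreviewResult {
  linked: number;
  alreadyLinked: number;
  skipped: number;
}

function emailDomain(email: string): string {
  const at = email.lastIndexOf("@");
  return at < 0 ? "" : email.slice(at + 1).toLowerCase();
}

// Pure preview: counts what an execute step would do without changing anything.
function previewBulkAssignment(users: StaffUser[], req: PreviewRequest): PreviewResult {
  const result: PreviewResult = { linked: 0, alreadyLinked: 0, skipped: 0 };
  const domains = req.domains.map((d) => d.trim().toLowerCase()).filter((d) => d.length > 0);
  // Graceful fallback: nothing selected means an all-zero preview, not an error.
  if (domains.length === 0 || req.providers.length === 0) {
    return result;
  }
  for (const user of users) {
    if (user.userType !== req.userType) continue;
    if (domains.indexOf(emailDomain(user.email)) < 0) continue;
    for (const provider of req.providers) {
      if (!user.isActive) {
        result.skipped++; // assumption: inactive users are skipped
      } else if (user.linkedProviders.indexOf(provider) >= 0) {
        result.alreadyLinked++;
      } else {
        result.linked++;
      }
    }
  }
  return result;
}
```

Counts are per user and provider pair, so selecting two providers for one eligible user contributes two entries to the totals.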
#### Decision (2025-11-06): Provider opt-in per user
- **Summary**: New staff are explicitly classified as `internal` (password/TOTP only) or assigned a specific SSO provider at creation time. Auto-detection on login is disabled unless an admin has enabled at least one provider for that user. Phase 3 still delivers the bulk assignment tooling so admins can switch providers for many accounts at once.
- **Reasoning**: Automatically trusting any matching email introduced takeover risk (e.g., an attacker controlling an external domain or recycled mailbox could satisfy a loose email match). Requiring admins to opt users into a provider maintains least privilege and keeps forensic/audit signals intact.
- **Intended behavior**:
  - User creation UI/API exposes a selector: `Internal (password only)` vs `Google` vs `Microsoft` (extendable). Default remains Internal.
  - Login callbacks accept OAuth responses only when the user record lists that provider; otherwise the attempt is rejected and logged.
  - Bulk assignment preview/execute flows remain valuable to migrate cohorts (entire domains, departments) into a provider without editing each profile.
  - Existing auto-link toggle is scoped to future automation that _pre-assigns_ providers; it no longer blindly creates links during login without an admin decision.
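
The opt-in rule from this decision (accept an OAuth response only when the user record lists that provider, otherwise reject and log) reduces to a small gate. The sketch below is illustrative only; `StaffRecord`, `LoginDecision`, and `evaluateOAuthLogin` are hypothetical names, and the audit log is modeled as a plain string array.

```typescript
type SsoProvider = "google" | "microsoft";

// Hypothetical user record; an empty `allowedProviders` means "Internal (password only)".
interface StaffRecord {
  email: string;
  isActive: boolean;
  allowedProviders: SsoProvider[];
}

interface LoginDecision {
  allowed: boolean;
  reason: "ok" | "unknown_user" | "inactive" | "provider_not_assigned";
}

function evaluateOAuthLogin(
  user: StaffRecord | null,
  provider: SsoProvider,
  auditLog: string[]
): LoginDecision {
  let decision: LoginDecision;
  if (user === null) {
    decision = { allowed: false, reason: "unknown_user" };
  } else if (!user.isActive) {
    decision = { allowed: false, reason: "inactive" };
  } else if (user.allowedProviders.indexOf(provider) < 0) {
    // A matching email alone is never enough; an admin must have assigned the provider.
    decision = { allowed: false, reason: "provider_not_assigned" };
  } else {
    decision = { allowed: true, reason: "ok" };
  }
  if (!decision.allowed) {
    auditLog.push("rejected " + provider + " login: " + decision.reason);
  }
  return decision;
}
```

Returning a reason code alongside the boolean keeps the rejection path auditable without leaking details to the sign-in page.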
### Out of Band: Rollout, Monitoring, and Policy Controls
- [ ] Introduce feature flags or configuration to enable SSO providers per tenant/portal for controlled rollout.
- [ ] Instrument telemetry to capture provider usage, OTT handoffs, migration completions, and repeated password fallbacks.
- [ ] Add policy controls allowing tenants to require SSO and to determine whether local TOTP remains after OAuth logins.
- [ ] Publish operational runbooks covering break-glass password resets, tenant onboarding checklists, and SSO troubleshooting.
- [ ] Update `scripts/build-enterprise.sh`, validate CE Docker builds resolve `@ee` imports to stubs, and gate new OAuth integration tests behind `process.env.EDITION === 'enterprise'`.
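
The per-tenant flags and policy controls listed here (which providers are enabled, whether SSO is required, whether local TOTP remains after OAuth) might combine as in the following sketch. All names (`TenantSsoPolicy`, `decideLogin`) are hypothetical, and break-glass password access is deliberately left out.

```typescript
type SsoProvider = "google" | "microsoft";
type LoginMethod = "credentials" | SsoProvider;

// Hypothetical per-tenant policy record.
interface TenantSsoPolicy {
  enabledProviders: SsoProvider[];
  requireSso: boolean;
  requireLocalTotpAfterOAuth: boolean;
}

interface PolicyDecision {
  allowed: boolean;
  promptLocalTotp: boolean;
  reason: string;
}

function decideLogin(policy: TenantSsoPolicy, method: LoginMethod, userHasTotp: boolean): PolicyDecision {
  if (method === "credentials") {
    if (policy.requireSso) {
      return { allowed: false, promptLocalTotp: false, reason: "tenant requires SSO" };
    }
    // Password logins keep the existing behavior: TOTP whenever the user has it enabled.
    return { allowed: true, promptLocalTotp: userHasTotp, reason: "ok" };
  }
  if (policy.enabledProviders.indexOf(method) < 0) {
    return { allowed: false, promptLocalTotp: false, reason: "provider disabled for tenant" };
  }
  // After OAuth, local TOTP is prompted only if the tenant opted to keep it.
  return {
    allowed: true,
    promptLocalTotp: policy.requireLocalTotpAfterOAuth && userHasTotp,
    reason: "ok",
  };
}
```

Evaluating the policy in one pure function makes it straightforward to cover each tenant configuration in unit tests before wiring it into the callbacks.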
### Acceptance Tests
- [ ] EE Connect SSO flow: verifies password + TOTP gating, sets the nonce cookie, and redirects through the selected provider; the linked provider appears in Account Management afterwards.
- [ ] Migration banner journey: credential user with new SSO link sees call-to-action on next password login, completes OAuth sign-in, and subsequent logins default to provider without local 2FA prompts.
- [ ] MSP credential form: a user with a linked provider sees the SSO prompt; `Sign in with <provider>` launches OAuth and lands on the dashboard.
- [ ] Client portal credential form: a user with a tenant slug and a linked provider sees the SSO banner, and vanity-domain callback handling is respected.
- [ ] JWT/session callbacks: ensure tenant slug and contact/client data persist across logins for both Google and Microsoft profiles.
- [ ] Backfill script dry-run/live modes: process sample users without creating duplicates; verify CE builds skip EE scripts/components and migration prompts respect pre-linked accounts.
## Background and Investigational Notes
### Existing Authentication Architecture
- **Tech stack**: NextAuth.js with JWT strategy (configurable `SESSION_MAX_AGE`), custom session cookies, on-demand user validation per request.
- **Providers in place**: Google OAuth using `GoogleProvider`; Keycloak integration via `KeycloakProvider`; custom credentials provider with password + 2FA.
- **User portals**: Internal MSP staff sign in at `/auth/msp/signin` (`user_type: internal`); client users sign in at `/auth/client-portal/signin` (`user_type: client`, with `clientId` and `contactId` requirements).
- **Core files**: NextAuth handler (`server/src/app/api/auth/[...nextauth]/route.ts`), options (`.../options.ts`), credential logic (`server/src/lib/actions/auth.tsx`), registration/reset, session cookies, and portal-specific forms.
- **JWT/session callbacks**: `signIn` tracks last login and client redirects; `jwt` populates claims (id, email, tenant, user_type); `session` turns tokens into session objects; `redirect` routes users by `user_type`.
- **Client portal handoff**: Vanity domain redirect uses OTT tokens via `computeVanityRedirect` and `PortalSessionHandoff.tsx`.
### Credentials Flow Snapshot
- Email and password are checked against the hashed password stored in the database.
- 2FA enforced when `two_factor_enabled` using TOTP codes (passed as `twoFactorCode`).
- JWT issued with tenant and user metadata; session callback mirrors data; redirect logic handles portal selection.
### OAuth Flow Snapshot
- User triggers provider button; profile callback runs.
- **Google**: Currently requires existing DB user by email and verifies active status; assigns default `user_type` (internal).
- **Keycloak**: Accepts profile data with tenant/user_type claims.
- Once profile is accepted, standard JWT/session callbacks run.
### Investigation Takeaways Relevant to Plan
- Need consistent claim normalization so OTT and redirect logic remain unchanged across providers.
- Tenant determination for OAuth logins is currently limited; must combine query hints, vanity headers, or email-domain mapping.
- 2FA bypass expectations differ by provider; policy controls will decide whether to trust external 2FA or enforce local TOTP post-login.
- Account linking is required to prevent duplicate user records and to let existing credential users migrate smoothly.
- Future enhancements may include auto-provisioning (SCIM/Azure AD) and Google auto-provisioning; the plan leaves hooks for these but focuses on core SSO enablement.
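
The tenant-determination takeaway above (combine query hints, vanity headers, and email-domain mapping) suggests a fixed precedence order. The sketch below assumes hint first, then vanity host, then email domain, and lets an unrecognized hint fall through rather than fail; both choices are assumptions, as are the names `TenantLookup` and `resolveTenant`.

```typescript
type TenantSource = "tenant_hint" | "vanity_host" | "email_domain";

// Hypothetical lookup tables; in practice these would come from the database.
interface TenantLookup {
  knownTenants: string[];
  vanityHosts: Record<string, string>;
  emailDomains: Record<string, string>;
}

interface TenantInput {
  tenantHint: string | null;
  host: string | null;
  email: string;
}

interface ResolvedTenant {
  tenant: string;
  source: TenantSource;
}

// Own-property lookup so keys like "constructor" never match inherited members.
function lookup(table: Record<string, string>, key: string): string | null {
  return Object.prototype.hasOwnProperty.call(table, key) ? (table[key] ?? null) : null;
}

function resolveTenant(input: TenantInput, dir: TenantLookup): ResolvedTenant | null {
  // 1. Explicit hint from the query string, honored only for known tenants.
  if (input.tenantHint !== null && dir.knownTenants.indexOf(input.tenantHint) >= 0) {
    return { tenant: input.tenantHint, source: "tenant_hint" };
  }
  // 2. Vanity domain from the request host, ignoring case and any port suffix.
  if (input.host !== null) {
    const host = (input.host.split(":")[0] ?? "").toLowerCase();
    const byHost = lookup(dir.vanityHosts, host);
    if (byHost !== null) {
      return { tenant: byHost, source: "vanity_host" };
    }
  }
  // 3. Email-domain heuristic as the last resort.
  const at = input.email.lastIndexOf("@");
  const domain = at < 0 ? "" : input.email.slice(at + 1).toLowerCase();
  const byDomain = lookup(dir.emailDomains, domain);
  if (byDomain !== null) {
    return { tenant: byDomain, source: "email_domain" };
  }
  return null; // caller decides whether to reject or ask the user
}
```

Returning the `source` alongside the tenant gives telemetry a way to see how often the weaker email-domain heuristic is actually used.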
### SSO Migration Tooling
- Use `pnpm tsx ee/scripts/backfill-sso-links.ts --provider=<google|microsoft> --domain=example.com[,domain2.com] [--dry-run]` to seed `user_auth_accounts` rows for federated domains.
- The script skips inactive users, records metadata indicating the backfill source, and treats email addresses as provisional provider IDs until real OAuth logins occur.
- Run in `--dry-run` mode first to review counts, then execute without the flag to persist links; reruns are idempotent and update only matching tenant/user combinations.
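
The flag handling and idempotent behavior described above can be illustrated with a self-contained sketch. This is not the contents of `backfill-sso-links.ts`: `parseBackfillArgs`, `runBackfill`, and the row shapes are hypothetical, links are held in memory, and existing rows are left unchanged rather than updated.

```typescript
type SsoProvider = "google" | "microsoft";

interface BackfillOptions { provider: SsoProvider; domains: string[]; dryRun: boolean; }
interface BackfillUser { userId: string; email: string; isActive: boolean; }
interface LinkRow { userId: string; provider: SsoProvider; providerAccountId: string; source: string; }
interface BackfillSummary { created: number; unchanged: number; skippedInactive: number; }

// Parses the three flags shown in the command above.
function parseBackfillArgs(argv: string[]): BackfillOptions {
  let provider: SsoProvider | null = null;
  let domains: string[] = [];
  let dryRun = false;
  for (const arg of argv) {
    if (arg === "--dry-run") {
      dryRun = true;
    } else if (arg.indexOf("--provider=") === 0) {
      const value = arg.slice("--provider=".length);
      if (value !== "google" && value !== "microsoft") {
        throw new Error("unsupported provider: " + value);
      }
      provider = value;
    } else if (arg.indexOf("--domain=") === 0) {
      domains = arg.slice("--domain=".length).split(",")
        .map((d) => d.trim().toLowerCase())
        .filter((d) => d.length > 0);
    }
  }
  if (provider === null || domains.length === 0) {
    throw new Error("both --provider and --domain are required");
  }
  return { provider, domains, dryRun };
}

// Counts what would change; persists new links only when not in dry-run mode.
function runBackfill(users: BackfillUser[], links: LinkRow[], opts: BackfillOptions): BackfillSummary {
  const summary: BackfillSummary = { created: 0, unchanged: 0, skippedInactive: 0 };
  for (const user of users) {
    const email = user.email.toLowerCase();
    const domain = email.slice(email.lastIndexOf("@") + 1);
    if (opts.domains.indexOf(domain) < 0) continue;
    if (!user.isActive) { summary.skippedInactive++; continue; }
    const exists = links.some((l) => l.userId === user.userId && l.provider === opts.provider);
    if (exists) { summary.unchanged++; continue; }
    summary.created++;
    if (!opts.dryRun) {
      // The email acts as a provisional provider ID until a real OAuth login occurs.
      links.push({ userId: user.userId, provider: opts.provider, providerAccountId: email, source: "backfill" });
    }
  }
  return summary;
}
```

A dry run and a live run share the same counting path, so the numbers reviewed beforehand match what the live run persists, and a rerun reports every existing link as unchanged.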