Hermes 284313f908
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

# SCRATCHPAD — AlgaPSA MCP Server
> Working memory for the effort. Source of truth for scope = `design.md` in this folder.
## Context
Implement AlgaPSA as an MCP **server** in two transports: a free CE local stdio connector and an EE remote Streamable HTTP server with governance. Central design move: **progressive disclosure** — 3 constant meta-tools, not per-endpoint tools — reusing the **existing EE chat agentic engine**.
## Key discoveries (existing code to reuse)
- **The engine already exists** in the EE chat assistant:
- `ee/server/src/services/chatCompletionsService.ts` — agent loop; `buildToolDefinitions()` (~line 958) defines the meta-tools `search_api_registry`, `search_business_data`, `call_api_endpoint`, `finish_response`; `executeFunctionCall()` (~line 3317) dispatches via a temp API key; `searchBusinessData()` (~line 1206) calls server-internal full-text search.
- `ee/server/src/chat/registry/apiRegistry.schema.ts` → `ChatApiRegistryEntry` (carries `rbacResource`, `approvalRequired`, `parameters`, request/response schemas, `examples`, `playbooks`). **Pure types.**
- `ee/server/src/chat/registry/search.ts` → `searchRegistryEntries()` ranked search. **Pure TS, imports only the schema type → trivially extractable.**
- `ee/server/src/chat/registry/apiRegistry.generated.ts` — generated registry (~1.2MB).
- `ee/scripts/generate-chat-registry.mjs` — generator from OpenAPI; supports YAML overrides in `ee/docs/api-registry/`.
- **OpenAPI specs exist for both editions:** `sdk/docs/openapi/alga-openapi.ce.json` and `…ee.json` (+ yaml). Generator: `sdk/scripts/generate-openapi.ts`.
- **HTTP surfaces the connector needs already exist:**
- Global search: `server/src/app/api/v1/search/route.ts` (+ per-entity `*/search`).
- Meta endpoints: `server/src/app/api/v1/meta/{openapi,endpoints,schemas,sdk}` — precedent for adding `meta/mcp-registry`.
- **API-key auth:** `server/src/lib/api/middleware/apiAuthMiddleware.ts` (`x-api-key` / Bearer, `api_keys` table). Subject already carries `apiKeyId`.
- **Authz kernel:** `server/src/lib/authorization/kernel/{contracts.ts,engine.ts}`. `AuthorizationSubject` is open-shaped (`[key: string]: unknown`) → can add `agentId` + subject type `'agent'`.
- **Audit:** `server/src/lib/logging/auditLog.ts` → `audit_logs` table. `auditLog(knex, {userId, operation, tableName, recordId, changedData, details})`.
- **Edition gating:** `server/src/lib/features.ts` → `isEnterpriseEdition()`, `getFeatureImplementation()`. EDITION env (`community`|`ee`|`enterprise`).
- **Monorepo:** npm workspaces (root `package.json`), Nx. New CE pkg → `packages/agent-tooling`; connector → `packages/alga-mcp-connector` (or `@alga/mcp-connector`); remote endpoint lives in the server app under `ee/`.
## Decisions (see design.md §8)
1. Progressive disclosure: 3 constant meta-tools, no per-endpoint tools.
2. Extract engine to shared CE package `agent-tooling`; chat + both MCP transports consume it.
3. **Anything networked is EE** (tightens source spec §3.2/§6 — remote base is no longer CE).
4. MCP Resources dropped from scope (subsumed by progressive disclosure).
5. Registry **fetched from the instance** (`meta/mcp-registry`), not bundled (avoids fleet drift).
6. Local connector reuses existing `api_keys` mechanism — no new token type.
7. Phase order: CE local → EE remote (MVP gov) → governance depth.
8. Temp-key-from-session dispatch stays EE (chat); connector calls `/api/v1` directly with user token.
## Open questions / deferred
- **Deferred:** approval-gate resolution over request/response MCP (Phase 3 design spike) — candidates: `pending_approval` handle resolved via Streamable HTTP streaming within timeout, or a `check_approval(handle)` tool.
- Do `/api/v1/search` ACL semantics match the chat assistant's internal ACL path, or need reconciliation?
- OAuth: AlgaPSA-as-authorization-server vs. delegate to tenant IdP (P2 vs SSO-bound identity in P3).
## Testing posture
**80/20 by explicit user directive** — lean test list, high-value risks only. This intentionally **overrides** the software-planner default of "tests > features." Do not exhaustively test thin pass-throughs.
## Commands / runbooks
- Generate registries (to be generalized for CE+EE): `node ee/scripts/generate-chat-registry.mjs`
- Build editions: `npm run build:ce` / `npm run build:ee`
- OpenAPI regen: `sdk/scripts/generate-openapi.ts`
## Gotchas
- `searchBusinessData()` in chat uses server-internal DB search (`createTenantKnex`, ACL principal) — **not** reachable from a workstation connector. The connector must use the HTTP `/api/v1/search` endpoint instead.
- Re-pointing the chat assistant onto `agent-tooling` is the only shipped-code change in Phase 1 → regression-test the existing chat flow.
- Registry is ~1.2MB; serve gzipped from `meta/mcp-registry`.
## Implementation log / surprises
### 2026-06-06 — Group A (F001-F003): agent-tooling package extracted
- Created `packages/agent-tooling` (CE), mirroring `packages/formatting` conventions (src-export map, tsup preset, project.json, vitest). Typechecks + builds + 6 search tests pass.
- **Decision:** *copy* schema+search into the package and leave `ee/server/src/chat/registry/*` untouched for now (brief, intentional duplication). The EE chat re-point + de-dup is Group D — deferred so the connector (Group F) lands first with **zero** changes to the build-critical `server/next.config.mjs` or shipped chat code.
- **Sequencing change vs features.json order:** executing A → B → F (standalone connector, no Next.js) BEFORE C → E → D (server integration + next.config edits + chat re-point). Risk pushed later; value (working CE connector) lands first.
- **SURPRISE — search never returns empty for a non-empty query.** `scoreEntry` adds an unconditional recency bonus `Math.max(0, 2 - index*0.05)`, so the first ~40 registry entries always score > 0 even with zero token/intent match. Implication for MCP: `search_api_registry` on an irrelevant query returns low-relevance entries (by registry order), not an empty set. The agent must judge relevance from the returned scores/descriptions. Consider surfacing the `score` in the MCP tool result so the model can tell "weak match" from "strong match". Not changing the algorithm now (parity with shipped chat behavior).
- **next.config.mjs reality:** per-package webpack aliases exist in TWO blocks (dev-source ~L230-274 and prebuilt ~L515-544) plus a `transpilePackages` list (~L413). Group E/D must add `@alga-psa/agent-tooling` to all three. Build-critical file — edit carefully.
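
A minimal sketch of the recency-bonus surprise noted above, assuming the final score is the sum of match terms plus the positional bonus. Only the `Math.max(0, 2 - index*0.05)` expression comes from `scoreEntry`; `classifyScore`, `MatchStrength`, and `positionsWithBonus` are hypothetical helpers that show one way an MCP tool result could flag a weak match.

```typescript
// Positional bonus quoted from scoreEntry; `index` is the entry's position in the registry.
function recencyBonus(index: number): number {
  return Math.max(0, 2 - index * 0.05);
}

// Hypothetical label, not part of the shipped search code.
type MatchStrength = "weak" | "strong";

// Assumption: score = match terms + recencyBonus(index).
// A score no larger than the bonus means no token/intent term contributed.
function classifyScore(score: number, index: number): MatchStrength {
  return score > recencyBonus(index) ? "strong" : "weak";
}

// Counts how many leading registry positions receive a non-zero bonus.
function positionsWithBonus(registrySize: number): number {
  let count = 0;
  for (let i = 0; i < registrySize; i++) {
    if (recencyBonus(i) > 0) count++;
  }
  return count;
}
```

With this expression the bonus reaches zero at index 40, so positions 0 through 39 always score above zero, which matches the "first ~40 entries" observation.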
### 2026-06-06 — Group F (F012-F020): @alga-psa/mcp-connector built
- New `packages/alga-mcp-connector` (publishable, NOT private — the one shippable package). `npx`-runnable bin via tsup `banner` shebang; bundles `agent-tooling` (`noExternal: [/@alga-psa\//]`) so the published artifact only needs the public MCP SDK at runtime. Verified: `searchRegistryEntries` is inlined in the 25KB dist bin.
- **Decision — low-level `Server` API, not `McpServer`+Zod.** `buildMetaToolDefinitions` already emits raw JSON-Schema `inputSchema`, which maps 1:1 to the low-level `ListToolsRequestSchema` handler. Avoids re-expressing schemas in Zod.
- **Decision — connector always uses `edition: 'ce'` tool templating** (no approval clause) regardless of the instance edition that served the registry — the local connector is inherently user-scoped. EE templating is for the Phase-2 remote server.
- **Conformance proven in-memory (T011):** `InMemoryTransport.createLinkedPair()` + SDK `Client` ↔ our server. listTools → exactly the 3 tools; callTool search works; HTTP-failure → `isError`. Fail-fast verified by running the built bin with no env (clear stderr msg, exit 1). stdout kept clean (all logs → stderr).
- **SURPRISE — MCP SDK 1.29 pulls ~50 transitive deps (express, ajv, hono, eventsource, …).** It bundles the Streamable-HTTP server transport, so even a stdio-only connector drags the HTTP stack in at install. Harmless (tree-shaken from our bundle; runtime only needs the SDK), but worth knowing. `package-lock.json` now pins `@modelcontextprotocol/sdk@1.29.0` — committed with this group.
- **OPEN — tenant header.** Connector relies on API-key→tenant resolution and adds `x-tenant-id` only if `ALGA_TENANT_ID` is set. Must verify against a live instance whether `apiAuthMiddleware` requires the tenant header for `validateApiKeyAnyTenant`. Tracked for Group G live E2E.
- **Contract pinned for Group E:** connector expects `GET /api/v1/meta/mcp-registry` → JSON `{ entries: [...] }` (also tolerates a bare array), auth via `x-api-key`. `search_business_data` → `GET /api/v1/search?query=&types=csv&limit=&cursor=&sort=` (confirmed against `ApiSearchController`).
- T012 (live E2E) deferred to Group G — will drive the built bin over real stdio against a local mock HTTP instance.
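
A hedged sketch of how the pinned search contract and the optional tenant header might be assembled. The query parameter names and the `x-api-key` / `x-tenant-id` headers come from the notes above; `buildSearchRequest`, `ConnectorEnv`, and `SearchParams` are illustrative names, not the connector's actual code.

```typescript
interface SearchParams {
  query: string;
  types?: string[]; // sent as a comma-separated list
  limit?: number;
  cursor?: string;
  sort?: string;
}

interface ConnectorEnv {
  baseUrl: string;
  apiKey: string;
  tenantId?: string; // would come from ALGA_TENANT_ID when set
}

function buildSearchRequest(
  env: ConnectorEnv,
  params: SearchParams,
): { url: string; headers: Record<string, string> } {
  const url = new URL("/api/v1/search", env.baseUrl);
  url.searchParams.set("query", params.query);
  if (params.types && params.types.length > 0) {
    url.searchParams.set("types", params.types.join(","));
  }
  if (params.limit !== undefined) url.searchParams.set("limit", String(params.limit));
  if (params.cursor) url.searchParams.set("cursor", params.cursor);
  if (params.sort) url.searchParams.set("sort", params.sort);

  // The tenant header is added only when a tenant id is configured.
  const headers: Record<string, string> = { "x-api-key": env.apiKey };
  if (env.tenantId) headers["x-tenant-id"] = env.tenantId;
  return { url: url.toString(), headers };
}
```

Whether the server needs `x-tenant-id` at all is still the open question above; the sketch only shows the conditional.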
### 2026-06-06 — Group C (F006, T005): dual-edition registry generation
- Generalized `ee/scripts/generate-chat-registry.mjs` to emit **both** editions in one run (CE → `server/src/lib/mcp/registry.generated.ts`, EE → existing location). CE file imports the type from `@alga-psa/agent-tooling/registry/schema`. Added root npm script `mcp:registry:generate`. T005 is enforced **in the generator** as a hard invariant: it throws if any CE endpoint is absent from EE.
- Added `@alga-psa/agent-tooling` to `tsconfig.base.json` paths — the repo resolves `@alga-psa/*` types via tsconfig `paths` (not the package `exports` map), and the package emits no `.d.ts` (preset `dts:false`). This is why the IDE flagged "Cannot find module"; the connector's own `tsc` passed because it uses `moduleResolution: Bundler`. The base-paths entry fixes IDE + global typecheck and is required for the server (Group D/E) to import the package.
- **SURPRISE — the committed EE chat registry is STALE.** Regenerating from the current EE spec went **609 → 901 entries** (+292 real endpoints, e.g. inboundwebhooks). The committed registry was generated 2026-04-29; the EE spec was updated 2026-06-04. So the in-app chat is currently ~292 endpoints behind its own API spec.
- **Decision:** did NOT refresh the EE registry in this commit — it's a 14k-line, chat-behavior-changing diff unrelated to the MCP extraction, and warrants its own review. Reverted the EE regeneration; committed only the new CE registry. **Consequence:** committed CE (879, fresh) is briefly *larger* than committed EE (609, stale); they serve independent consumers, so no runtime issue, but the CE⊆EE invariant only holds on a fresh dual generation.
- **Follow-up to surface to the user:** run `npm run mcp:registry:generate` and commit the refreshed EE registry separately (also refreshes the connector's view of an EE instance via the meta endpoint). Both the chat and EE-instance MCP currently see the stale set.
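
The T005 invariant (every CE endpoint must also exist in EE) could look roughly like the sketch below. It assumes registry entries carry a unique `id`; `RegistryEntryLike` and `assertCeSubsetOfEe` are hypothetical names, and the real generator may key entries differently.

```typescript
// Assumption: each registry entry has a unique string id.
interface RegistryEntryLike {
  id: string;
}

// Throws when any CE entry id is absent from the EE entry set.
function assertCeSubsetOfEe(ce: RegistryEntryLike[], ee: RegistryEntryLike[]): void {
  const eeIds = new Set(ee.map((entry) => entry.id));
  const missing = ce.filter((entry) => !eeIds.has(entry.id)).map((entry) => entry.id);
  if (missing.length > 0) {
    throw new Error(`CE endpoints missing from EE registry: ${missing.join(", ")}`);
  }
}
```

Run against the committed artifacts (fresh CE, stale EE) a check like this would fail, which is why the invariant only holds on a fresh dual generation.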
### 2026-06-06 — Group E (F009-F011): GET /api/v1/meta/mcp-registry
- Added `getMcpRegistry()` to `ApiMetadataController` + route `server/src/app/api/v1/meta/mcp-registry/route.ts`. Auth via the shared `authenticate()` + `assertProductApiAccess` (F010). Returns `{ edition, count, entries }`.
- **Edition-aware with ZERO next.config changes (F011).** CE registry is `await import('@/lib/mcp/registry.generated')`; on EE, `await import('@product/chat/entry').eeMcpRegistry` (added that export to `packages/product-chat/ee/entry.tsx`, the established CE→EE seam used by the chat routes). Falls back to CE if the EE artifact is missing.
- **Why no next.config alias was needed:** changed the generator to emit `import type { ChatApiRegistryEntry }` in the registry files → the schema import is erased at runtime, so the CE registry never pulls `@alga-psa/agent-tooling` into the server's runtime graph. Regenerated the CE registry with this. (The agent-tooling webpack alias is only needed for Group D, when the chat *runtime* imports the package.)
- LSP shows `@ee/*` "cannot find module" for product-chat/ee/entry.tsx — that's the file's normal state (the `@ee/` alias resolves only in the EE build), and affects the pre-existing service imports identically. Not a regression.
- **T007 (live endpoint auth + edition) NOT auto-tested** — a Next route handler needs the full server/DB/auth stack (poor 80/20). Auth is the shared, already-tested middleware; the edition branch is trivial; the registry-fetch *contract* is covered by the Group G connector E2E against a mock instance. Validate the real endpoint via a running dev server (manual).
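
A rough sketch of the edition-aware selection with CE fallback described for F011. The loaders are injected so the sketch stays self-contained; in the real route they would wrap the dynamic imports named above. `loadRegistry`, `Loader`, and `RegistryResponse` are illustrative names.

```typescript
type Registry = { id: string }[];
type Loader = () => Promise<Registry | undefined>;

interface RegistryResponse {
  edition: "ce" | "ee";
  count: number;
  entries: Registry;
}

// Picks the EE registry when running as EE and the artifact loads; otherwise falls back to CE.
async function loadRegistry(
  isEnterprise: boolean,
  loadCe: () => Promise<Registry>,
  loadEe: Loader,
): Promise<RegistryResponse> {
  if (isEnterprise) {
    try {
      const ee = await loadEe();
      if (ee) return { edition: "ee", count: ee.length, entries: ee };
    } catch {
      // EE artifact missing or failed to load: fall through to CE.
    }
  }
  const ce = await loadCe();
  return { edition: "ce", count: ce.length, entries: ce };
}
```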
### 2026-06-06 — Group D (F007, F008): re-point EE chat onto agent-tooling
- Replaced `ee/server/src/chat/registry/{apiRegistry.schema,search}.ts` with **thin re-export shims** → `@alga-psa/agent-tooling/registry/{schema,search}`. De-dups the ~360 lines that were copied into the package in Group A. Every existing import path (indexer, generated registry, chatCompletionsService) keeps resolving via the shim. Behavior is identical **by construction** (verbatim re-export of the same code).
- **F008 preserved:** the temp-key-from-session dispatch (`executeFunctionCall` + `TemporaryApiKeyService`) stays in EE `chatCompletionsService`; the package only does request-*building*. Chat's own OpenAI/Vertex-shaped `buildToolDefinitions` (with `finish_response` + business-search enum) stays in EE too — intentionally NOT replaced by the package's MCP-shaped `buildMetaToolDefinitions` (different transport/format).
- **next.config.mjs:** added `@alga-psa/agent-tooling` (+`/` subpath variant) in all three places, mirroring the source-transpiled `scheduling`/`formatting` packages exactly: turbopack `resolveAlias`, `transpilePackages`, and the webpack `config.resolve.alias` "Source-transpiled" block. This runtime alias is needed (unlike Group E) because the chat imports `searchRegistryEntries` as a runtime value. Verified the config still parses/loads (`node import()`); agent-tooling + connector tests still pass.
- **⚠️ FINAL GATE I could NOT run in-session:** a full EE/CE Next build (`npm run build` / `npm run dev`) to confirm the webpack/turbopack alias resolves at build time. The edits mirror a known-working package precisely and the config parses, but a real build is the definitive check. **Surface this to the user.**
- **T006 (chat regression) left implemented=false:** behavior is preserved by construction, but the live chat flow (LLM + server) wasn't exercised. Verify on a running dev server.
### 2026-06-06 — Group G (F021, T009, T012): Phase 1 E2E
- Added `e2e.test.ts`: a mock AlgaPSA HTTP instance + the **real** `InstanceClient` + the **real** MCP protocol (InMemory transport). Drives the full path: registry fetch → `search_api_registry` → `call_api_endpoint` → `GET /api/v1/tickets/{id}`, plus a 401 auth-failure case. 17 connector tests pass.
- Covers T009 (real `/api/v1` dispatch + parsed result) and T012 (lists + reads a ticket). F021 acceptance is faithfully *simulated* (real HTTP + protocol); real Claude-Desktop verification is manual.
## PHASE 1 STATUS — COMPLETE (pending live gates)
**Done & committed (8 commits):** `agent-tooling` package (registry/search/request-build/tool-defs), `@alga-psa/mcp-connector` stdio bin, dual-edition registry generation + CE artifact, `GET /api/v1/meta/mcp-registry`, EE chat re-pointed onto the shared package. **21/21 Phase-1 features; 10/12 Phase-1 tests** (29 automated tests across the two packages all green).
**Live gates I could NOT run in-session (surface to user):**
1. **EE/CE Next build** (`npm run build` / `npm run dev`) — validates the Group D `next.config` alias resolves at build time. Edits mirror `scheduling`/`formatting` exactly; config parses; but a real build is the definitive check. (→ T006 chat regression + T007 endpoint auth ride on this.)
2. **EE registry is stale** (609 vs 901) — run `npm run mcp:registry:generate`, review, commit separately.
3. **Connector tenant header** — verify whether `/api/v1` needs `x-tenant-id` or resolves tenant from the API key (set `ALGA_TENANT_ID` if required).
## PHASE 2 COMPLETE (2026-06-07) — remote governance, live-verified
**13/14 Phase-2 features done** (F026 Dynamic Client Registration intentionally dropped — spec downgraded it; IdP delegation registers clients at the IdP). Built + verified live against the EE dev server (:3001) with a **mock IdP** (RS256 keypair + a local JWKS server).
- **Agent identity** (`agents`, `agent_idp_providers`, `agent_roles`, `api_keys.agent_id`; migrations applied). Agents are backed by a **no-login internal user** so the existing kernel + `hasPermission` enforce the agent's RBAC roles (the kernel's `defaultRbacEvaluator` re-fetches by `user_id`, so a backing user is the low-risk way to reuse all authz — vs. a riskier core RBAC change). `AuthorizationSubject` gains `agentId`/`subjectType`.
- **IdP-delegated auth** (`idpToken.ts`, jose): validate a Bearer JWT against a tenant-trusted IdP's JWKS (iss/aud/resource), map subject → agent. Resource server only; no Alga AS. PRM at `/.well-known/oauth-protected-resource`.
- **Dispatch**: agent path mints a short-lived agent-scoped key and calls `/api/v1` → kernel enforces the agent's roles. Every tool call written to `mcp_agent_audit`; exportable via `/api/v1/mcp/audit`.
- **Provisioning API**: `/api/v1/mcp/{agents,idp-providers,audit}` (EE, API-key admin auth).
- **E2E proof (live)**: Admin agent → reads a real ticket; **no-role agent → 403 (RBAC deny)**; **untrusted issuer → 401**; agent action audited. T013-T017 covered by this live mock-IdP E2E.
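
A sketch of the claim checks that follow signature verification in the IdP-delegated flow. It assumes the JWT signature has already been verified against the issuer's JWKS (for example with jose) and covers only the issuer/audience checks and the subject → agent mapping. The `resource` check is omitted because its claim shape isn't recorded here. `authorizeAgent`, `TrustedIdp`, and the `iss|sub` map key are hypothetical, not the contents of `idpToken.ts`.

```typescript
interface TrustedIdp {
  issuer: string;
  audience: string;
}

// Claims from a token whose signature is assumed to be verified already.
interface VerifiedClaims {
  iss?: string;
  aud?: string | string[];
  sub?: string;
}

type AuthResult =
  | { ok: true; agentId: string }
  | { ok: false; status: 401; reason: string };

function audienceMatches(aud: string | string[] | undefined, expected: string): boolean {
  if (aud === undefined) return false;
  return Array.isArray(aud) ? aud.includes(expected) : aud === expected;
}

// agentsBySubject is keyed by `${iss}|${sub}` (an assumption for this sketch).
function authorizeAgent(
  claims: VerifiedClaims,
  providers: TrustedIdp[],
  agentsBySubject: Map<string, string>,
): AuthResult {
  const idp = providers.find((p) => p.issuer === claims.iss);
  if (!idp) return { ok: false, status: 401, reason: "untrusted issuer" };
  if (!audienceMatches(claims.aud, idp.audience)) {
    return { ok: false, status: 401, reason: "audience mismatch" };
  }
  const agentId = claims.sub ? agentsBySubject.get(`${idp.issuer}|${claims.sub}`) : undefined;
  if (!agentId) return { ok: false, status: 401, reason: "unknown subject" };
  return { ok: true, agentId };
}
```

Authentication failures map to 401 here; the 403 in the E2E proof comes later, from the kernel's RBAC check on the resolved agent.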
**SURPRISES / fixes during the live build:**
- **RLS is no longer used** (per Robert). My agent tables' tenant_isolation RLS policies referenced `current_setting('app.current_tenant')`, which the app no longer sets → `guc.c find_option` 500s. Removed RLS from all four tables (live + migration files); tenant isolation is in code (`.where({tenant})`).
- **Global API middleware** (`server/src/middleware.ts`) enforces `x-api-key` on `/api/*` and rejected the agent's `Bearer` JWT ("API key missing"). Added `/api/mcp` to `apiKeySkipPaths` (it authenticates in-route). `/api/v1/mcp/*` provisioning stays gated.
- **Multi-dir migrations**: `knex_migrations` references CE+EE+ext dirs, so single-dir `migrate:latest` reports "corrupt". Applied the two new migrations surgically (run `up()` + record). The repo's `server/scripts/run-ee-migrations.js` is the proper merged runner.
- Backing-user pattern means `api_keys.user_id` stays set (the backing user) even though I made it nullable; `agent_id` links the agent. Threading `agentId`/`subjectType` fully through the `/api/v1` subject builder is a future refinement (today attribution/audit live at the MCP layer; RBAC via the backing user).
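
A small sketch of the skip-path behavior described in the middleware fix. The matching rule (exact path or path-segment prefix) is an assumption, and `requiresApiKey` is a hypothetical helper; only the `apiKeySkipPaths` name and the `/api/mcp` entry come from the notes.

```typescript
// Only the entry named in the note; the real list likely has more.
const apiKeySkipPaths: string[] = ["/api/mcp"];

// True when the global middleware should demand x-api-key for this path.
function requiresApiKey(pathname: string, skipPaths: string[] = apiKeySkipPaths): boolean {
  if (!pathname.startsWith("/api/")) return false;
  // Assumed rule: skip on an exact match or when the skip path is a whole-segment prefix.
  const skipped = skipPaths.some((p) => pathname === p || pathname.startsWith(`${p}/`));
  return !skipped;
}
```

Under this rule `/api/mcp` authenticates in-route while `/api/v1/mcp/agents` stays gated, because `/api/mcp` is not a prefix of `/api/v1/mcp`.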
### Pre-existing Phases 2-3 notes (superseded for Phase 2 above)
**Phases 2-3 (EE remote + governance) NOT started** (F022-F043). F022/F023 (Streamable HTTP transport + 3 tools, EE-gated) are implementable now (analogous to the connector). F024+ (OAuth 2.1, agent identity, ABAC, approval gates, quotas, SSO) need product decisions first: OAuth AS-vs-IdP strategy, and the deferred approval-over-request/response mechanism.
## LIVE BRING-UP (2026-06-07) — both MCPs running against the dev server (:3001, EE)
Dev server: `feature/alga-mcp-server/server`, Next 16.2.6, `PORT=3001 npm run dev` (nx `server:next:dev`), `NEXT_PUBLIC_EDITION=enterprise` (from server/.env). DB: docker `algamcp-postgres-1`. Test API key minted via DB insert (SHA-256 of a random token; internal user dorothy@kansas.oz; saved at /tmp/alga_mcp_token.txt, description 'mcp-test-key').
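
For reference, minting a test key as described (SHA-256 of a random token) might look like the sketch below. The 32-byte token length and hex encoding of both token and hash are assumptions, not confirmed against the `api_keys` schema; `mintTestKey` and `hashApiKey` are illustrative names.

```typescript
import { createHash, randomBytes } from "node:crypto";

// Assumption: the stored value is the lowercase hex SHA-256 digest of the plaintext token.
function hashApiKey(token: string): string {
  return createHash("sha256").update(token, "utf8").digest("hex");
}

// Returns the plaintext token (hand to the client) and its hash (insert into the DB).
function mintTestKey(): { token: string; hash: string } {
  const token = randomBytes(32).toString("hex");
  return { token, hash: hashApiKey(token) };
}
```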
**EE-BUILD GATE CLEARED.** Restarted the dev server to pick up the Group-D `next.config` agent-tooling alias. `/api/mcp` `tools/list` returned the 3 tools — i.e. `buildMetaToolDefinitions` (an agent-tooling **runtime** value) resolved at runtime. So the turbopack/webpack alias + transpilePackages edits are correct. Server booted clean; chat path (search.ts shim runtime value) implicitly exercises the same alias.
**LOCAL MCP — works.** Built bin driven over real stdio (SDK StdioClientTransport) against :3001: `search_api_registry('list tickets')` → `get-_api_v1_tickets`; `call_api_endpoint` → HTTP 200, real ticket "Ruby Slippers Server Power Fluctuation"; `search_business_data` → valid response. Re-verified after the server restart.
**SERVER MCP — works.** New EE-gated `POST /api/mcp` (Streamable HTTP, JSON-RPC). Synthetic curl drive: unauth → 401; `initialize` → protocol result; `tools/list` → 3 tools; `tools/call search_api_registry` → ranked; `tools/call call_api_endpoint(get-_api_v1_tickets)` → HTTP 200 real ticket.
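
A sketch of the JSON-RPC bodies behind the curl drive above. The `tools/list` and `tools/call` method names and the `{ name, arguments }` params shape follow the MCP protocol; the `query` argument name for `search_api_registry` in the usage note is an assumption, and the helper names are illustrative.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Body for listing the server's tools.
function toolsList(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Body for invoking one tool by name with its arguments.
function toolsCall(id: number, name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}
```

For example, `JSON.stringify(toolsCall(2, "search_api_registry", { query: "list tickets" }))` would be the body POSTed to `/api/mcp`.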
**BUG FOUND + FIXED via live test:** the connector's `fetchRegistry` only read top-level `entries`, but the real endpoint returns Alga's `{ data: { entries } }` envelope → connector couldn't parse the registry. Fixed `instanceClient.fetchRegistry` to unwrap `data`; updated the E2E mock to use the envelope. (Pure-unit tests had missed it because the mock returned a bare `{entries}`.)
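
A minimal sketch of the tolerant unwrapping the fix implies, accepting the three shapes seen so far: a bare array, `{ entries }`, and the `{ data: { entries } }` envelope. `unwrapRegistry` and `isRecord` are hypothetical names; the real `instanceClient.fetchRegistry` may differ.

```typescript
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Extracts the registry entries from any of the known response shapes.
function unwrapRegistry(body: unknown): unknown[] {
  if (Array.isArray(body)) return body;
  if (isRecord(body)) {
    if (Array.isArray(body.entries)) return body.entries;
    const data = body.data;
    if (isRecord(data) && Array.isArray(data.entries)) return data.entries;
  }
  throw new Error("Unrecognized registry response shape");
}
```

Throwing on an unknown shape, rather than returning an empty list, keeps an envelope mismatch like this one from looking like an empty registry.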
**NOTES / not-yet-done:**
- `app_search_index` has **0 rows** in this dev DB → `search_business_data` correctly returns empty. Tool is fine; the index just isn't populated.
- Server MCP **auth is an MVP stand-in: Alga API key** (`x-api-key`/Bearer validated via `validateApiKeyAnyTenant`), NOT the designed IdP-delegated OAuth (F024/F025). The 401 also advertises a `WWW-Authenticate: ...resource_metadata` header, but the PRM endpoint isn't built yet.
- Server MCP **dispatch is self-HTTP** to `/api/v1` under the caller's key (reuses agent-tooling `buildRequest`), NOT the designed in-process kernel dispatch under an agent subject (F031). Good enough to prove the transport + tool surface; swap to kernel dispatch when agent identity (F027) lands.
- So **F022 (transport) + F023 (3 tools over remote) = done (MVP)**; F024-F033 (OAuth/IdP, agent identity, audit) remain.