PRD — Vertex Preserved Thinking Chat Provider
- Slug: `vertex-preserved-thinking-chat-provider`
- Date: 2026-02-25
- Status: Draft
Summary
Implement a provider abstraction for AI chat so Enterprise chat can run on either OpenRouter (current default) or Vertex AI OpenAI-compatible Chat Completions, with first-class support for GLM-5 (glm-5-maas) preserved thinking (reasoning_content), tool calling, and streaming in one continuous loop.
This work must also fix the current behavioral gap where chat/quick ask streaming does not trigger function calls.
Problem
Current chat behavior has three core issues:
- Streaming path drops tool/function semantics:
  - Chat and Quick Ask use `/api/chat/v1/completions/stream`.
  - The stream route forwards text token deltas only.
  - Function/tool proposals are never surfaced, so function execution cannot start.
- Provider implementation is hardcoded:
  - Chat completions are tied to OpenRouter-specific client/model resolution.
  - There is no runtime provider selection for Vertex.
- Preserved thinking is not an explicit contract:
  - Multi-step tool loops need assistant reasoning carried across turns.
  - For Vertex GLM-5 interleaved thinking, this must include `reasoning_content` continuity.
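Forwarding only text deltas loses tool calls because of how they are streamed. In the OpenAI-compatible streaming format, a tool call arrives in fragments under `choices[].delta.tool_calls`, keyed by `index`, with `function.arguments` split across chunks. Below is a minimal sketch of accumulating those fragments into complete proposals. It assumes that chunk shape, and `ToolCallAccumulator` is a hypothetical name, not existing code.

```typescript
// Subset of an OpenAI-compatible streaming chunk (assumed shape).
interface ToolCallDelta {
  index: number;
  id?: string;
  type?: "function";
  function?: { name?: string; arguments?: string };
}

interface StreamChunk {
  choices: Array<{
    delta: {
      content?: string | null;
      reasoning_content?: string | null;
      tool_calls?: ToolCallDelta[];
    };
    finish_reason?: string | null;
  }>;
}

interface PartialToolCall {
  id: string;
  name: string;
  arguments: string; // raw JSON text, parsed only after the stream ends
}

// Hypothetical helper: merges tool-call fragments that share the same `index`.
class ToolCallAccumulator {
  private readonly calls = new Map<number, PartialToolCall>();

  push(chunk: StreamChunk): void {
    for (const choice of chunk.choices) {
      for (const delta of choice.delta.tool_calls ?? []) {
        const call = this.calls.get(delta.index) ?? { id: "", name: "", arguments: "" };
        if (delta.id) call.id = delta.id; // id and name normally arrive in the first fragment
        if (delta.function?.name) call.name = delta.function.name;
        if (delta.function?.arguments) call.arguments += delta.function.arguments; // arguments stream in pieces
        this.calls.set(delta.index, call);
      }
    }
  }

  // Completed calls in index order; JSON.parse throws if the arguments are incomplete.
  finish(): Array<{ id: string; name: string; args: unknown }> {
    return Array.from(this.calls.entries())
      .sort((a, b) => a[0] - b[0])
      .map(([, call]) => ({ id: call.id, name: call.name, args: JSON.parse(call.arguments || "{}") as unknown }));
  }
}
```

A stream route built this way can emit a function proposal event once the model finishes the call, instead of discarding it.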
Goals
- Add a runtime-selectable chat provider abstraction supporting:
  - `openrouter` (default)
  - `vertex` (OpenAI-compatible endpoint)
- Enable Vertex GLM-5 (`glm-5-maas`) for chat completions and streaming.
- Preserve assistant thinking across tool boundaries using `reasoning_content` semantics when the provider is Vertex.
- Restore function-calling behavior in streaming chat and quick ask.
- Keep the existing approval/decline workflow and temporary API-key execution model.
- Keep `tool_choice: "auto"` for interleaved tool reasoning.
Non-goals
- Migrating non-chat AI surfaces to Vertex.
- Replacing API registry or function execution authorization model.
- Building a tenant UI for provider selection.
- Adding new database tables for reasoning persistence.
- Large observability/analytics redesign.
Users and Primary Flows
- Primary user: Enterprise end user using Sidebar Chat or Quick Ask.
- Primary admin/operator: Environment maintainer configuring provider env vars/secrets.
Flow A: Streaming function call with preserved thinking
- User asks for an action requiring API execution.
- Assistant streams reasoning tokens.
- Assistant streams/proposes a tool call.
- User approves.
- Tool executes.
- Assistant resumes with preserved reasoning + tool result and streams final answer.
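The resume step depends on what is sent back to the provider after approval. Below is a minimal sketch that assumes OpenAI-compatible message shapes. The assistant turn is replayed with its `tool_calls` and its `reasoning_content`, and a `tool` message then carries the result, keyed by `tool_call_id`. `buildContinuationMessages` and `ExecutedCall` are hypothetical names.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface AssistantMessage {
  role: "assistant";
  content: string | null;
  reasoning_content?: string; // preserved thinking, replayed on the next request
  tool_calls?: ToolCall[];
}

interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type ChatMessage = { role: "system" | "user"; content: string } | AssistantMessage | ToolMessage;

// Hypothetical record of an approved, executed call.
interface ExecutedCall {
  id: string;        // tool_call id proposed by the model
  name: string;      // e.g. "call_api_endpoint"
  arguments: string; // JSON text exactly as proposed
  result: unknown;   // endpoint response after approval
}

function buildContinuationMessages(
  history: ChatMessage[],
  reasoning: string | undefined,
  partialContent: string,
  call: ExecutedCall,
): ChatMessage[] {
  const assistantTurn: AssistantMessage = {
    role: "assistant",
    content: partialContent.length > 0 ? partialContent : null,
    tool_calls: [{ id: call.id, type: "function", function: { name: call.name, arguments: call.arguments } }],
  };
  // Attach reasoning only if some was captured; dropping it would break the preserved-thinking loop.
  if (reasoning) assistantTurn.reasoning_content = reasoning;

  const toolTurn: ToolMessage = {
    role: "tool",
    tool_call_id: call.id,
    content: JSON.stringify(call.result),
  };
  return [...history, assistantTurn, toolTurn];
}
```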
Flow B: OpenRouter compatibility
- Provider remains `openrouter`.
- Existing behavior remains functional.
- Streaming and function approval flow still work.
Flow C: Vertex thinking control
- Provider is `vertex`.
- Turn-level thinking can be disabled for specific turns by server-side request shaping when needed.
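Below is a sketch of one way to do this request shaping server-side. The body keeps `tool_choice: "auto"` for both providers. For Vertex, a turn-level override is merged in only when the server decides to disable thinking for that turn. The override payload (`thinking: { type: "disabled" }`) is a placeholder assumption, not a confirmed Vertex parameter, so the sketch lets configuration supply it. `shapeChatRequest` is a hypothetical name.

```typescript
type ProviderId = "openrouter" | "vertex";

interface ShapeOptions {
  provider: ProviderId;
  model: string;
  messages: unknown[];
  tools: unknown[];
  disableThinking?: boolean; // server-side, per-turn decision
  // ASSUMPTION: provider-specific override payload supplied by configuration.
  thinkingDisabledOverride?: Record<string, unknown>;
}

function shapeChatRequest(opts: ShapeOptions): Record<string, unknown> {
  const body: Record<string, unknown> = {
    model: opts.model,
    messages: opts.messages,
    tools: opts.tools,
    tool_choice: "auto", // kept for both providers so the model can interleave tool calls
    stream: true,
  };
  // Turn-level thinking control applies only to Vertex, and only when requested.
  if (opts.provider === "vertex" && opts.disableThinking) {
    Object.assign(body, opts.thinkingDisabledOverride ?? { thinking: { type: "disabled" } });
  }
  return body;
}
```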
UX / UI Notes
- Sidebar Chat and Quick Ask must behave identically for function calling.
- Streaming should support both:
- reasoning stream updates
- user-facing content stream updates
- Function approval card remains the decision point before executing any endpoint.
- Interrupted streams must preserve partial output state and avoid false “completed” persistence.
- Thinking display may remain collapsible; reasoning and answer channels should be distinct in state even if rendered together initially.
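The two rules above can be sketched as a small client-side reducer. Reasoning and answer text live in separate fields, and a message is marked completed only on an explicit completion event. The event names, the `status` values, and `applyEvent`/`interrupt` are hypothetical. Only `pendingFunction` mirrors a name this PRD uses.

```typescript
type StreamEvent =
  | { type: "reasoning"; delta: string }
  | { type: "content"; delta: string }
  | { type: "function_proposal"; id: string; name: string; arguments: string }
  | { type: "done" };

interface AssistantDraft {
  reasoning: string; // thinking channel, may be rendered collapsed
  content: string;   // user-facing answer channel
  pendingFunction?: { id: string; name: string; arguments: string };
  status: "streaming" | "completed" | "interrupted";
}

function applyEvent(state: AssistantDraft, event: StreamEvent): AssistantDraft {
  if (state.status !== "streaming") return state; // ignore events that arrive after the stream ended
  switch (event.type) {
    case "reasoning":
      return { ...state, reasoning: state.reasoning + event.delta };
    case "content":
      return { ...state, content: state.content + event.delta };
    case "function_proposal":
      return { ...state, pendingFunction: { id: event.id, name: event.name, arguments: event.arguments } };
    case "done":
      return { ...state, status: "completed" };
  }
}

// Called when the connection drops or the user aborts before "done" arrives:
// partial output is kept, but the message is never marked completed.
function interrupt(state: AssistantDraft): AssistantDraft {
  return state.status === "streaming" ? { ...state, status: "interrupted" } : state;
}
```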
Requirements
Functional Requirements
- Add provider resolver that returns provider id, model, OpenAI-compatible client, and provider-specific request overrides.
- Add Vertex configuration support:
  - model default `glm-5-maas`
  - endpoint base URL from explicit setting or project/location synthesis
  - auth via Google Cloud access token secret/env
- Extend chat message contract to include optional `reasoning_content`.
- Preserve reasoning data through:
  - request validation
  - conversation normalization
  - provider message conversion
  - tool replay turns
- Replace token-only streaming behavior with orchestrated streaming events that can represent:
  - reasoning deltas
  - content deltas
  - function proposal events
  - completion event
- Ensure the streamed function proposal reaches the client `pendingFunction` state, enabling `/api/chat/v1/execute`.
- Ensure the execute path sends the preserved assistant state + tool result back to the provider before requesting the continuation completion.
- Keep OpenRouter behavior functional and backward-compatible.
- Keep chat API gating (`aiAssistant` + EE checks) unchanged.
- Keep `tool_choice: "auto"` for OpenRouter and Vertex.
- Support server-driven turn-level thinking disable for Vertex when requested.
- Document provider env/secrets contract in env examples.
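Below is a minimal sketch of the provider resolver. It reads configuration from an env-like record using the variable names from the Data / API section. `resolveChatProvider` and the returned shape are hypothetical. The OpenRouter base URL is OpenRouter's public OpenAI-compatible endpoint. The synthesized Vertex URL follows the Vertex AI OpenAI-compatible `endpoints/openapi` path; the API version and region handling are assumptions to verify for the target model.

```typescript
type ProviderId = "openrouter" | "vertex";
type Env = Record<string, string | undefined>;

interface ResolvedChatProvider {
  id: ProviderId;
  model: string;
  baseURL: string;                           // OpenAI-compatible base URL
  apiKey: string;                            // server-side only; never sent to the client
  preserveReasoning: boolean;                // replay reasoning_content on assistant turns
  requestOverrides: Record<string, unknown>; // provider-specific body fields (none assumed here)
}

function required(env: Env, name: string): string {
  const value = env[name];
  if (!value) throw new Error(`Missing required chat provider setting: ${name}`);
  return value;
}

function vertexBaseURL(env: Env): string {
  const explicit = env.VERTEX_OPENAPI_BASE_URL;
  if (explicit) return explicit.replace(/\/+$/, ""); // explicit setting wins
  const project = required(env, "VERTEX_PROJECT_ID");
  const location = required(env, "VERTEX_LOCATION");
  // ASSUMPTION: Vertex AI OpenAI-compatible path; confirm API version and region for the target model.
  const host = location === "global" ? "aiplatform.googleapis.com" : `${location}-aiplatform.googleapis.com`;
  return `https://${host}/v1/projects/${project}/locations/${location}/endpoints/openapi`;
}

// Hypothetical resolver; anything other than "vertex" falls back to OpenRouter.
function resolveChatProvider(env: Env): ResolvedChatProvider {
  const requested = (env.AI_CHAT_PROVIDER ?? "").trim().toLowerCase();
  if (requested === "vertex") {
    return {
      id: "vertex",
      model: env.VERTEX_CHAT_MODEL || "glm-5-maas",
      baseURL: vertexBaseURL(env),
      apiKey: required(env, "GOOGLE_CLOUD_ACCESS_TOKEN"), // OAuth access token sent as a bearer token
      preserveReasoning: true,
      requestOverrides: {},
    };
  }
  return {
    id: "openrouter",
    // No default model is assumed here; substitute the existing OpenRouter default if there is one.
    model: required(env, "OPENROUTER_CHAT_MODEL"),
    baseURL: "https://openrouter.ai/api/v1",
    apiKey: required(env, "OPENROUTER_API_KEY"),
    preserveReasoning: false,
    requestOverrides: {},
  };
}
```

Because any unset or unrecognized `AI_CHAT_PROVIDER` value resolves to OpenRouter, reverting to OpenRouter stays a single env change. Failing fast on unknown values would be a reasonable alternative.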
Non-functional Requirements
- No database migration required.
- Provider selection defaults safely to OpenRouter when unspecified.
- Streaming parser should tolerate unknown SSE event fields.
- Maintain current security posture for function execution (approval + temporary API key).
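Tolerating unknown SSE fields mostly follows from the line format defined for server-sent events. `event:` and `data:` lines accumulate until a blank line. Other fields (`id:`, `retry:`, anything unrecognized) and comment lines starting with `:` are skipped. On top of that, events whose type or payload the client does not recognize can be dropped rather than treated as errors. Below is a sketch that assumes hypothetical event names (`reasoning`, `content`, `function_proposal`, `done`) for the typed payloads this PRD lists.

```typescript
interface RawSseEvent {
  event: string;
  data: string;
}

type ChatStreamEvent =
  | { type: "reasoning"; delta: string }
  | { type: "content"; delta: string }
  | { type: "function_proposal"; id: string; name: string; arguments: string }
  | { type: "done" };

// Incremental parser: chunks may split lines anywhere.
// Simplification: only "\n" (optionally preceded by "\r") is treated as a line ending.
class SseParser {
  private buffer = "";
  private eventType = "";
  private dataLines: string[] = [];

  push(chunk: string): RawSseEvent[] {
    this.buffer += chunk;
    const events: RawSseEvent[] = [];
    let idx: number;
    while ((idx = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, idx).replace(/\r$/, "");
      this.buffer = this.buffer.slice(idx + 1);
      if (line === "") {
        // A blank line dispatches the pending event, if it has any data.
        if (this.dataLines.length > 0) {
          events.push({ event: this.eventType || "message", data: this.dataLines.join("\n") });
        }
        this.eventType = "";
        this.dataLines = [];
        continue;
      }
      if (line.startsWith(":")) continue; // comment / keep-alive
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "event") this.eventType = value;
      else if (field === "data") this.dataLines.push(value);
      // id, retry and unknown fields are ignored.
    }
    return events;
  }
}

function decodeChatEvent(raw: RawSseEvent): ChatStreamEvent | null {
  let payload: unknown;
  try {
    payload = raw.data === "" ? {} : JSON.parse(raw.data);
  } catch {
    return null; // malformed payload: skip rather than abort the stream
  }
  if (typeof payload !== "object" || payload === null) return null;
  // Extra payload fields are simply not read.
  const { delta, id, name, arguments: args } = payload as Record<string, unknown>;
  switch (raw.event) {
    case "reasoning":
      return typeof delta === "string" ? { type: "reasoning", delta } : null;
    case "content":
      return typeof delta === "string" ? { type: "content", delta } : null;
    case "function_proposal":
      return typeof id === "string" && typeof name === "string" && typeof args === "string"
        ? { type: "function_proposal", id, name, arguments: args }
        : null;
    case "done":
      return { type: "done" };
    default:
      return null; // unknown event types are ignored for forward compatibility
  }
}
```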
Data / API / Integrations
- New provider config inputs:
  - `AI_CHAT_PROVIDER` (`openrouter` | `vertex`)
  - OpenRouter: `OPENROUTER_API_KEY`, `OPENROUTER_CHAT_MODEL`
  - Vertex: `GOOGLE_CLOUD_ACCESS_TOKEN` (or equivalent secret), `VERTEX_PROJECT_ID`, `VERTEX_LOCATION`, `VERTEX_CHAT_MODEL`, optional `VERTEX_OPENAPI_BASE_URL`, optional thinking toggle
- Chat message model (server/client)
  - Add optional `reasoning_content` on assistant messages.
- Streaming event model
  - SSE events must carry typed payloads for reasoning, content, function proposals, and done.
- Vertex request shaping
  - Use the OpenAI-compatible chat completions endpoint.
  - Include `reasoning_content` on assistant turns when available.
  - Allow a turn-level thinking override payload when configured.
- Existing execution model
  - Keep API registry search + `call_api_endpoint` + approval handshake.
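Preserving reasoning through provider message conversion can be sketched as a pure mapping over internal messages. This sketch replays `reasoning_content` only on assistant turns, and only when the resolved provider is Vertex, matching the Goals wording. `InternalMessage` and `toProviderMessages` are hypothetical names.

```typescript
type ProviderId = "openrouter" | "vertex";

interface InternalMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
  reasoning_content?: string;
  tool_calls?: Array<{ id: string; type: "function"; function: { name: string; arguments: string } }>;
  tool_call_id?: string;
}

function toProviderMessages(messages: InternalMessage[], provider: ProviderId): Record<string, unknown>[] {
  return messages.map((m) => {
    const out: Record<string, unknown> = { role: m.role, content: m.content };
    if (m.tool_calls && m.tool_calls.length > 0) out.tool_calls = m.tool_calls;
    if (m.tool_call_id) out.tool_call_id = m.tool_call_id;
    // Replay preserved thinking only on assistant turns, and only for Vertex.
    if (provider === "vertex" && m.role === "assistant" && m.reasoning_content) {
      out.reasoning_content = m.reasoning_content;
    }
    return out;
  });
}
```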
Security / Permissions
- No change to permission boundaries for endpoint execution.
- No bypass of manual approval for approval-required calls.
- Continue issuing and revoking temporary API keys for approved execution.
- Provider credentials must resolve via secret provider/env; never serialized to client.
Observability
- Keep existing logs, add provider id in completion/stream logs for debugging.
- No new observability system work in this scope.
Rollout / Migration
- Deploy with default provider = OpenRouter.
- Enable Vertex by environment configuration only.
- Roll out Vertex first in non-production environments.
- Keep immediate fallback: revert provider env to OpenRouter.
Open Questions
- Token source lifecycle: should Vertex OAuth access token be externally refreshed and injected, or should server mint tokens from service account credentials in-process?
- Thinking visibility policy: should reasoning UI be visible by default or hidden by default for end users?
- Turn-level thinking control source: env-only for now, or request-level heuristic toggle from server logic?
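If the token-lifecycle question is settled in favor of minting tokens in-process, the main mechanics are caching the access token and refreshing it shortly before expiry. Below is a provider-agnostic sketch. The `mint` callback stands in for whatever actually produces the token, for example a service-account flow through Google's auth library. `CachedAccessToken` and the 60-second refresh margin are assumptions.

```typescript
interface MintedToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical cache around any token-minting function.
class CachedAccessToken {
  private current?: MintedToken;
  private inflight?: Promise<MintedToken>;

  constructor(
    private readonly mint: () => Promise<MintedToken>,
    private readonly refreshMarginMs: number = 60_000, // assumed margin
    private readonly now: () => number = Date.now,
  ) {}

  async get(): Promise<string> {
    if (this.current && this.current.expiresAt - this.refreshMarginMs > this.now()) {
      return this.current.token; // still comfortably valid
    }
    // Concurrent callers share a single refresh instead of each minting a token.
    if (!this.inflight) {
      this.inflight = this.mint().then(
        (t) => {
          this.inflight = undefined;
          return t;
        },
        (e: unknown) => {
          this.inflight = undefined;
          throw e;
        },
      );
    }
    const minted = await this.inflight;
    this.current = minted;
    return minted.token;
  }
}
```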
Acceptance Criteria (Definition of Done)
- With provider `openrouter`, chat and quick ask can again propose functions during streaming and execute approved calls end-to-end.
- With provider `vertex`, chat and quick ask can stream reasoning + content, propose functions, execute approved calls, and continue with preserved reasoning context.
- `reasoning_content` survives assistant -> tool -> assistant loops for Vertex without being dropped.
- `/api/chat/v1/completions/stream` emits structured events sufficient for UI function proposal and continuation flow.
- Existing `/api/chat/v1/execute` approval model remains intact.
- No database migration required; existing message persistence remains functional.
- Provider env configuration is documented in `.env.example` and `ee/server/.env.example`.
- Test suite additions cover provider selection, Vertex request shape, streaming event parsing, function proposal path, and at least one DB-backed happy path plus one DB-backed guard/failure path.