Initial import of AlgaPSA codebase from PSA server

Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech

2026-06-22 16:12:17 -05:00


PRD — Vertex Preserved Thinking Chat Provider

Slug: vertex-preserved-thinking-chat-provider
Date: 2026-02-25
Status: Draft

Summary

Implement a provider abstraction for AI chat so Enterprise chat can run on either OpenRouter (current default) or Vertex AI OpenAI-compatible Chat Completions, with first-class support for GLM-5 (glm-5-maas) preserved thinking (reasoning_content), tool calling, and streaming in one continuous loop.

This work must also fix a current behavioral gap: streaming in Chat and Quick Ask does not trigger function calls.

Problem

Current chat behavior has three core issues:

Streaming path drops tool/function semantics:

Chat and Quick Ask use /api/chat/v1/completions/stream.
The stream route forwards text token deltas only.
Function/tool proposals are never surfaced, so function execution cannot start.

Provider implementation is hardcoded:

Chat completions are tied to OpenRouter-specific client/model resolution.
There is no runtime provider selection for Vertex.

Preserved thinking is not an explicit contract:

Multi-step tool loops need assistant reasoning carried across turns.
For Vertex GLM-5 interleaved thinking, this must include reasoning_content continuity.

Goals

Add a runtime-selectable chat provider abstraction supporting:

openrouter (default)
vertex (OpenAI-compatible endpoint)

Enable Vertex GLM-5 (glm-5-maas) for chat completions and streaming.
Preserve assistant thinking across tool boundaries using reasoning_content semantics when provider is Vertex.
Restore function-calling behavior in streaming chat and quick ask.
Keep existing approval/decline workflow and temporary API-key execution model.
Keep tool_choice: "auto" for interleaved tool reasoning.

Non-goals

Migrating non-chat AI surfaces to Vertex.
Replacing API registry or function execution authorization model.
Building a tenant UI for provider selection.
Adding new database tables for reasoning persistence.
Large observability/analytics redesign.

Users and Primary Flows

Primary user: Enterprise end user using Sidebar Chat or Quick Ask.
Primary admin/operator: Environment maintainer configuring provider env vars/secrets.

Flow A: Streaming function call with preserved thinking

User asks for an action requiring API execution.
Assistant streams reasoning tokens.
Assistant streams/proposes a tool call.
User approves.
Tool executes.
Assistant resumes with preserved reasoning + tool result and streams final answer.
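The resume step in Flow A can be sketched as a pure function that rebuilds the message list sent back to the provider after an approved tool call. This is a minimal sketch under assumptions: the type and function names (`ToolCall`, `AssistantTurn`, `buildContinuation`) are hypothetical, and the message shapes follow the OpenAI-compatible chat format with the optional `reasoning_content` field this PRD adds.

```typescript
// Hypothetical types for illustration; only reasoning_content and the
// OpenAI-style tool_calls / tool_call_id fields are taken as given.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface AssistantTurn {
  role: "assistant";
  content: string | null;
  reasoning_content?: string;
  tool_calls?: ToolCall[];
}

interface ToolResultTurn {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type ReplayMessage =
  | { role: "system" | "user"; content: string }
  | AssistantTurn
  | ToolResultTurn;

interface StreamedProposal {
  reasoning: string; // reasoning streamed before the tool call
  content: string;   // any user-facing text streamed before the tool call
  toolCall: ToolCall;
}

// Appends the assistant turn that proposed the call (with its reasoning)
// and the tool result, so the next completion sees both.
function buildContinuation(
  history: ReplayMessage[],
  proposal: StreamedProposal,
  toolResult: unknown,
): ReplayMessage[] {
  const assistant: AssistantTurn = {
    role: "assistant",
    content: proposal.content || null,
    tool_calls: [proposal.toolCall],
  };
  if (proposal.reasoning) {
    assistant.reasoning_content = proposal.reasoning;
  }
  const tool: ToolResultTurn = {
    role: "tool",
    tool_call_id: proposal.toolCall.id,
    content:
      typeof toolResult === "string"
        ? toolResult
        : JSON.stringify(toolResult ?? null),
  };
  return [...history, assistant, tool];
}
```

Keeping this step free of I/O makes it straightforward to unit test that reasoning is not dropped between the proposal and the continuation.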

Flow B: OpenRouter compatibility

Provider remains openrouter.
Existing behavior remains functional.
Streaming and function approval flow still work.

Flow C: Vertex thinking control

Provider is vertex.
Thinking can be disabled for individual turns through server-side request shaping when needed.

UX / UI Notes

Sidebar Chat and Quick Ask must behave identically for function calling.
Streaming should support both:

reasoning stream updates
user-facing content stream updates

Function approval card remains the decision point before executing any endpoint.
Interrupted streams must preserve their partial output and must not be persisted as completed.
Thinking display may remain collapsible; reasoning and answer channels should be distinct in state even if rendered together initially.

Requirements

Functional Requirements

Add a provider resolver that returns the provider id, model, OpenAI-compatible client, and provider-specific request overrides.
Add Vertex configuration support:

model default glm-5-maas
endpoint base URL from explicit setting or project/location synthesis
auth via Google Cloud access token secret/env

Extend chat message contract to include optional reasoning_content.
Preserve reasoning data through:

request validation
conversation normalization
provider message conversion
tool replay turns

Replace token-only streaming behavior with orchestrated streaming events that can represent:

reasoning deltas
content deltas
function proposal events
completion event

Ensure streamed function proposal reaches client pendingFunction state, enabling /api/chat/v1/execute.
Ensure the execute path sends the preserved assistant state and the tool result back to the provider before requesting the continuation completion.
Keep OpenRouter behavior functional and backward-compatible.
Keep chat API gating (aiAssistant + EE checks) unchanged.
Keep tool_choice: "auto" for OpenRouter and Vertex.
Support server-driven turn-level thinking disable for Vertex when requested.
Document provider env/secrets contract in env examples.
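The requirement to replace token-only streaming with orchestrated events implies the server accumulates more than text while reading the provider stream. The sketch below shows one possible accumulator, assuming the provider emits OpenAI-style chunk deltas in which tool calls arrive as indexed fragments and reasoning arrives in a `reasoning_content` delta field. The delta field name for Vertex and all helper names (`StreamTurnState`, `applyChunkDelta`, `firstPendingCall`) are assumptions, not confirmed API.

```typescript
// Hypothetical delta shapes modeled on OpenAI-style streaming chunks.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface ChunkDelta {
  content?: string | null;
  reasoning_content?: string | null; // assumed field name for streamed reasoning
  tool_calls?: ToolCallDelta[];
}

interface PendingToolCall {
  id: string;
  name: string;
  arguments: string; // raw JSON text, complete only once the stream finishes
}

interface StreamTurnState {
  reasoning: string;
  content: string;
  toolCalls: PendingToolCall[];
}

function createStreamTurnState(): StreamTurnState {
  return { reasoning: "", content: "", toolCalls: [] };
}

// Folds one chunk delta into the turn state. Reasoning and content are kept
// in separate channels; tool-call fragments are concatenated per index.
function applyChunkDelta(state: StreamTurnState, delta: ChunkDelta): void {
  if (delta.reasoning_content) state.reasoning += delta.reasoning_content;
  if (delta.content) state.content += delta.content;
  for (const part of delta.tool_calls ?? []) {
    const call = state.toolCalls[part.index] ?? { id: "", name: "", arguments: "" };
    state.toolCalls[part.index] = call;
    if (part.id) call.id = part.id;
    if (part.function?.name) call.name += part.function.name;
    if (part.function?.arguments) call.arguments += part.function.arguments;
  }
}

// Returns the first accumulated tool call if its arguments parse as JSON,
// otherwise null. Intended to be called after the stream has finished.
function firstPendingCall(state: StreamTurnState): PendingToolCall | null {
  const call = state.toolCalls[0];
  if (!call || !call.name) return null;
  try {
    JSON.parse(call.arguments || "{}");
    return call;
  } catch {
    return null;
  }
}
```

A stream route built this way could forward each reasoning or content fragment to the client as it arrives and emit a function proposal only once `firstPendingCall` returns a value.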

Non-functional Requirements

No database migration required.
Provider selection defaults safely to OpenRouter when unspecified.
Streaming parser should tolerate unknown SSE event fields.
Maintain current security posture for function execution (approval + temporary API key).

Data / API / Integrations

New provider config inputs:

AI_CHAT_PROVIDER (openrouter | vertex)
OpenRouter: OPENROUTER_API_KEY, OPENROUTER_CHAT_MODEL
Vertex: GOOGLE_CLOUD_ACCESS_TOKEN (or equivalent secret), VERTEX_PROJECT_ID, VERTEX_LOCATION, VERTEX_CHAT_MODEL, optional VERTEX_OPENAPI_BASE_URL, optional thinking toggle
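The config inputs above could be resolved along the following lines. This is a sketch under assumptions, not the final resolver: the function and type names are hypothetical, the OpenRouter fallback model is a placeholder because this PRD does not name the existing default, and the synthesized Vertex URL assumes the usual layout of Vertex AI's OpenAI-compatible endpoint, which should be verified against Google's documentation. The environment is passed in as a plain object so the logic can be tested without touching real process state.

```typescript
type ChatProviderId = "openrouter" | "vertex";
type Env = Record<string, string | undefined>;

interface ResolvedChatProvider {
  id: ChatProviderId;
  model: string;
  baseURL: string;
  apiKey: string; // server-side only; never serialized to the client
}

// Placeholder: the PRD does not name the existing OpenRouter default model.
const OPENROUTER_FALLBACK_MODEL = "<existing-openrouter-default>";

function readVar(env: Env, name: string): string | undefined {
  const value = (env[name] ?? "").trim();
  return value === "" ? undefined : value;
}

function requireVar(env: Env, name: string): string {
  const value = readVar(env, name);
  if (value === undefined) throw new Error(`Missing required setting: ${name}`);
  return value;
}

// Explicit base URL wins; otherwise synthesize from project and location.
function resolveVertexBaseUrl(env: Env): string {
  const explicit = readVar(env, "VERTEX_OPENAPI_BASE_URL");
  if (explicit) return explicit.replace(/\/+$/, "");
  const project = requireVar(env, "VERTEX_PROJECT_ID");
  const location = requireVar(env, "VERTEX_LOCATION");
  // Assumption: regional hosts are prefixed with the location, while the
  // "global" location uses the bare host.
  const host =
    location === "global"
      ? "aiplatform.googleapis.com"
      : `${location}-aiplatform.googleapis.com`;
  return `https://${host}/v1/projects/${project}/locations/${location}/endpoints/openapi`;
}

function resolveChatProvider(env: Env): ResolvedChatProvider {
  // Unset or empty AI_CHAT_PROVIDER falls back to OpenRouter.
  const raw = (readVar(env, "AI_CHAT_PROVIDER") ?? "openrouter").toLowerCase();
  if (raw === "vertex") {
    return {
      id: "vertex",
      model: readVar(env, "VERTEX_CHAT_MODEL") ?? "glm-5-maas",
      baseURL: resolveVertexBaseUrl(env),
      apiKey: requireVar(env, "GOOGLE_CLOUD_ACCESS_TOKEN"),
    };
  }
  if (raw === "openrouter") {
    return {
      id: "openrouter",
      model: readVar(env, "OPENROUTER_CHAT_MODEL") ?? OPENROUTER_FALLBACK_MODEL,
      baseURL: "https://openrouter.ai/api/v1",
      apiKey: requireVar(env, "OPENROUTER_API_KEY"),
    };
  }
  throw new Error(`Unsupported AI_CHAT_PROVIDER: ${raw}`);
}
```

Rejecting unrecognized provider values, rather than silently falling back, is a design choice in this sketch; the PRD only requires a safe default when the setting is unspecified.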

Chat message model (server/client)

Add optional reasoning_content on assistant messages.
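One way to express this contract is a message type plus a normalization step that keeps `reasoning_content` only where it is meaningful. The sketch below is illustrative: the names `ChatMessage`, `normalizeChatMessage`, and `toProviderMessages` are hypothetical, and dropping the field for OpenRouter requests is an assumption, since this PRD only requires reasoning continuity for Vertex.

```typescript
const CHAT_ROLES = ["system", "user", "assistant", "tool"] as const;
type ChatRole = (typeof CHAT_ROLES)[number];

interface ChatMessage {
  role: ChatRole;
  content: string;
  reasoning_content?: string; // assistant messages only
}

function isChatRole(value: unknown): value is ChatRole {
  return (
    typeof value === "string" &&
    (CHAT_ROLES as readonly string[]).indexOf(value) !== -1
  );
}

// Validates an untrusted message and keeps reasoning_content only on
// assistant messages, where it is a non-empty string.
function normalizeChatMessage(input: unknown): ChatMessage {
  if (typeof input !== "object" || input === null) {
    throw new Error("message must be an object");
  }
  const raw = input as Record<string, unknown>;
  const role = raw["role"];
  if (!isChatRole(role)) throw new Error("unsupported message role");
  const content = raw["content"];
  const message: ChatMessage = {
    role,
    content: typeof content === "string" ? content : "",
  };
  const reasoning = raw["reasoning_content"];
  if (role === "assistant" && typeof reasoning === "string" && reasoning !== "") {
    message.reasoning_content = reasoning;
  }
  return message;
}

// Assumption: reasoning is replayed to Vertex only and omitted for OpenRouter.
function toProviderMessages(
  provider: "openrouter" | "vertex",
  messages: ChatMessage[],
): ChatMessage[] {
  return messages.map((m) =>
    provider === "vertex" ? m : { role: m.role, content: m.content },
  );
}
```

Because the field is optional, clients and stored conversations that never set it continue to validate unchanged.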

Streaming event model

SSE events must carry typed payloads for reasoning, content, function proposals, and done.
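A typed payload model for the stream might look like the following discriminated union, with an encoder for the server and a tolerant frame parser for the client. The event names (`reasoning_delta`, `content_delta`, `function_proposed`, `done`) and helper names are hypothetical; the PRD fixes only the four categories. The parser reads `data:` lines and ignores every other SSE field, which is one way to meet the requirement that unknown event fields be tolerated.

```typescript
type ChatStreamEvent =
  | { type: "reasoning_delta"; delta: string }
  | { type: "content_delta"; delta: string }
  | { type: "function_proposed"; id: string; name: string; arguments: string }
  | { type: "done" };

// Serializes one event as an SSE frame (event name plus JSON data line).
function encodeSseEvent(event: ChatStreamEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Parses one SSE frame (the text between blank lines). Returns null for
// frames that carry no recognizable payload instead of throwing.
function parseSseFrame(frame: string): ChatStreamEvent | null {
  const dataLines: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    // Only data: lines are interpreted; event:, id:, retry:, comments and
    // unknown fields are skipped.
    if (line.indexOf("data:") === 0) {
      dataLines.push(line.slice(5).replace(/^ /, ""));
    }
  }
  if (dataLines.length === 0) return null;

  let payload: unknown;
  try {
    payload = JSON.parse(dataLines.join("\n"));
  } catch {
    return null;
  }
  if (typeof payload !== "object" || payload === null) return null;

  const p = payload as Record<string, unknown>;
  const type = p["type"];
  const delta = p["delta"];
  if (type === "reasoning_delta" && typeof delta === "string") {
    return { type: "reasoning_delta", delta };
  }
  if (type === "content_delta" && typeof delta === "string") {
    return { type: "content_delta", delta };
  }
  if (type === "function_proposed") {
    const id = p["id"];
    const name = p["name"];
    const args = p["arguments"];
    if (typeof id === "string" && typeof name === "string" && typeof args === "string") {
      return { type: "function_proposed", id, name, arguments: args };
    }
    return null;
  }
  if (type === "done") return { type: "done" };
  return null; // unknown event types are ignored
}
```

Extra JSON properties on a known event are dropped rather than rejected, so the server can add fields later without breaking older clients.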

Vertex request shaping

Use OpenAI-compatible chat completions endpoint.
Include reasoning_content on assistant turns when available.
Allow turn-level thinking override payload when configured.
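The request shaping described above can be sketched as a function that builds the completion body and, for Vertex only, merges a configured thinking override. This PRD does not specify the override payload, so the sketch treats it as an opaque object supplied by configuration; `shapeChatRequest` and the `ShapeInput` fields are hypothetical names. Messages are assumed to already carry `reasoning_content` on assistant turns where available.

```typescript
interface ShapeInput {
  provider: "openrouter" | "vertex";
  model: string;
  messages: Array<Record<string, unknown>>;
  tools: Array<Record<string, unknown>>;
  disableThinking?: boolean;
  // Provider-specific payload that turns thinking off for one turn.
  // Its shape is not defined by the PRD and must come from configuration.
  thinkingDisabledPayload?: Record<string, unknown>;
}

function shapeChatRequest(input: ShapeInput): Record<string, unknown> {
  const core: Record<string, unknown> = {
    model: input.model,
    messages: input.messages,
    tools: input.tools,
    tool_choice: "auto", // kept for both providers
    stream: true,
  };
  const applyOverride =
    input.provider === "vertex" &&
    input.disableThinking === true &&
    input.thinkingDisabledPayload !== undefined;
  // The override is spread first so it can add fields but never replace
  // the core request fields.
  return applyOverride ? { ...input.thinkingDisabledPayload, ...core } : core;
}
```

Spreading the override before the core fields means a misconfigured payload cannot change the model, messages, or `tool_choice`.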

Existing execution model

Keep API registry search + call_api_endpoint + approval handshake.

Security / Permissions

No change to permission boundaries for endpoint execution.
No bypass of manual approval for approval-required calls.
Continue issuing and revoking temporary API keys for approved execution.
Provider credentials must resolve via secret provider/env; never serialized to client.

Observability

Keep existing logs; add the provider id to completion and stream logs for debugging.
No new observability system work in this scope.

Rollout / Migration

Deploy with default provider = OpenRouter.
Enable Vertex by environment configuration only.
Roll out Vertex first in non-production environments.
Keep immediate fallback: revert provider env to OpenRouter.

Open Questions

Token source lifecycle: should the Vertex OAuth access token be refreshed externally and injected, or should the server mint tokens in-process from service account credentials?
Thinking visibility policy: should reasoning UI be visible by default or hidden by default for end users?
Turn-level thinking control source: env-only for now, or request-level heuristic toggle from server logic?

Acceptance Criteria (Definition of Done)

With provider openrouter, chat and quick ask can again propose functions during streaming and execute approved calls end-to-end.
With provider vertex, chat and quick ask can stream reasoning + content, propose functions, execute approved calls, and continue with preserved reasoning context.
reasoning_content survives assistant -> tool -> assistant loops for Vertex without being dropped.
/api/chat/v1/completions/stream emits structured events sufficient for UI function proposal and continuation flow.
Existing /api/chat/v1/execute approval model remains intact.
No database migration required; existing message persistence remains functional.
Provider env configuration is documented in .env.example and ee/server/.env.example.
Test suite additions cover provider selection, Vertex request shape, streaming event parsing, function proposal path, and at least one DB-backed happy path plus one DB-backed guard/failure path.
