Hermes 284313f908
Some checks are pending
Bidi Control Character Guard / bidi-control-guard (push) Waiting to run
Circular Dependency Check / Check for new circular dependencies (push) Waiting to run
Citus Migration Smoke / Combined migrations on single-node Citus (push) Waiting to run
E2E Fresh Install Tests / fresh-install-e2e (push) Waiting to run
ext-v2 guardrails / Run ext-v2 guard and ESLint (push) Waiting to run
Integration Tests / Check for relevant changes (push) Waiting to run
Integration Tests / ${{ (github.event_name == 'schedule' || github.event.inputs.suite == 'full') && 'Full integration suite' || 'Tier-1 integration subset' }} (push) Blocked by required conditions
Mobile checks / Mobile lint + typecheck (push) Waiting to run
Mobile checks / Mobile unit tests (push) Waiting to run
Mobile checks / Mobile dependency audit (report) (push) Waiting to run
Mobile checks / Mobile reproducibility checks (push) Waiting to run
Secrets guard (env backups) / Ensure no tracked env backup files (push) Waiting to run
Temporal Readiness / fast-readiness (push) Waiting to run
Temporal Readiness / docker-parity (push) Waiting to run
TypeScript Type Check / Nx affected typecheck (push) Waiting to run
Unit Tests / Skipped-test budget (push) Waiting to run
Unit Tests / Nx affected unit tests (push) Waiting to run
Unit Tests / Server unit coverage (informational) (push) Waiting to run
Validate Tenant Management Schema / Check for relevant changes (push) Waiting to run
Validate Tenant Management Schema / Validate Tenant Management Schema (push) Blocked by required conditions
EE Workflows Build Guard / ee-workflows-build-guard (push) Waiting to run
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

649 lines
39 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# App-Wide Search — PRD
**Plan ID:** 2026-05-13-app-wide-search
**Owner:** Natallia Bukhtsik
**Status:** Draft (pending scope confirmation)
**Date:** 2026-05-13
---
## 1. Summary
Add a global full-text search bar to the MSP navigation sidebar that finds records across all major entities in the system — clients, contacts, team members, tickets, ticket comments, projects, project tasks and their comments, assets, invoices, contracts/quotes, documents, KB articles, the service catalog, service requests, workflow tasks, schedule entries, interactions, and time-entry notes. The search is powered by a single tenant-scoped index table (`app_search_index`) using PostgreSQL FTS (`tsvector`/GIN) with a `pg_trgm` fuzzy fallback. The index is kept in sync via event-bus-driven incremental updates, with a one-time backfill and a periodic reconciliation job. Per-entity authorization is enforced both via denormalized ACL columns on the index and via record-level checks before snippets are returned.
## 2. Problem Statement
Today there is no way to "just find a thing" in Alga PSA. To locate a record, users must know which area of the app it lives in, navigate to that area, and use entity-specific filters. Cross-entity discovery (e.g., "any record that mentions ACME") is impossible. This costs MSP technicians and dispatchers measurable time on every ticket triage, project lookup, and client phone-call workflow.
## 3. Goals
- **G1 — Single search input** in the MSP sidebar that returns relevant records across all major entity types in under ~300 ms p50 for a tenant of typical size.
- **G2 — Wide entity coverage.** A v1 user can find clients, contacts, team members, tickets, ticket comments, projects, project tasks, project task comments, assets, invoices (incl. line items + annotations), contracts (incl. quote drafts), documents, KB articles, service catalog items, service requests, workflow tasks, schedule entries, interactions, time-entry notes, boards, categories, and tags.
- **G3 — Tenant isolation.** No query ever returns a row from another tenant. Enforced at the schema (tenant in PK + distribution column) and query (mandatory `tenant = ?` predicate) layers.
- **G4 — Permission correctness.** Search never surfaces titles, snippets, or even the existence of records the user cannot read in the app proper. Internal ticket comments stay hidden from non-internal users. Private documents stay hidden.
- **G5 — Fresh-enough index.** Newly created/edited records appear in search within seconds (event-bus indexing latency). Worst-case staleness (during reconciliation lag) is bounded to under 24 hours.
- **G6 — Fuzzy matching.** Users can type partial tokens (`acm`), shortened IDs (`tic-10`), and minor misspellings (`exhcange`) and still get useful results.
- **G7 — Pluggable.** Adding a new entity type to the search index is a single registry entry plus an indexer module; no schema migrations required after v1.
## 4. Non-Goals
- **Client portal search.** Out of scope for v1. Will reuse the index and ACL infrastructure in v2.
- **Semantic / vector / embedding search.** Out of scope. The index table is designed so this can be added later as an additional column without restructuring.
- **Saved searches, search analytics dashboards, or admin tooling** for tuning relevance.
- **Cross-tenant federation** for support admins.
- **Search-driven bulk operations** (e.g., "find all tickets matching X and reassign").
- **Highlighting inside the destination page** beyond a deep-link anchor where one trivially exists (e.g., ticket comment hash).
- **Replacing entity-specific filters.** The existing `/api/v1/*/search` endpoints stay; this is additive.
## 5. Users & Primary Flows
### Personas
- **MSP Technician** — receives a phone call, needs to find a client and their open tickets in seconds.
- **MSP Dispatcher** — needs to find the right ticket/project/asset across hundreds of records to assign or schedule work.
- **MSP Manager** — needs to discover invoices, contracts, or KB articles by partial title or content.
### Primary flows
1. **Type-and-jump.** User clicks the sidebar search (or presses `Cmd/Ctrl+K`), types a few characters, sees grouped results in a dropdown, presses Enter or clicks → navigates directly to the canonical URL of the chosen record.
2. **Find a comment in a ticket.** User searches for a phrase they remember a tech mentioning. A ticket-comment result links to `/msp/tickets/{ticket_id}#comment-{comment_id}`. The destination page scrolls and highlights that comment briefly.
3. **Find a client by partial name.** User types `acm` — sees ACME Corp, ACME Holdings, and any tickets/contracts containing "acme" in their title.
4. **Find by ticket number.** User types `TIC-1023` — gets the ticket as the top result regardless of FTS tokenization.
5. **Empty result.** User searches for something that doesn't exist; sees a clear "no results" state with their query echoed back.
## 6. UX / UI Notes
The primary search surface is a dedicated **results page** at `/msp/search?q=...`. The sidebar input acts as a launcher with a minimal typeahead for the very top hits — it's not the surface users read through. Every search produces a real, shareable URL; every result row is a real anchor so `Cmd/Ctrl+click` opens in a new tab natively.
### 6.1 Sidebar launcher + typeahead
- **Location.** Persistent input at the top of the MSP sidebar (`server/src/components/layout/Sidebar.tsx`). Also openable via `Cmd/Ctrl+K`.
- **Typeahead behavior.** Debounced 200 ms; shows **up to 5 title-only suggestions** (no snippets, no group headers) ranked across all types. The last entry is always `→ See all N results` which navigates to `/msp/search?q=...`.
- **Keyboard.** ↑/↓ to navigate, Enter on a row opens that record, Enter on the input (or on "See all results") goes to the results page, Esc closes.
- **Rows are real `<a>` elements** with `href` pointing at the canonical URL — Cmd/Ctrl+click and middle-click open in new tabs without extra code.
- **Empty state.** Quiet — typeahead just doesn't render until ≥2 chars.
### 6.2 Results page (`/msp/search`)
- **URL is the state.** `?q=`, `?type=` (filter), `?cursor=` (pagination), `?sort=` (relevance | recent). Bookmarkable, shareable, back-button-safe.
- **Layout.** Filter chips across the top (`All`, then one per entity type with a count badge); grouped results below by entity type when `type=All`, flat list when a single type is selected.
- **Each result row** shows entity icon, title, subtitle (e.g., client name for a ticket), `ts_headline` snippet for body matches, and a relative `updated_at` tag.
- **Pagination.** Cursor-based "prev / next"; default 25 rows/page; "Load more" is acceptable as an alternate.
- **Per-row affordances.** Click to navigate, Cmd/Ctrl+click to open in new tab (native anchor behavior), right-click for browser context menu (also free with anchors).
- **Empty state.** Echoes the query, suggests broader filters or removing the type filter, links to entity-type filtered pages.
- **Loading state.** Skeleton rows; debounced server hits at 200 ms while the user types in the results-page input (mirrors the sidebar input).
### 6.3 Shared
- **i18n.** All UI strings go through `useTranslation('msp/core')` under a new `search.*` namespace.
- **Accessibility.** Stable kebab-case `id`s for the UI reflection system; ARIA `combobox` semantics on the sidebar input; ARIA `region` semantics on the results page; high-contrast snippet highlight; keyboard-only flows tested end to end.
- **Snippet HTML.** `ts_headline` HTML is sanitized at the server-action boundary (allow only `<mark>`); the React component renders via a trusted-string wrapper, never `dangerouslySetInnerHTML` of unsanitized output.
## 7. Entity Scope
All of the following are indexed at v1 launch. Adding/removing entities later is one registry entry + one indexer module.
| # | Entity | Source table(s) | Title field | Subtitle / body fields | Deep-link URL pattern | ACL strategy |
|---|---|---|---|---|---|---|
| 1 | Client | `clients` | `client_name` | `email`, `phone_no`, `notes` | `/msp/clients/{client_id}` | Tenant-wide (MSP) |
| 2 | Contact | `contacts` | `full_name` | `email`, `phone_number`, `role` | `/msp/contacts/{contact_name_id}` | Tenant-wide (MSP) |
| 3 | Team member | `users` (internal only) | `first_name last_name` | `username`, `email`, `title` | `/msp/team/{user_id}` | Tenant-wide (MSP); excludes client-type users |
| 4 | Ticket | `tickets` | `title` | `ticket_number`, client name (denormalized) | `/msp/tickets/{ticket_id}` | Ticket permission + board scope |
| 5 | Ticket comment | `comments` (parent: ticket) | parent ticket title | `note` (markdown → text) | `/msp/tickets/{ticket_id}#comment-{comment_id}` | Ticket permission + `is_internal` flag |
| 6 | Project | `projects` | `project_name` | `description` | `/msp/projects/{project_id}` | Project permission + client scope |
| 7 | Project phase | `project_phases` | `phase_name` | `description`, parent project name | `/msp/projects/{project_id}/phases/{phase_id}` | Inherits project ACL |
| 8 | Project task | `project_tasks` | `task_name` | `description` | `/msp/projects/{project_id}/tasks/{task_id}` | Inherits project ACL |
| 9 | Project task comment | `project_task_comments` | parent task name | `markdown_content` (preferred) or BlockNote→text | `/msp/projects/{project_id}/tasks/{task_id}#comment-{task_comment_id}` | Inherits project ACL |
| 10 | Asset | `assets` | `name` | `asset_tag`, `serial_number`, `location`, JSONB `attributes` → flattened | `/msp/assets/{asset_id}` | Tenant-wide; respects client-scope filters |
| 11 | Invoice | `invoices` | `invoice_number` | client name (denormalized), `total`, `status` | `/msp/invoices/{invoice_id}` | Invoice permission |
| 12 | Invoice line item | `invoice_items` | parent invoice number | `description` | `/msp/invoices/{invoice_id}#item-{item_id}` | Inherits invoice ACL |
| 13 | Invoice annotation | `invoice_annotations` | parent invoice number | `content` | `/msp/invoices/{invoice_id}#annotation-{annotation_id}` | Inherits invoice ACL |
| 14 | Contract | `contracts` | `contract_name` | `contract_description`; subtitle="Quote" when `status='draft'` else "Contract" (statuses: `active`, `draft`, `terminated`, `expired`) | `/msp/billing/contracts/{contract_id}` | Contract permission |
| 15 | Client contract | `client_contracts` (joins `contracts` + `clients`) | derived: `{client_name} {contract_name}` | dates + status | `/msp/clients/{client_id}/contracts/{client_contract_id}` | Inherits client + contract ACL via `client_scope_id` |
| 16 | Document | `documents` (`content` column holds BlockNote JSON) | `document_name` | `content` (BlockNote→text, truncated to 64 KB) | `/msp/documents/{document_id}` | Document permission; tenant-wide visibility at v1 (no internal share mechanism exists); optional `client_scope_id` from `documents.client_id` |
| 17 | KB article | `kb_articles` (FK to `documents`) | document name | document content (BlockNote→text) | `/msp/knowledge-base/{article_id}` | KB read permission |
| 18 | Service catalog item | `service_catalog` | `service_name` | `description`, JSONB `attributes` → flattened | `/msp/billing/services/{service_id}` | Service catalog permission |
| 19 | Service request submission | `service_request_submissions` | `request_name` | `submitted_payload` JSONB → flattened strings | `/msp/service-requests/{submission_id}` | Service request permission |
| 20 | Service request definition | `service_request_definitions` | `name` | `description` | `/msp/service-requests/definitions/{definition_id}` | Admin permission |
| 21 | Workflow task | `workflow_tasks` (note: PK is `task_id` alone, not `(tenant, task_id)`; `tenant` is a regular column) | `title` | `description` | `/msp/workflow-tasks/{task_id}` | Workflow task permission + assignee scope via `assigned_users` jsonb |
| 22 | Interaction | `interactions` | `title` | interaction type name + counterparty (client/contact/ticket); `notes` (BlockNote→text) as body | `/msp/interactions/{interaction_id}` | Interaction permission |
| 23 | Schedule entry | `schedule_entries` | `title` | `notes` | `/msp/schedule/{entry_id}` | Schedule permission + assignee scope |
| 24 | Time entry | `time_entries` | derived (work item + date) | `notes` | links to parent work item | Time-entry permission + owner scope |
| 25 | Board | `boards` | `channel_name` | — | `/msp/tickets?board={board_id}` | Tenant-wide |
| 26 | Category | `categories` | `category_name` | — | `/msp/tickets?category={category_id}` | Tenant-wide |
| 27 | Tag | `tags` | `tag_text` | — | filter link | Tenant-wide |
**Note on quotes.** The codebase has no separate `quotes` table; quotes are draft `contracts`. The indexer flags `status = 'draft'` contracts with `subtitle = "Quote"` so users searching "quote" get sensible results.
## 8. Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────┐
│ MSP UI (Next.js) │
│ Sidebar.tsx ──► SearchPalette (cmdk) ──► searchAppAction() │
└─────────────────────────────────────────────────────────────────────┘
│ withAuth server action
┌─────────────────────────────────────────────────────────────────────┐
│ Search service (server/src/lib/search/) │
│ • buildQuery(query, allowedTypes, user) │
│ • runQuery(knex, …) → FTS + pg_trgm fallback │
│ • applyAcl(user, rows) → record-level final filter (defence-in- │
│ depth on top of denormalized ACL columns) │
│ • formatResults() → snippets via ts_headline │
└─────────────────────────────────────────────────────────────────────┘
│ knex
┌─────────────────────────────────────────────────────────────────────┐
│ Postgres / Citus │
│ app_search_index (distributed by tenant) │
│ • GIN(search_vector) │
│ • GIN(title gin_trgm_ops), GIN(subtitle gin_trgm_ops) │
└─────────────────────────────────────────────────────────────────────┘
│ upsert / delete
┌─────────────────────────────────────────────────────────────────────┐
│ Indexer framework (server/src/lib/search/indexers/) │
│ Registry: { client, contact, ticket, ticket_comment, … } │
│ Each entry implements: │
│ • async loadOne(tenant, id) → SearchDoc | null │
│ • async loadBatch(tenant, cursor, limit) → SearchDoc[] │
│ • sourceEvents: EventType[] │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────┼──────────────────┐
│ │ │
┌──────────┐ ┌──────────────┐ ┌──────────────────┐
│ Event │ │ Backfill │ │ Reconciliation │
│ bus │ │ CLI / job │ │ pg-boss cron │
│ sub │ │ │ │ (daily diff) │
└──────────┘ └──────────────┘ └──────────────────┘
│ event-bus events
┌──────────┐
│ Existing │ TICKET_CREATED, TICKET_UPDATED, TICKET_COMMENT_ADDED,
│ publish │ + new: CLIENT_*, CONTACT_*, PROJECT_*, ASSET_*,
│ sites │ INVOICE_*, CONTRACT_*, DOCUMENT_*, USER_*, etc.
└──────────┘
```
## 9. Data Model
### 9.1 `app_search_index` table
```sql
CREATE TABLE app_search_index (
tenant uuid NOT NULL,
object_type text NOT NULL, -- 'client' | 'ticket' | ...
object_id text NOT NULL, -- text to accommodate composite ids
parent_type text,
parent_id text,
title text NOT NULL,
subtitle text,
body text, -- capped at 64 KB
url text NOT NULL,
metadata jsonb NOT NULL DEFAULT '{}'::jsonb,
-- Denormalized ACL hints
visible_to_user_ids uuid[] NOT NULL DEFAULT '{}', -- empty = no per-user restriction
visible_to_roles text[] NOT NULL DEFAULT '{}', -- empty = no role restriction
is_internal_only boolean NOT NULL DEFAULT false, -- e.g., internal ticket comments
is_private boolean NOT NULL DEFAULT false, -- e.g., private documents
client_scope_id uuid, -- if set, only users with access to this client can see
required_permission text, -- e.g., 'ticket:read'
-- Search columns
search_vector tsvector NOT NULL,
search_lang text NOT NULL DEFAULT 'english',
-- Bookkeeping
source_updated_at timestamptz NOT NULL, -- the source row's updated_at, used by reconciliation
indexed_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (tenant, object_type, object_id)
);
-- Citus
SELECT create_distributed_table('app_search_index', 'tenant');
-- Indexes
CREATE INDEX app_search_index_vector_gin
ON app_search_index USING gin (search_vector);
CREATE INDEX app_search_index_title_trgm
ON app_search_index USING gin (title gin_trgm_ops);
CREATE INDEX app_search_index_subtitle_trgm
ON app_search_index USING gin (subtitle gin_trgm_ops);
CREATE INDEX app_search_index_recent
ON app_search_index (tenant, source_updated_at DESC);
CREATE INDEX app_search_index_type
ON app_search_index (tenant, object_type);
```
### 9.2 `SearchDoc` interface
The indexer-side representation, converted to a row at insert time. Defined in `server/src/lib/search/types.ts`.
```typescript
export interface SearchDoc {
tenant: string;
objectType: SearchObjectType;
objectId: string;
parentType?: SearchObjectType;
parentId?: string;
title: string;
subtitle?: string;
body?: string;
url: string;
metadata?: Record<string, unknown>;
acl: {
visibleToUserIds?: string[];
visibleToRoles?: string[];
isInternalOnly?: boolean;
isPrivate?: boolean;
clientScopeId?: string;
requiredPermission?: string;
};
sourceUpdatedAt: Date;
}
```
### 9.3 Body normalization
A new utility, `server/src/lib/search/normalize.ts`:
- `flattenBlockNote(json: unknown): string` — walks BlockNote JSON, concatenating text nodes, dropping image data URIs.
- `flattenMarkdown(md: string): string` — strips markdown formatting tokens.
- `flattenJsonbPayload(obj: unknown): string` — recursively pulls string leaves out of a JSONB blob (used for service-request submissions and asset attributes).
- `truncateForIndex(text: string, maxBytes = 65_536): string` — UTF-8-safe byte cap.
The existing `public.process_large_lexemes()` Postgres function is reused as the final cleanser of the body string before `to_tsvector` is computed in the indexer.
## 10. Indexer Framework
Located at `server/src/lib/search/indexers/`. One file per entity type:
```
server/src/lib/search/
index.ts (registry)
types.ts (SearchDoc, SearchObjectType union)
normalize.ts
upsert.ts (knex upsert into app_search_index)
query.ts (FTS + pg_trgm)
acl.ts (record-level filter helpers)
ts_headline.ts (snippet builder)
indexers/
client.ts
contact.ts
user.ts
ticket.ts
ticket_comment.ts
project.ts
project_phase.ts
project_task.ts
project_task_comment.ts
asset.ts
invoice.ts
invoice_item.ts
invoice_annotation.ts
contract.ts
client_contract.ts
document.ts
kb_article.ts
service_catalog.ts
service_request_submission.ts
service_request_definition.ts
workflow_task.ts
interaction.ts
schedule_entry.ts
time_entry.ts
board.ts
category.ts
tag.ts
```
Each indexer exports:
```typescript
export const clientIndexer: EntityIndexer = {
objectType: 'client',
sourceEvents: ['CLIENT_CREATED', 'CLIENT_UPDATED', 'CLIENT_DELETED'],
loadOne: async (knex, tenant, id) => { /* … */ return doc | null },
loadBatch: async (knex, tenant, cursor, limit) => { /* … */ return docs[] },
};
```
The registry exports `getIndexer(objectType)` and `allIndexers()`.
## 11. Indexing Pipeline
### 11.1 Event-driven (real-time)
A new subscriber at `server/src/lib/eventBus/subscribers/searchIndexSubscriber.ts` listens to all configured `sourceEvents`. On each event it:
1. Looks up the indexer via the registry.
2. For `*_DELETED` events, deletes the row from `app_search_index`.
3. For `*_CREATED` / `*_UPDATED` events, calls `indexer.loadOne()` and upserts.
**Cascade events.** Some entities depend on others:
- A ticket update re-indexes the ticket. It also touches all ticket comments because their `parent.title` (denormalized) may have changed → handled by the ticket indexer dispatching `TICKET_COMMENT_REINDEX` for each comment.
- An invoice update re-indexes line items + annotations.
- A project update re-indexes phases + tasks (capped, async).
### 11.2 Gap: new events
The recon shows event-bus emission exists for ticket/comment events but NOT for clients/contacts/projects/assets/invoices/contracts/documents. **The plan adds the missing publishes** (one feature per entity family) at the existing action call sites. Use the schema in `server/src/lib/eventBus/events.ts`.
### 11.3 Backfill (one-time)
A new CLI script `server/src/scripts/search-backfill.ts` (also runnable via npm script `npm run search:backfill`) that:
1. Iterates all tenants (or one specific tenant via flag).
2. For each indexer, pages through `loadBatch()` in chunks of 500 and upserts.
3. Logs progress to stdout and a row count summary per entity type per tenant.
4. Is idempotent — re-running upserts overwrite.
### 11.4 Reconciliation (periodic)
A `pg-boss` job `search:reconcile` scheduled daily:
1. For each tenant and each entity type, SELECT rows from the source table where `updated_at > (max(source_updated_at) for that type)`.
2. Re-index those rows.
3. Also: SELECT source IDs missing from the index → re-index. SELECT index IDs missing from source → delete.
This catches drift from dropped events, manual SQL changes, or bugs.
### 11.5 ACL re-index triggers
Permission/assignment changes need to refresh denormalized ACL columns:
- Ticket reassignment → re-index the ticket and its comments.
- Document permission change → re-index the document.
- User role change → background job re-indexes affected rows for that user's `visible_to_user_ids` membership. (Acceptable to run async; max staleness ~10 min via a pg-boss job triggered by user-update events.)
## 12. Search Server Action
New action at `server/src/lib/actions/searchActions.ts`:
```typescript
export const searchAppAction = withAuth(async (
user,
{ tenant },
input: SearchAppInput
): Promise<SearchAppResult> => {
const { query, types, limit = 30, cursor } = input;
// 1. Trim/validate query
// 2. Determine allowedTypes = types ∩ entitiesUserCanRead(user)
// 3. Build SQL: WHERE tenant = ? AND object_type = ANY(?) AND acl_filter(user)
// ORDER BY rank DESC, source_updated_at DESC
// LIMIT (limit + ACL_OVERFETCH_BUFFER) OFFSET cursor
// 4. Run record-level ACL pass for defence-in-depth
// 5. Format snippets with ts_headline
// 6. Return grouped results
});
```
### Input
```typescript
interface SearchAppInput {
query: string;
types?: SearchObjectType[]; // omit = all the user can see
limit?: number; // default 30, max 100
cursor?: string; // opaque, encodes (rank, object_id)
}
```
### Output
```typescript
interface SearchResultRow {
type: SearchObjectType;
id: string;
parentId?: string;
title: string;
subtitle?: string;
snippet?: string; // ts_headline output, sanitized to allow only <mark>
url: string;
score: number;
updatedAt: string; // ISO
}
interface SearchAppResult {
results: SearchResultRow[];
groups: Record<SearchObjectType, number>; // counts by type, before pagination
totalCount: number;
nextCursor?: string;
}
```
### Variants
- `searchAppAction(input)` — full results, used by the results page.
- `searchAppTypeaheadAction(input)` — top-5 title-only suggestions for the sidebar, optimised for sub-100 ms p50. Skips snippet generation and grouping. Reuses the same query builder under the hood, with `limit=5` and `snippet=false`.
## 13. Ranking & Snippets
### 13.1 SQL shape (simplified)
```sql
WITH q AS (
SELECT websearch_to_tsquery('english', $query) AS tsq,
$query AS raw
)
SELECT
s.object_type, s.object_id, s.parent_id, s.title, s.subtitle, s.url,
s.source_updated_at,
ts_rank_cd(s.search_vector, q.tsq) AS fts_rank,
GREATEST(
similarity(s.title, q.raw),
similarity(s.subtitle, q.raw)
) AS trgm_rank,
ts_headline('english', s.body, q.tsq,
'MaxFragments=2,StartSel=<mark>,StopSel=</mark>') AS snippet
FROM app_search_index s, q
WHERE s.tenant = $tenant
AND s.object_type = ANY($allowedTypes)
AND acl_predicate(s, $user)
AND (
s.search_vector @@ q.tsq
OR s.title % q.raw -- pg_trgm fuzzy
OR s.subtitle % q.raw
)
ORDER BY
-- Weighted composite: prefer FTS hits, then trigram, then recency
(fts_rank * 1.0 + trgm_rank * 0.4)
* exp(-EXTRACT(epoch FROM (now() - source_updated_at)) / (90 * 86400))
DESC,
s.source_updated_at DESC,
s.object_id
LIMIT $limit OFFSET $offset;
```
### 13.2 Field weighting
Per indexer, body strings are composed before `to_tsvector` is computed:
```typescript
search_vector =
setweight(to_tsvector('english', title ), 'A') -- highest
|| setweight(to_tsvector('english', subtitle), 'B')
|| setweight(to_tsvector('english', body ), 'C');
```
### 13.3 Identifier matches
Short identifier matches (ticket numbers, invoice numbers, asset tags) are handled by a separate exact-match branch: if `query` matches `^[A-Z]+-?\d+$` it is also probed against denormalized `metadata->>'identifier'` and pinned at the top.
### 13.4 Time decay
Multiplier `exp(-age_days / 90)` floor 0.05. Tunable per object type later via metadata; not configurable in v1.
## 14. Permissions & ACL
### 14.1 Two-layer filter
1. **SQL-level filter via denormalized columns** (`is_internal_only`, `is_private`, `visible_to_user_ids`, `visible_to_roles`, `client_scope_id`, `required_permission`). This is the load-bearing layer for pagination correctness.
2. **Record-level final pass in app code.** A `verifyResultVisibility(user, rows)` call runs each row through the same permission helpers used by the entity's primary actions (e.g., `assertTicketReadable`). Mismatches are dropped and logged (telemetry for index drift, not user-facing).
### 14.2 ACL strategy per entity (v1)
| Entity | SQL filter | Record-level final check |
|---|---|---|
| Client | `required_permission='client:read'` checked via `hasPermission` | none extra |
| Contact | `required_permission='contact:read'` | none extra |
| Team member | `required_permission='user:read'` | none extra |
| Ticket | `required_permission='ticket:read'` AND board scope via `visible_to_roles` | `assertTicketReadable(user, ticket_id)` |
| Ticket comment | inherits ticket ACL; if `is_internal_only=true`, requires internal user | parent ticket check + comment-internal check |
| Project | `required_permission='project:read'` AND `client_scope_id` | `assertProjectReadable` |
| Project phase/task/task-comment | inherits project ACL | parent project check |
| Asset | `required_permission='asset:read'` AND optional `client_scope_id` | none extra |
| Invoice (+ items, annotations) | `required_permission='invoice:read'` AND `client_scope_id` | none extra |
| Contract (+ client_contract) | `required_permission='contract:read'` AND optional `client_scope_id` | none extra |
| Document | `required_permission='document:read'`; optional `client_scope_id` derived from `documents.client_id` when set (no internal share-list mechanism exists in CE — documents are otherwise tenant-wide) | none extra |
| KB article | `required_permission='kb:read'` | none extra |
| Service catalog | `required_permission='service_catalog:read'` | none extra |
| Service request submission | `required_permission='service_request:read'` AND optional `client_scope_id` | none extra |
| Service request definition | `required_permission='admin'` | none extra |
| Workflow task | `required_permission='workflow_task:read'`; `visible_to_user_ids` for assignee scope | none extra |
| Interaction | `required_permission='interaction:read'` | none extra |
| Schedule entry | `required_permission='schedule:read'` AND `visible_to_user_ids` for owner scope | none extra |
| Time entry | `required_permission='time:read'`; `visible_to_user_ids` for owner scope | none extra |
| Board / Category / Tag | `required_permission='ticket:read'` | none extra |
### 14.3 Defence-in-depth fail-safe
If the SQL filter and the record-level check disagree, the row is dropped AND a telemetry counter (server log + Sentry) increments `search.acl_drift`. Drift > 0 is treated as a bug.
## 15. Multi-Tenancy & Citus Considerations
- `app_search_index` is **distributed by `tenant`**. All inserts, updates, deletes go via tenant-included WHERE clauses.
- The indexer framework requires the caller to pass `tenant` explicitly; nothing relies on `app.current_tenant` GUC.
- No joins across tables in the search query path; the index is fully denormalized. This avoids cross-shard join concerns.
- The reconciliation job operates one tenant at a time.
## 16. Rollout & Migration
- **Single release.** Ships in CE main branch; EE inherits automatically. No PostHog flag.
- **Backfill order on deploy.**
1. Apply migration adding `app_search_index` table + indexes + `pg_trgm` extension if not present.
2. Deploy code with subscriber **disabled** by env var `SEARCH_INDEX_LIVE=false`.
3. Run `npm run search:backfill` against production for each tenant.
4. Flip `SEARCH_INDEX_LIVE=true` and roll the workers; subscriber begins indexing.
5. Enable the sidebar UI via merge to main.
- **Reconciliation job** is enabled day-1; first run catches anything missed during the backfill ↔ live-flip window.
- **No DB downtime required.** Index build can run `CONCURRENTLY` after table creation.
## 17. Acceptance Criteria / Definition of Done
A user can do all of the following from a clean prod-like environment with a seeded multi-tenant dataset:
1. Open the sidebar input, type `acme`, see ACME Corp as the top typeahead suggestion within 500 ms.
2. Press Enter from the sidebar input → land on `/msp/search?q=acme` with full grouped results.
3. Cmd/Ctrl+click any result row on the results page → opens in a new tab; original page state is preserved.
4. Type `TIC-1023` (or partial `tic-10`) and find the matching ticket as the top result.
5. Type a phrase known to appear only in a single ticket comment; click the result and land on the ticket page with the comment highlighted.
6. Type a phrase known to appear only in an internal ticket comment as an internal user — the comment is in the results.
7. Repeat (6) as a non-internal user — the comment is NOT in the results.
8. As user A, search for a project user A has no access to — the project does NOT appear.
9. As user A (client-scoped, no access to client X), search for a document whose `documents.client_id = X` — the document does NOT appear in results.
10. Type a misspelled client name (`exhcange` for `Exchange`) and still get the right top result via pg_trgm fallback.
11. Create a new ticket; within 5 seconds it appears in search.
12. Delete a record; within 5 seconds it disappears from search.
13. Run `npm run search:backfill` on a fresh DB and verify a sampled record from each of the 27 entity types is searchable.
14. Run the reconciliation job manually after deleting an index row directly via SQL → the row reappears.
15. Search across multiple tenants in a load test; no cross-tenant leak in 1M queries.
16. All search UI strings render correctly in `pseudo` locale (`xx`) — confirming i18n coverage.
17. Sidebar typeahead AND results page are fully keyboard-navigable (Tab, arrows, Enter, Esc).
18. The results-page URL with `?q=`, `?type=`, `?cursor=`, `?sort=` is bookmarkable: opening it cold renders the same results without the user re-typing.
## 18. Risks & Open Questions
### Risks
- **R1 — Permission leakage.** Mitigated by two-layer ACL, exhaustive permission tests per entity (one positive + one negative test minimum), and drift telemetry.
- **R2 — Index size.** With ~27 entity types and documents/comment bodies, the index could be large for big tenants. Mitigations: 64 KB body cap, `process_large_lexemes` stripping, GIN compression. Budget: target index ≤ 20% of total DB size on a typical tenant.
- **R3 — Event-bus reliability.** If events drop, search goes stale. Mitigated by daily reconciliation job and the `source_updated_at` watermark.
- **R4 — Backfill duration on big tenants.** Single-tenant backfill could take hours. Mitigation: paged loads + parallel per-type workers; backfill runs offline before subscriber goes live.
- **R5 — Ranking quality.** Time-decay constants are guesses. Plan: ship the formula, add a `metadata.boost_score` per object type later when we have telemetry. Out of scope for v1.
- **R6 — BlockNote schema changes.** If document format changes, the flattener breaks. Mitigation: snapshot tests on real BlockNote payloads + version detection.
### Open questions
All resolved 2026-05-13:
- **Q1 — Snippet HTML sanitization → Server-side rebuild.** The query path emits `ts_headline` with controlled `<mark>` sentinels; the search service splits on sentinels, HTML-escapes each text segment, re-wraps match segments in `<mark>`, returns a string the client can render via a trusted-string component. No DOMPurify dependency on the client.
- **Q2 — Query length cap → 200 chars.** Enforced in Zod input schema.
- **Q3 — `time_entries.notes` → Index only when non-empty** (`notes IS NOT NULL AND notes <> ''`). No minimum-length threshold.
- **Q4 — Board/Category/Tag → Result rows, not filter chips.** Surfaced as normal results.
- **Q5 — `interactions` → Uses current schema.** Per migration `20250530000000_improve_interactions_schema.cjs`, `description` was renamed to `title` and a new `notes` text column was added. `notes` stores BlockNote JSON. Indexer: title=`title`, body=`flattenBlockNote(notes)` truncated to 64 KB, subtitle=interaction type name + counterparty.
---
## 19. CE / EE Extension Mechanism
The entire search subsystem ships in CE. EE never forks any of these files; it extends via a single hook.
### 19.1 Registry merge
`server/src/lib/search/index.ts` imports two indexer arrays and merges them:
```typescript
import { ceIndexers } from './indexers'; // CE indexers (27 entries at v1)
import { eeIndexers } from 'ee/server/src/lib/search/indexers'; // STUBBED in CE (returns [])
const registry = new Map<string, EntityIndexer>(
[...ceIndexers, ...eeIndexers].map(i => [i.objectType, i])
);
export function getIndexer(objectType: string): EntityIndexer | undefined {
return registry.get(objectType);
}
export function allIndexers(): EntityIndexer[] {
return [...registry.values()];
}
export function registeredObjectTypes(): string[] {
return [...registry.keys()];
}
```
In CE builds, the import path `ee/server/src/lib/search/indexers` resolves to a stub file (existing CE/EE stub pattern in the repo) that exports `export const eeIndexers: EntityIndexer[] = []`. In EE builds, it resolves to the real EE module with EE-specific indexers (e.g., chat history, AI conversations, extension content).
### 19.2 Schema is shared
`app_search_index.object_type` is a plain `text` column, **not an enum**. CE and EE share the same table. CE-only deploys simply have no rows where `object_type` is an EE-only type.
### 19.3 Event subscriber is registry-driven
`searchIndexSubscriber` subscribes to the union of `sourceEvents` from `allIndexers()`. In CE, that union excludes EE event types. In EE, the same code (no fork) subscribes to additional event families because EE's indexers brought them in.
### 19.4 UI is data-driven
The results page filter chips and group headers iterate over `registeredObjectTypes()` so EE entity types appear automatically in EE builds without any UI code change. i18n keys for EE entity labels live in the EE locale namespace and are looked up by `search.filters.{objectType}` / `search.groups.{objectType}` — CE renders a fallback (humanized object_type) if the key is missing in CE.
### 19.5 Orphan safety on edition transition
If a deploy transitions from EE to CE (or an EE feature is removed), index rows whose `object_type` no longer has a registered indexer become invisible: the query layer always restricts `object_type = ANY(registeredObjectTypes())`. The orphans sit harmlessly in the table. Reconciliation skips object_types without registered indexers — it does NOT attempt to load the source row, because it has no indexer to do so. A maintenance script (out of v1 scope) can purge these rows on demand.
### 19.6 What EE must implement
For each EE-only entity that should be searchable, EE provides:
1. An entry in `ee/server/src/lib/search/indexers/index.ts` (`export const eeIndexers: EntityIndexer[]`).
2. A per-entity indexer module under `ee/server/src/lib/search/indexers/<entity>.ts`, following the same `EntityIndexer` interface as CE indexers.
3. Event types in `packages/event-schemas` (or wherever EE event extensions live) with Zod payload schemas, plus publishes at the relevant EE action sites.
4. i18n entries for the entity's filter/group label in the EE locale namespace.
EE does **not** touch the CE registry file, the subscriber, the search action, the query builder, or the UI.
---
**End of PRD.**