Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz Source: /opt/alga-psa on psa.joliet.tech
19 KiB
App-Wide Search System
App-wide search is a federated, full-text search over every major entity in the
product (tickets, comments, clients, contacts, documents, projects, invoices,
assets, KB articles, etc.). It is backed by a single denormalized PostgreSQL
table, app_search_index, kept in sync by an event-driven subscriber with a
nightly reconcile sweep and an on-demand backfill script.
This document describes the architecture, data model, query semantics, access
control, configuration, and operations. For first-time environment rollout
steps, see the deploy runbook: docs/deployment/app-wide-search-runbook.md.
Core Features
Federated search
A single query returns ranked results across all registered object types. The
results page (/msp/search) groups hits by type with per-type counts; the
sidebar palette shows a typeahead dropdown plus a "See all N results" link.
Typeahead
searchAppTypeaheadAction returns the top 5 rows (no snippets) for the sidebar
palette. It also returns an accurate totalCount computed by a separate count
query so the "See all N results" link reflects the true match count rather than
the 5-row display cap.
Partial / prefix matching
Queries match three ways, OR-combined:
- Full-text —
websearch_to_tsquery('english', q)againstsearch_vector(stemmed lexeme match, supports phrases/operators). - Prefix — the query is tokenized, sanitized, and turned into a
to_tsquery('english', 'tok1:* & tok2:*')sowondmatchesWonderland. - Substring / fuzzy —
title/subtitleILIKE '%q%'pluspg_trgmsimilarity (%) for typo tolerance.
Identifier search
Queries shaped like TIC-1023 or LAP0042 are detected by regex. Exact
identifier matches are pinned with score 1000; prefix identifier matches with
900. The identifier is read from metadata->>'identifier'.
Relevance ranking
Non-identifier rows are scored by:
( ts_rank_cd(search_vector, tsq)
+ ts_rank_cd(search_vector, prefix_tsq) * 0.7 -- when prefix query present
+ GREATEST(similarity(title, q), similarity(subtitle, q)) * 0.4
) * GREATEST( exp(-age_in_seconds / (90 days)), 0.05 ) -- recency decay, floored
Sort modes: relevance (score desc) or recent (source_updated_at desc).
Both use keyset pagination on (score, source_updated_at, object_id) or
(source_updated_at, object_id).
Architecture
Components
| Path | Responsibility |
|---|---|
server/src/lib/search/types.ts |
EntityIndexer, SearchDoc, SearchObjectType contracts |
server/src/lib/search/index.ts |
Indexer registry (allIndexers, getIndexer, registeredObjectTypes) — merges CE + EE |
server/src/lib/search/indexers/* |
One indexer per object type (CE set) |
packages/ee/src/lib/search/indexers/index.ts |
EE indexer set (currently empty stub in CE builds) |
server/src/lib/search/upsert.ts |
upsertSearchDoc / deleteSearchDoc |
server/src/lib/search/sql.ts |
buildTsvectorSql — weighted tsvector builder |
server/src/lib/search/query.ts |
runSearchQuery, runSearchTypeaheadQuery, countSearchMatches, countSearchMatchesByType, cursor + headline helpers |
server/src/lib/search/acl.ts |
ACL principal resolution, SQL predicate, per-row visibility verifiers |
server/src/lib/actions/searchActions.ts |
'use server' actions: searchAppAction, searchAppTypeaheadAction |
server/src/lib/actions/searchActionShared.ts |
Schemas, types, SearchRateLimitError (non-async exports kept out of the 'use server' file) |
server/src/lib/eventBus/subscribers/searchIndexSubscriber.ts |
Live indexing from entity events |
server/src/lib/jobs/handlers/searchReconcileHandler.ts |
Nightly reconcile sweep |
server/src/lib/jobs/handlers/searchVisibleUserReindexHandler.ts |
Re-index rows affected by a user's permission change |
server/src/scripts/search-backfill.ts |
One-shot backfill CLI (npm run search:backfill) |
server/src/components/search/SearchPalette.tsx |
Sidebar typeahead UI |
server/src/app/msp/search/ |
Results page (page.tsx + SearchPageClient.tsx) |
Indexing pipeline
There are three write paths into app_search_index:
- Live (event-driven) —
searchIndexSubscribersubscribes to entity events (e.g.TICKET_COMMENT_ADDED,CLIENT_UPDATED,DOCUMENT_DELETED). The event bus is Redis Streams based, so indexing runs off the request hot path. Gated bySEARCH_INDEX_LIVE(see Configuration). Delete events calldeleteSearchDoc; all others callupsertSearchDoc. - Reconcile (scheduled) —
search:reconcilepg-boss job, cron0 6 * * *per tenant. Re-indexes rows newer than the indexed watermark, deletes index rows whose source row is gone, and inserts source rows missing from the index. - Backfill (manual) —
npm run search:backfillwalks every indexer for every (or one) tenant and upserts in batches of 500. Used for first-time population and after indexer changes.
Live indexing covers new writes only. Rows that predate the feature (or predate an indexer fix) require a reconcile pass or a backfill.
Database Schema
app_search_index
Migration: server/migrations/20260513120000_create_app_search_index.cjs.
Primary key (tenant, object_type, object_id). On Citus, distributed by
tenant (co-located writes).
| Column | Type | Notes |
|---|---|---|
tenant |
uuid | Tenant scope (always in WHERE) |
object_type |
text | e.g. ticket, document |
object_id |
text | Source row id |
parent_type / parent_id |
text | e.g. a ticket_comment's parent ticket |
title |
text | Weight A in search_vector |
subtitle |
text | Weight B |
body |
text | Weight C (flattened content) |
url |
text | Result deep link |
metadata |
jsonb | identifier lives here for identifier search |
visible_to_user_ids |
uuid[] | ACL hint |
visible_to_roles |
text[] | ACL hint |
is_internal_only |
boolean | ACL hint (e.g. internal comments) |
is_private |
boolean | ACL hint |
client_scope_id |
uuid | ACL hint |
required_permission |
text | e.g. ticket:read |
search_vector |
tsvector | Weighted, generated via process_large_lexemes |
search_lang |
text | english |
source_updated_at |
timestamptz | Recency ranking + reconcile watermark |
indexed_at |
timestamptz | Last index write |
Indexes: GIN on search_vector; GIN trigram on title and subtitle;
btree (tenant, source_updated_at DESC); btree (tenant, object_type).
search_vector construction
buildTsvectorSql produces:
setweight(public.process_large_lexemes(title), 'A')
|| setweight(public.process_large_lexemes(subtitle),'B')
|| setweight(public.process_large_lexemes(body), 'C')
public.process_large_lexemes(text) RETURNS tsvector already calls
to_tsvector('english', …) internally after stripping base64 image data URIs,
dropping oversized lexemes, and capping input at 500 KB. Do not wrap its
result in to_tsvector again.
Indexers
Indexer contract
interface EntityIndexer {
objectType: SearchObjectType;
sourceEvents: string[]; // event-bus triggers
loadOne(knex, tenant, id): Promise<SearchDoc | null>;
loadBatch(knex, tenant, cursor, limit): Promise<SearchDoc[]>;
}
loadOne powers live indexing and reconcile spot-checks; loadBatch powers
backfill and reconcile sweeps (keyset paginated by the source id).
Registered indexers (CE)
28 object types: asset, board, category, client, client_contract,
contact, contract, document, interaction, invoice, invoice_item,
invoice_annotation, kb_article, service_catalog,
service_request_definition, service_request_submission, status, tag,
user, ticket, ticket_comment, project, project_phase, project_task,
project_task_comment, schedule_entry, time_entry, workflow_task.
EE indexers merge in via @ee/lib/search/indexers (empty in CE builds).
Content extraction
Body text is normalized before indexing (server/src/lib/search/normalize.ts):
flattenBlockNote(json)— walks BlockNote JSON extractingtext/content, strips base64 image URIs; also accepts a plain string (returns it normalized).flattenMarkdown(md)— strips markdown syntax to plain text.flattenJsonbPayload(obj)— flattens arbitrary jsonb, skipping secret-like keys.truncateForIndex(text, 65536)— byte cap on indexed body.
Content lives in different places per entity, so indexers must read the right source:
- Documents — content can live in
documents.content,document_content.content(plain text side table), ordocument_block_content.block_data(BlockNote jsonb). The document indexer joins all three and concatenates. - Ticket comments —
comments.notemay be plain text or BlockNote JSON; the readable text is mirrored incomments.markdown_content. The indexer prefersmarkdown_content(viaflattenMarkdown), else usesflattenBlockNotefor JSON notes /flattenMarkdownfor plain notes. - Tags —
tag_definitionsrows share the sametag_textacross differenttagged_types (e.g. a "Urgent" ticket tag vs. a "Urgent" project-task tag). The indexer setssubtitleto a humanizedtagged_typelabel (e.g. "Ticket tag", "Project task tag") so otherwise-identical tag titles are distinguishable in results.tagged_typeis also kept inmetadata. - Status — only ticket statuses are indexed (project / project_task /
interaction statuses have no global "all items in status X" destination, so
they were dead-end results and are excluded). Ticket statuses are
board-scoped, so the same name (e.g. "Open") exists once per board; the
indexer dedupes by name (
DISTINCT ON (name), one row per distinct name,object_id = name) to mirror the ticketing dashboard, which groups its status filter by name. The result links to the dashboard's own name-based filter value (/msp/tickets?statusId=__status_name__:<encodedName>, built viaURLSearchParams— noboardId); the prefix constant is mirrored frompackages/tickets/src/lib/ticketStatusFilter.ts. Becauseobject_idis the name,loadOneresolves bystatus_idorname(live-event path vs. reconcile's delete-sweep), and thestatusvisibility verifier matches onname+status_type='ticket'.
Indexer column lists must match the live schema. Several columns differ from intuition: tickets use
entered_at(notcreated_at);clientshasbilling_email(nophone_no);contactshas nophone_number;usershas norolecolumn (roles are a join table);documentshas noclient_id(associations live indocument_associations).
Query Semantics
runSearchQuery builds a single CTE query:
qCTE computestsq(websearch),prefix_tsq(nullable prefix tsquery),raw, andidentifier.rankedCTE applies the match predicate, computes score, and (for full search) a sanitizedts_headlinesnippet.- Outer query applies the keyset cursor predicate, ORDER BY, LIMIT/OFFSET.
countSearchMatches / countSearchMatchesByType run the same match predicate
without ranking/limit to produce accurate totals and per-type group counts.
These intentionally skip the per-row visibility verifier (the SQL-level ACL
predicate already enforces the bulk of access control), so counts may slightly
over-count in rare ACL-drift cases — an acceptable trade for accurate totals.
Snippet safety
ts_headline is configured with module-constant sentinels
(__SEARCH_MARK_START__ / __SEARCH_MARK_STOP__). sanitizeHeadline
HTML-escapes the raw headline and only converts the sentinel pairs into
<mark> tags, so result snippets cannot inject HTML.
Access Control
Three layers, all tenant-scoped:
- Coarse type gate (
filterTypesByPermission+TYPE_REQUIRED_PERMISSIONinsearchActions.ts) — before querying, an object type is included only if the principal holds at least one of its required permissions. The map value isstring | readonly string[]; types whose rows can require different permissions list all of them (e.g.status→['ticket:read', 'project:read']) so no class of user is wrongly excluded. This is only a pre-filter to shrink the query — the authoritative check is the per-rowrequired_permissionenforced below. - SQL-level predicate (
aclPredicateSql) — a static SQL fragment with bound parameters, ANDed into every search/count query. Enforcesrequired_permissionmembership,visible_to_user_ids,visible_to_roles,is_internal_only,is_private, andclient_scope_idagainst the resolved principal. - Per-row visibility verifiers (
verifyResultVisibility) — a defensive second pass for object types that need a live source check (e.g.documentverifies existence + client access viadocument_associations;workflow_taskchecks assignment). Applied to the fetched result page only (not to counts).
The principal is resolved once per action via resolveSearchAclPrincipal
(MSP vs client-portal aware). When a user's permissions change, enqueue
searchVisibleUserReindexHandler to refresh affected rows' ACL hints.
Client scoping (ClientAccess)
client_scope_id scopes a row to one client so client-portal users only see
their own client's data. The principal carries a ClientAccess value, the
single source of truth resolved by resolveClientAccess(user):
{ mode: 'scoped', clientIds }— client-portal users (their one client).{ mode: 'all' }— internal/MSP users (currently always unrestricted).
mode: 'all' makes the client-scope clause a no-op TRUE with no array
binding — internal users do not trigger a clients table scan, and no
large UUID array is bound into the (3×) per-search queries. This is the
single seam for future ABAC per-internal-user client restrictions: when
those land, resolveClientAccess returns { mode: 'scoped', clientIds } for
restricted internal users and both the SQL predicate and the per-row verifier
enforce it automatically — no isInternal shortcut to audit, and the SQL
predicate / verifier can never disagree (the prior empty-array overload, where
= ANY('{}') meant "none" in SQL but "all" in the verifier, is gone).
Configuration
| Setting | Default | Effect |
|---|---|---|
SEARCH_INDEX_LIVE |
false |
When true, the event subscriber writes to the index on entity events. When unset/false the subscriber acknowledges events without DB writes — index only updates via reconcile/backfill. Helm: server.searchIndexLive. |
| Full-search rate limit | 10 / sec / user | searchAppAction; exceeding throws SearchRateLimitError (HTTP 429, retryAfterMs). |
| Typeahead rate limit | 30 / sec / user | searchAppTypeaheadAction. |
| Search query length cap | 200 chars | parseQuery throws SearchQueryError('query_too_long'). |
| Page size | 25 | /msp/search fetches limit + 1 for cursor detection. |
| Reconcile cron | 0 6 * * * per tenant |
scheduleSearchReconcileJob. |
Operations
Backfill
# All tenants, all indexers
npm run search:backfill
# One tenant
npm run search:backfill -- --tenant=<tenant_uuid>
# One object type
npm run search:backfill -- --tenant=<tenant_uuid> --type=ticket_comment
The script logs and continues per indexer (one failing indexer does not abort
the run) and prints a failure summary at the end. It connects via
server/knexfile.cjs using the standard DB_* env / secrets; point
DB_HOST/DB_PORT at the target database when running outside the container
network.
Reconcile
search:reconcile runs nightly per tenant. To force an immediate run, enqueue
an immediate pg-boss job with { "tenantId": "<uuid>" } on the
search:reconcile queue (handler: searchReconcileHandler). The handler is
registered in registerAllHandlers.ts.
Health check
SELECT object_type, count(*), max(indexed_at)
FROM app_search_index
WHERE tenant = '<tenant_uuid>'
GROUP BY object_type
ORDER BY object_type;
Telemetry
searchActions emit structured logs via logger.info('[Search] metric', …):
search.query.count— every query (variant:full|typeahead).search.query.empty— query returned zero results.search.query.latency_ms— wall-clock latency, always emitted (finally).
ACL drift (a row passing the SQL predicate but failing a per-row verifier) is
reported via emitAclDrift.
Security Considerations
- SQL injection — all user input reaches SQL exclusively as bound
parameters (
?). The only string-interpolated SQL fragments are module constants, boolean-selected static literals (sort/snippet branches), andaclPredicateSql's static fragment. No user input is concatenated. - Prefix tsquery — user input is lower-cased and stripped to
\p{L}\p{N}\sbefore being turned intotok:*terms, so it cannot injectto_tsqueryoperators. - XSS — snippets are HTML-escaped; only sentinel-delimited spans become
<mark>. - Tenant isolation — every search/count query includes
s.tenant = ?::uuid; on Citus the table is distributed bytenant. - Permission gating —
required_permission+ ACL hints filter at SQL level; per-row verifiers add a defensive check on the result page.
Frontend
- Sidebar palette (
SearchPalette.tsx) — debounced typeahead (200ms), shows ≤5 rows + "See all N results" linking to/msp/search?q=…. - Results page (
SearchPageClient.tsx) — debounced URL sync; "All" view groups by type (≤10 rows/section) with per-type count chips; single-type view shows the full page (≤25) with cursor prev/next. The query input is the source of truth while typing; URL→input resync only occurs on external navigation (back/forward), guarded by apreviousInitialQueryRefso an in-flight server re-render cannot clobber typed characters.
Testing
Unit + contract tests live in server/src/test/unit/search*.test.ts
(query/SQL/ACL/indexers/actions/subscriber/backfill/reconcile, plus contract
tests for i18n, hash anchors, migration, deploy, UI). Integration tests
(server/src/test/integration/*search*) require a live database.
Run the unit suite:
cd server && npx vitest run src/test/unit/search*.test.ts
Known Limitations
- Counts skip per-row visibility verifiers (slight over-count under ACL drift).
- The "All" results view caps each type section at 10 rows; use the type chip for the full, paginated per-type list (deliberate federated-search UX — the grouped view is not globally paginated).
- Live indexing must be explicitly enabled (
SEARCH_INDEX_LIVE=true); otherwise the index only updates nightly or via backfill. - Only ticket statuses are searchable. Project / project_task / interaction statuses are intentionally not indexed (no global per-status destination); discovering them by name is a projects-module faceting concern, not global search.