PRD — RMM Alert Handling
- Slug: 2026-06-12-rmm-alert-handling
- Date: 2026-06-12
- Status: Approved
- Branch: feature/rmm-alerts-sync
- Design doc: docs/plans/2026-06-12-rmm-alert-handling-design.md
Summary
Turn RMM alerts into tickets automatically. A provider-generic pipeline in
shared/rmm/alerts/ ingests normalized alert events from the NinjaOne and
TacticalRMM webhooks, evaluates tenant-defined rules, creates or updates
tickets with dedup, and keeps alert and ticket lifecycles in sync in both
directions. Tenants manage rules from the RMM integration settings UI.
Problem
Alert ingestion is scaffolded but broken on main: the webhook writes columns
that don't exist in rmm_alerts, the rules engine expects a JSONB schema the
migration never created, and nothing connects alerts to tickets except a manual
button. There is no dedup, no auto-close, no outbound reset, and no rules UI.
Competing PSAs (ConnectWise, Autotask, Halo) treat all of this as table stakes.
Goals
- Webhook-delivered alerts create tickets automatically per tenant-defined rules.
- Repeat firings of the same condition on the same device land on the existing open ticket instead of creating ticket storms.
- Alert resets close untouched tickets and annotate touched ones.
- Closing an alert-linked ticket resets the alert in the RMM (per-rule opt-out).
- Rules are manageable from the integration settings UI by admins.
- Alert events are first-class workflow v2 triggers; matched rules can notify users.
- Maintenance windows suppress alert ticketing for a client, asset, or integration during planned work, without losing the alerts.
- Scheduled polling reconciles missed webhooks: missed triggers become tickets, missed resets close stale tickets, and post-window still-active alerts get processed.
- One pipeline serves NinjaOne and TacticalRMM; a third provider only needs a normalizer and an optional outbound adapter.
Non-goals
- RMM device-count billing integration, scheduled device sync, or org auto-matching.
- Per-rule dedup configuration (dedup behavior is fixed).
- Migrating NinjaOne device sync onto sharedAssetIngestionService (separate effort).
Users and Primary Flows
- MSP admin configures alert rules per RMM integration: match conditions, ticket routing, lifecycle flags, notifications.
- Dispatcher/tech works alert tickets like any other ticket: sees occurrence comments on flapping conditions, sees resolution comments when alerts reset, and closing a ticket clears the alert in the RMM.
- Automation builder uses RMM_ALERT_TRIGGERED / RMM_ALERT_RESOLVED workflow triggers and the rmm.alerts.create_ticket action for custom flows.
UX / UI Notes
- New "Alert Rules" section in RMM integration settings (next to the org-mapping manager), rendered for NinjaOne and TacticalRMM.
- Priority-ordered rules list: active toggle, reorder controls, edit/delete.
- Rule editor dialog with a Match group (severities, activity types, alert classes, source types, organization picker fed from rmm_organization_mappings, keywords, message regex) and an Actions group (create-ticket toggle, board picker, priority override, assignee, title/description templates with placeholder hints, auto-resolve toggle, reset-on-close toggle, notify-users picker).
- Save-time validation errors (e.g., bad regex) shown inline in the dialog.
- "Maintenance Windows" subsection beside Alert Rules: a list plus an editor with client/asset scope pickers and a one-off or weekly recurring schedule (with timezone).
- Alert polling enable/disable and interval (5–60 minutes) in integration settings.
- Existing per-asset AssetAlertsSection remains the alert-viewing surface.
Requirements
Functional Requirements
FR-1 Schema
One additive migration. rmm_alerts gains activity_type, acknowledged_at,
acknowledged_by, dedup_key (indexed with tenant + integration), occurrence_count
(default 1), last_occurrence_at, matched_rule_id, auto_ticket_created,
and suppressed_by_window_id (status gains a suppressed value).
Raw payloads standardize on the existing metadata jsonb. rmm_alert_rules
gains conditions and actions jsonb and drops the eleven flat filter/action
columns. New rmm_maintenance_windows table (FR-10). The deployed
20251124000001 migration is not rewritten; no backfill.
FR-2 Contracts and normalizers
shared/rmm/alerts/contracts.ts defines NormalizedRmmAlertEvent
(kind: triggered | reset | acknowledged) and the optional per-provider
RmmAlertOutboundAdapter (resetAlert). Shared Zod schemas define the rule
conditions/actions shapes (see design doc for exact fields). NinjaOne and
TacticalRMM webhook routes map their payloads to the contract; existing webhook
auth, tenant resolution, and tier gating are unchanged.
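
As a rough illustration of the contract, the sketch below shows one possible shape. Only the three kind values, the occurredAt timestamp (referenced in FR-10), and the resetAlert method name come from this PRD. Every other field name, the resetAlert argument, and the ExampleProviderPayload type are assumptions made for illustration; the payload is deliberately not a real NinjaOne or TacticalRMM shape, and the design doc remains the authority on exact fields.

```typescript
type RmmAlertEventKind = "triggered" | "reset" | "acknowledged";

interface NormalizedRmmAlertEvent {
  kind: RmmAlertEventKind;
  externalAlertId: string;  // assumed field name
  externalDeviceId: string; // assumed field name
  severity?: string;        // assumed field name
  message?: string;         // assumed field name
  occurredAt: Date;
}

// Optional per provider; the argument list is an assumption.
interface RmmAlertOutboundAdapter {
  resetAlert(externalAlertId: string): Promise<void>;
}

// Hypothetical provider payload used only to demonstrate a normalizer.
interface ExampleProviderPayload {
  id: string;
  deviceId: string;
  state: "open" | "cleared" | "acked";
  level?: string;
  text?: string;
  timestamp: string; // ISO 8601
}

const KIND_BY_STATE: Record<ExampleProviderPayload["state"], RmmAlertEventKind> = {
  open: "triggered",
  cleared: "reset",
  acked: "acknowledged",
};

// Maps the hypothetical payload onto the shared contract.
function normalizeExamplePayload(p: ExampleProviderPayload): NormalizedRmmAlertEvent {
  return {
    kind: KIND_BY_STATE[p.state],
    externalAlertId: p.id,
    externalDeviceId: p.deviceId,
    severity: p.level,
    message: p.text,
    occurredAt: new Date(p.timestamp),
  };
}
```

Under this shape, a third provider would supply its own normalizer and, if its API supports it, an adapter object implementing resetAlert.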
FR-3 Ingest pipeline
processRmmAlertEvent() runs synchronously in the webhook request. Triggered:
upsert rmm_alerts on (tenant, integration_id, external_alert_id), compute
and store dedup_key (device + condition identity; NinjaOne: statusCode
falling back to activityType), evaluate active rules first-match by
priority_order (empty conditions = catch-all; a rule that fails to evaluate
is logged and skipped), store matched_rule_id, then act. Replayed webhooks
are no-ops (idempotent ingest).
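
The rule-evaluation step could look roughly like the sketch below. It is not the pipeline's actual code: the type names, the dedup key format, and the choice to AND the populated conditions while treating keywords as match-any are all assumptions. What it does take from the requirement is first-match ordering by priority, empty conditions acting as a catch-all, and a rule that throws during evaluation being skipped rather than aborting ingest.

```typescript
interface AlertFacts {
  deviceId: string;
  conditionId: string; // e.g. statusCode, falling back to activityType
  severity: string;
  message: string;
}

interface RuleConditions {
  severities?: string[];
  keywords?: string[];
  messageRegex?: string;
}

interface AlertRule {
  id: string;
  priorityOrder: number;
  isActive: boolean;
  conditions: RuleConditions;
}

// Device + condition identity; the separator and field order are assumptions.
function buildDedupKey(alert: AlertFacts): string {
  return `${alert.deviceId}:${alert.conditionId}`;
}

// Every populated condition must pass; an empty conditions object matches all alerts.
function ruleMatches(rule: AlertRule, alert: AlertFacts): boolean {
  const { severities, keywords, messageRegex } = rule.conditions;
  if (severities?.length && !severities.includes(alert.severity)) return false;
  if (keywords?.length) {
    const haystack = alert.message.toLowerCase();
    if (!keywords.some((k) => haystack.includes(k.toLowerCase()))) return false;
  }
  // new RegExp throws on an invalid pattern; the caller treats that as a skip.
  if (messageRegex && !new RegExp(messageRegex).test(alert.message)) return false;
  return true;
}

// Returns the first active rule that matches, plus the ids of rules that failed to evaluate.
function findFirstMatch(
  rules: AlertRule[],
  alert: AlertFacts,
): { rule: AlertRule | null; skippedRuleIds: string[] } {
  const skippedRuleIds: string[] = [];
  const ordered = rules
    .filter((r) => r.isActive)
    .sort((a, b) => a.priorityOrder - b.priorityOrder);
  for (const rule of ordered) {
    try {
      if (ruleMatches(rule, alert)) return { rule, skippedRuleIds };
    } catch {
      skippedRuleIds.push(rule.id); // a real pipeline would also log here
    }
  }
  return { rule: null, skippedRuleIds };
}
```

Returning the skipped rule ids alongside the match keeps the function pure, so the caller decides how to log them.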
FR-4 Ticketing and dedup
If the matched rule creates tickets: an alert whose dedup_key matches an
alert with a still-open linked ticket joins that ticket (link, increment
occurrence_count, internal "re-triggered — Nth occurrence" comment).
Otherwise create a ticket honoring boardId, priorityOverride (else
severity→priority mapping), assignToUserId, and the title/description
templates with {{device}}/{{message}}/{{severity}}/{{organization}}
placeholders. Created tickets get source + source_reference, an asset
association, an initial internal comment, and client resolution from the asset
or the org mapping.
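
A minimal sketch of the join-or-create decision and the placeholder substitution follows. The four placeholder names come from this requirement; the function and type names are hypothetical, and leaving unknown placeholders untouched is an assumption rather than specified behavior.

```typescript
interface TemplateVars {
  device: string;
  message: string;
  severity: string;
  organization: string;
}

// Replaces {{name}} tokens for the known variables; unknown tokens are left as-is.
function renderTemplate(template: string, vars: TemplateVars): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (whole, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name)
      ? vars[name as keyof TemplateVars]
      : whole,
  );
}

// An earlier alert that already has a ticket linked to it.
interface LinkedAlert {
  dedupKey: string;
  ticketId: string;
  ticketIsOpen: boolean;
  occurrenceCount: number;
}

type TicketPlan =
  | { action: "join"; ticketId: string; occurrenceCount: number }
  | { action: "create"; title: string };

// Joins the still-open ticket for the same dedup key, otherwise plans a new ticket.
function planTicketAction(
  dedupKey: string,
  existing: LinkedAlert[],
  titleTemplate: string,
  vars: TemplateVars,
): TicketPlan {
  const open = existing.find((a) => a.dedupKey === dedupKey && a.ticketIsOpen);
  if (open) {
    return {
      action: "join",
      ticketId: open.ticketId,
      occurrenceCount: open.occurrenceCount + 1,
    };
  }
  return { action: "create", title: renderTemplate(titleTemplate, vars) };
}
```

Because only open tickets qualify for a join, a firing that arrives after the ticket closes falls through to the create branch.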
FR-5 Lifecycle
Reset marks the alert resolved. With a linked ticket and autoResolveTicket:
always comment; close (via autoResolveStatusId, else the tenant's first
is_closed status) only if the ticket is untouched — no human comments, no time
entries, no manual status change; rule auto-assignment doesn't count.
Acknowledged events stamp acknowledged_at/status. A ticket-closed event-bus
subscriber resets still-active linked alerts in the RMM via the provider's
outbound adapter when the matched rule's resetAlertOnTicketClose is true
(default). Outbound failures log and stamp alert metadata; they never block
the close. Providers without an adapter are skipped.
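
The reset-side decision can be sketched as a small pure function. The rule flags autoResolveTicket and autoResolveStatusId come from this requirement; the TicketActivity fields, the function names, and the firstClosedStatusId input are hypothetical. This PRD does not say what happens when autoResolveTicket is off, so the sketch assumes the ticket is left alone in that case.

```typescript
// Signals that a human has worked the ticket. Rule auto-assignment is
// intentionally absent because it does not count as a touch.
interface TicketActivity {
  humanCommentCount: number;
  timeEntryCount: number;
  statusChangedManually: boolean;
}

interface ResetContext {
  linkedTicketId: string | null;
  autoResolveTicket: boolean;
  autoResolveStatusId?: string;
  firstClosedStatusId: string; // the tenant's first is_closed status
  activity: TicketActivity;
}

interface ResetPlan {
  addComment: boolean;
  closeWithStatusId: string | null;
}

function isUntouched(a: TicketActivity): boolean {
  return (
    a.humanCommentCount === 0 &&
    a.timeEntryCount === 0 &&
    !a.statusChangedManually
  );
}

function planResetHandling(ctx: ResetContext): ResetPlan {
  // Assumption: without a linked ticket or with auto-resolve off, do nothing.
  if (ctx.linkedTicketId === null || !ctx.autoResolveTicket) {
    return { addComment: false, closeWithStatusId: null };
  }
  // Always comment; close only when no human has touched the ticket.
  const closeWithStatusId = isUntouched(ctx.activity)
    ? (ctx.autoResolveStatusId ?? ctx.firstClosedStatusId)
    : null;
  return { addComment: true, closeWithStatusId };
}
```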
FR-6 Rules CRUD
List/create/update/delete/reorder server actions in packages/integrations,
admin-gated, Zod-validated, regex validated at save time.
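
Save-time regex validation can be as simple as attempting to compile the pattern. The helper below is a hypothetical standalone sketch; in practice the check would more likely live inside the Zod schema, which is not shown here.

```typescript
type RegexValidation = { ok: true } | { ok: false; error: string };

// Compiles the pattern once and reports the engine's message on failure.
function validateMessageRegex(pattern: string): RegexValidation {
  try {
    new RegExp(pattern);
    return { ok: true };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

The returned error string is what the rule editor could surface inline in the dialog.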
FR-7 Rules UI
The settings section and editor described in UX notes, shared across providers.
FR-8 Workflow v2 and notifications
RMM_ALERT_TRIGGERED and RMM_ALERT_RESOLVED registered in the workflow v2
catalog with provider-generic payloads and published by the pipeline (replacing
the orphaned legacy-bus publishes). New rmm.alerts.create_ticket workflow
action invokes the shared ticket creator by alert ID. New rmm-alert
notification category delivers in-app + email to a matched rule's
notifyUserIds, honoring per-user preferences.
FR-9 Hardening and cleanup
Implement CSRF validation in the NinjaOne OAuth callback. Move
ninjaone/alerts/* logic into the shared module and remove the superseded
resetInNinjaOne TODO. Deprecate rmm_organization_mappings.auto_create_tickets
(no read paths remain).
FR-10 Maintenance windows
New rmm_maintenance_windows table: optional integration_id/client_id/
asset_id scopes (null = all of that dimension), one-off starts_at/ends_at
or weekly recurrence jsonb (days, time range, timezone), name, is_active.
The pipeline checks windows before rule matching: an alert matching all
non-null scopes of an active window at its occurredAt is stored with
status = 'suppressed' and suppressed_by_window_id — no ticket, no
notifications, no workflow events. A reset for a suppressed alert resolves it
quietly. Window CRUD server actions are admin-gated and Zod-validated, with the
settings UI described in UX notes.
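
The scope check can be sketched as follows for one-off windows. Weekly recurrence is omitted because it needs timezone-aware expansion; the type and function names are hypothetical, and treating ends_at as exclusive is an assumption.

```typescript
interface MaintenanceWindow {
  id: string;
  integrationId: string | null; // null = all integrations
  clientId: string | null;      // null = all clients
  assetId: string | null;       // null = all assets
  startsAt: Date;
  endsAt: Date;
  isActive: boolean;
}

interface AlertScope {
  integrationId: string;
  clientId: string | null;
  assetId: string | null;
  occurredAt: Date;
}

// A null window scope is a wildcard; otherwise the values must be equal.
function scopeMatches(windowValue: string | null, alertValue: string | null): boolean {
  return windowValue === null || windowValue === alertValue;
}

// Returns the first active window covering the alert, or null if none applies.
function findSuppressingWindow(
  alert: AlertScope,
  windows: MaintenanceWindow[],
): MaintenanceWindow | null {
  const t = alert.occurredAt.getTime();
  const hit = windows.find(
    (w) =>
      w.isActive &&
      scopeMatches(w.integrationId, alert.integrationId) &&
      scopeMatches(w.clientId, alert.clientId) &&
      scopeMatches(w.assetId, alert.assetId) &&
      w.startsAt.getTime() <= t &&
      t < w.endsAt.getTime(), // end is exclusive by assumption
  );
  return hit ?? null;
}
```

A non-null result would supply suppressed_by_window_id and short-circuit rule matching.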
FR-11 Alert polling (reconciliation)
A per-integration Temporal scheduled workflow (Entra per-tenant schedule
pattern), default on for connected integrations, every 15 minutes (configurable
5–60), created on connect and removed on disconnect. Each cycle, through the
same pipeline: (1) upsert RMM-active alerts missing locally as triggered
events; (2) synthesize reset events for local active alerts no longer active
in the RMM; (3) process still-active suppressed alerts whose window ended
through the normal rules path. Webhooks stay primary; ingest idempotency makes
overlap harmless.
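
The three reconciliation steps amount to a set difference between what the RMM reports as active and what is stored locally. The sketch below computes the work list only; the names are hypothetical, and the windowEnded flag stands in for whatever check the poller uses to decide that a suppressing window is over.

```typescript
interface LocalAlert {
  externalAlertId: string;
  status: "active" | "resolved" | "suppressed";
  windowEnded?: boolean; // only meaningful for suppressed alerts
}

interface PollPlan {
  missedTriggers: string[]; // (1) active in the RMM, unknown locally
  missedResets: string[];   // (2) active locally, no longer active in the RMM
  postWindow: string[];     // (3) suppressed, window over, still active in the RMM
}

function planPollCycle(remoteActiveIds: string[], local: LocalAlert[]): PollPlan {
  const remote = new Set(remoteActiveIds);
  const known = new Set(local.map((a) => a.externalAlertId));

  return {
    missedTriggers: remoteActiveIds.filter((id) => !known.has(id)),
    missedResets: local
      .filter((a) => a.status === "active" && !remote.has(a.externalAlertId))
      .map((a) => a.externalAlertId),
    postWindow: local
      .filter(
        (a) =>
          a.status === "suppressed" &&
          a.windowEnded === true &&
          remote.has(a.externalAlertId),
      )
      .map((a) => a.externalAlertId),
  };
}
```

Each id in the plan would then be fed through the same ingest pipeline as a triggered or reset event, so idempotent ingest absorbs any overlap with webhooks.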
Non-functional Requirements
- Webhook ingest path makes no external API calls; webhook latency stays bounded. RMM API calls happen only in the poller and the ticket-close subscriber, both off the request path.
- All queries tenant-scoped (CitusDB composite keys: tenant + entity id).
- Webhook response semantics preserved: 200 unmapped org, 200 success, 500 unexpected error (RMM retries; ingest idempotency makes this safe).
Data / API / Integrations
See FR-1 for schema and the design doc for exact JSONB shapes. External APIs:
NinjaOneClient.resetAlert() (exists); TacticalRMM alert resolution if its API
supports it (open question in SCRATCHPAD — adapter is optional by design).
Security / Permissions
- Rule CRUD requires admin permission; all actions tenant-scoped.
- Webhook auth unchanged (HMAC signature / shared-secret header).
- OAuth callback CSRF validation (FR-9).
Observability
Pipeline logs rule-evaluation skips and outbound reset failures. No new metrics/monitoring infrastructure.
Rollout / Migration
Single additive migration; deployed tables hold negligible data, so no backfill. No feature flag: with zero rules configured, the pipeline stores alerts without creating tickets, which matches today's effective behavior.
Open Questions
Tracked in SCRATCHPAD.md (Tactical outbound capability; exact ticket-closed
event name).
Acceptance Criteria (Definition of Done)
- A NinjaOne CONDITION TRIGGERED webhook for a mapped org creates an rmm_alerts row and, when a rule matches, a correctly-routed ticket.
- The same condition re-firing while that ticket is open adds an occurrence comment and creates no new ticket; after the ticket closes, a new firing creates a new ticket.
- CONDITION RESET resolves the alert, comments the ticket, and closes it only if untouched.
- Closing an alert-linked ticket resets the alert in NinjaOne unless the rule opted out.
- A TacticalRMM alert webhook flows through the same pipeline end to end.
- Admins manage rules entirely from the settings UI; invalid rules are rejected at save time.
- Alert workflows can trigger on RMM_ALERT_TRIGGERED / RMM_ALERT_RESOLVED and call rmm.alerts.create_ticket.
- Matched rules with notifyUserIds produce in-app and email notifications per user preference.
- An alert firing inside a matching maintenance window creates no ticket and no notifications; the same alert outside the window processes normally; a condition still firing after its window ends becomes a ticket via the poller.
- With webhooks disabled, a poll cycle turns RMM-active alerts into tickets per the rules and closes stale tickets whose alerts cleared in the RMM.
- All features in features.json implemented; the automated core in tests.json passes; the SMOKE_TESTS.md checklist has been executed against a live stack.