PSA/docs/plans/2026-06-12-integration-workflow-modules-design.md

Integration Workflow Modules — Design

  • Date: 2026-06-12
  • Status: Approved (brainstorm: Robert + Claude)
  • Branch: feature/integration-workflow-modules (cut from main)
  • Predecessor context: the RMM alert handling feature (feature/rmm-alerts-sync, PR #2694) made RMM_ALERT_TRIGGERED / RMM_ALERT_RESOLVED available as workflow triggers. This project is independent of that branch (all adapters and the module registry are on main), but together they complete the "self-healing alert" loop.

Goal

The workflow designer supports integration-specific "app" modules — palette tiles that expose an integration's operations as workflow actions, shown only when that integration is connected. Today exactly one exists (NinjaOne, 6 actions). This project:

  1. Generalizes the module plumbing so each integration is a self-contained, drop-in registration (Approach B from the brainstorm).
  2. Adds four new modules — Tactical RMM, Level, Huntress, Microsoft Teams — each exposing its ~5 highest-value actions.
  3. Expands the existing NinjaOne module with script execution (the original motivating example).
  4. Adds one core action, scheduling.create_entry, filling a gap in the scheduling.* family (reschedule/cancel exist; create does not).

Why it matters (market context)

Incumbent PSAs (ConnectWise, Autotask, Halo) have trigger/condition/action rules and hardcoded vendor-pair integrations, but no user-composable canvas with third-party integration actions. The market's answer is a separate orchestration product (Rewst, MSPbots, n8n) at $500 to $1,500+/mo. All-in-one platforms (Syncro, Atera, SuperOps) can "run script on alert" only inside their own walled garden. PSA-native cross-tool orchestration (alert fires → workflow runs a diagnostic script via the customer's actual RMM → attaches output → resolves alert → ticket closes) is a story none of them tell. Catalog depth is the moat; this project's framework work is what makes integration #6 (QBO et al.) cheap.

Secondary asset: every action registered here is a Zod-schema'd, idempotent, tenant-scoped operation — a tool catalog future AI agents can call safely.

Scope

In scope

  • Availability-resolver registry + registerIntegrationWorkflowModule helper; NinjaOne migrated onto both (palette parity is the regression to guard).
  • Tactical RMM module (6 actions, incl. run script/command/reboot — requires thin client wrappers for endpoints the vendor API already has).
  • Level module (8 actions, incl. the automation trio: list / trigger-by-webhook / run status).
  • Huntress module (6 actions, incl. incidents.resolve via Huntress's new write API).
  • Teams module (3 actions: notify user, bot DM, post to channel — channel posting needs new bot-framework createConversation support).
  • NinjaOne expansion: devices.run_script + script/action discovery read.
  • Core scheduling.create_entry action (shared, CE+EE).
  • Designer icon tokens for the four new tiles (reuse existing settings-UI integration logos).

Deferred (explicit decisions)

| Item | Reason |
|---|---|
| QuickBooks Online, Stripe | External financial integrations held for a dedicated follow-up (Robert wants to think the financial surface through). The framework makes this cheap when it lands. |
| Tanium | Pre-release. |
| Email app module | email.send (shared core action) already sends via the tenant's configured M365/Gmail/SMTP provider; a tile would duplicate it. |
| Calendar app module | Calendar connections are per-user; direct external-calendar writes bypass the Alga dispatch board. Decision: scheduling.create_entry instead, which lands on the board AND syncs outward via existing calendar sync when the user has a connected calendar. |
| Huntress remediation approve/reject | Vendor API supports it, but auto-approving SOC remediations from a workflow is a sharp edge. Revisit on demand. |
| DB-backed module catalog / per-tenant module curation (Approach C) | YAGNI; in-code registry does the job. |
| Recipe gallery (prebuilt workflows) | Launch/marketing concern, separate effort, but it is what sells this; see Open items. |

Architecture

Current state (facts, file paths as of main @ 2c392acc9e)

  • Module type + in-memory registry: shared/workflow/runtime/registries/integrationModuleRegistry.ts (WorkflowIntegrationModuleDefinition: groupKey 'app:<x>', label, iconToken, allowedActionIds, defaultActionId, availabilityKey).
  • Action registry (Zod in/out schemas, sideEffectful, idempotency, ui metadata, handler): shared/workflow/runtime/registries/actionRegistry.ts.
  • Palette catalog builder: shared/workflow/runtime/designer/actionCatalog.ts (buildWorkflowDesignerActionCatalog).
  • Availability filtering: loadAvailableFirstPartyIntegrationAppKeys in ee/packages/workflows/src/actions/workflow-runtime-v2-actions.ts, currently a hardcoded if-chain keyed on availabilityKey (only case: 'rmm:ninjaone' → rmm_integrations row with is_active + connected_at).
  • NinjaOne module: ee/packages/workflows/src/runtime/actions/registerNinjaOneWorkflowActions.ts + ninjaOneWorkflowRuntimeSupport.ts (self-contained fetch client; ee/packages/workflows cannot import from ee/server — package/app boundary). Module registered inline in ee/packages/workflows/src/runtime/core.ts.
  • Execution: Temporal activity executeWorkflowRuntimeV2ActionStep → executeActionInvocation (ee/temporal-workflows/src/activities/workflow-runtime-v2-activities.ts); engine-provided idempotency, invocation rows in workflow_action_invocations, errors normalized to a runtime payload.

Framework changes (do these first; everything else builds on them)

1. Availability resolver registry. New registry in shared/workflow/runtime/registries/ (sibling of integrationModuleRegistry):

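```typescript
import type { Knex } from 'knex';

type WorkflowModuleAvailabilityResolver = (knex: Knex, tenantId: string) => Promise<boolean>;
declare function registerWorkflowModuleAvailabilityResolver(availabilityKey: string, resolver: WorkflowModuleAvailabilityResolver): void;
```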

loadAvailableFirstPartyIntegrationAppKeys drops the if-chain: for each registered module with an availabilityKey, look up the resolver and call it. No resolver registered → module not available (fail closed). A parameterized factory rmmIntegrationAvailability(provider) covers all four RMMs (rmm_integrations row: provider = X, is_active, connected_at IS NOT NULL). Teams gets its own resolver (see below). NinjaOne's hardcoded branch is deleted and replaced by rmmIntegrationAvailability('ninjaone').
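The fail-closed lookup and the parameterized factory described above could look roughly like the sketch below. It is illustrative only: the Map-based store, the QueryRunner interface (standing in for the Knex handle), the row field names beyond provider / is_active / connected_at, and loadAvailableKeys (a simplified stand-in for loadAvailableFirstPartyIntegrationAppKeys that returns availability keys) are all assumptions.

```typescript
// Sketch only: QueryRunner is a hypothetical stand-in for the Knex handle.
interface RmmIntegrationRow {
  provider: string;
  is_active: boolean;
  connected_at: string | null;
}

interface QueryRunner {
  findRmmIntegrations(tenantId: string, provider: string): Promise<RmmIntegrationRow[]>;
}

type AvailabilityResolver = (db: QueryRunner, tenantId: string) => Promise<boolean>;

// In-memory registry keyed by availabilityKey (assumed storage).
const resolvers = new Map<string, AvailabilityResolver>();

function registerWorkflowModuleAvailabilityResolver(key: string, resolver: AvailabilityResolver): void {
  resolvers.set(key, resolver);
}

// Parameterized factory: available when an active, connected row exists for the provider.
function rmmIntegrationAvailability(provider: string): AvailabilityResolver {
  return async (db, tenantId) => {
    const rows = await db.findRmmIntegrations(tenantId, provider);
    return rows.some((row) => row.is_active && row.connected_at !== null);
  };
}

// Fail closed: a key with no registered resolver is treated as unavailable.
async function loadAvailableKeys(
  db: QueryRunner,
  tenantId: string,
  availabilityKeys: string[],
): Promise<string[]> {
  const available: string[] = [];
  for (const key of availabilityKeys) {
    const resolver = resolvers.get(key);
    if (resolver && (await resolver(db, tenantId))) {
      available.push(key);
    }
  }
  return available;
}
```

With this shape, adding an RMM is one registerWorkflowModuleAvailabilityResolver call, and a module that forgets to register a resolver simply never appears in the palette.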

2. One-call module registration. Helper in ee/packages/workflows:

```typescript
declare function registerIntegrationWorkflowModule(options: {
  module: WorkflowIntegrationModuleDefinition;
  availability: WorkflowModuleAvailabilityResolver;
  registerActions: (registry: ActionRegistryV2) => void;
}): void;
```

It registers actions, the module tile, and the availability resolver, idempotently (same guard style as today's core.ts). Each integration becomes exactly one file under ee/packages/workflows/src/runtime/actions/ (plus a runtime-support file where a client must be built); core.ts becomes one call per integration.
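One way the idempotent one-call helper could behave is sketched below. The trimmed-down types, the in-memory stores, and the single-argument resolver are assumptions made to keep the example self-contained; the real helper works against the existing registries and the (knex, tenantId) resolver signature.

```typescript
// Sketch only: minimal stand-ins for the real module, action, and resolver registries.
interface ModuleDefinition {
  groupKey: string; // e.g. 'app:tacticalrmm'
  label: string;
  availabilityKey: string; // e.g. 'rmm:tacticalrmm'
}

type Resolver = (tenantId: string) => Promise<boolean>;

interface ActionRegistry {
  register(actionId: string): void;
}

const registeredActions: string[] = [];
const modules = new Map<string, ModuleDefinition>();
const availabilityResolvers = new Map<string, Resolver>();

const actionRegistry: ActionRegistry = {
  register: (actionId) => {
    registeredActions.push(actionId);
  },
};

function registerIntegrationWorkflowModule(options: {
  module: ModuleDefinition;
  availability: Resolver;
  registerActions: (registry: ActionRegistry) => void;
}): void {
  // Idempotency guard: a repeat call for the same tile is a no-op.
  if (modules.has(options.module.groupKey)) {
    return;
  }
  options.registerActions(actionRegistry);
  modules.set(options.module.groupKey, options.module);
  availabilityResolvers.set(options.module.availabilityKey, options.availability);
}
```

Guarding on groupKey keeps actions, tile, and resolver in step: either all three are registered or none are, even if core.ts runs twice.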

Modules and actions

Conventions: action IDs are provider.noun.verb; every action follows the NinjaOne registration shape (Zod input/output, ui: { label, description, category, icon }, sideEffectful, idempotency: { mode: 'engineProvided' }).
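To make the registration shape concrete, here is a hedged sketch of a single action built to these conventions. It is not the NinjaOne code: plain validator functions replace the Zod schemas to avoid a dependency, and RebootClient, the inputSchema/outputSchema field names, the input and output fields, and the ui strings are hypothetical.

```typescript
// Sketch only: Validator stands in for a Zod schema; RebootClient is a hypothetical client.
type Validator<T> = (value: unknown) => T;

interface ActionDefinition<I, O> {
  id: string;
  inputSchema: Validator<I>;
  outputSchema: Validator<O>;
  ui: { label: string; description: string; category: string; icon: string };
  sideEffectful: boolean;
  idempotency: { mode: 'engineProvided' };
  handler: (input: I) => Promise<O>;
}

interface RebootClient {
  rebootAgent(agentId: string): Promise<void>;
}

// Rejects anything that is not an object with a non-empty string agentId.
function parseRebootInput(value: unknown): { agentId: string } {
  const agentId = (value as { agentId?: unknown } | null)?.agentId;
  if (typeof agentId !== 'string' || agentId.length === 0) {
    throw new Error('agentId must be a non-empty string');
  }
  return { agentId };
}

function buildRebootAction(
  client: RebootClient,
): ActionDefinition<{ agentId: string }, { accepted: boolean }> {
  return {
    id: 'tacticalrmm.agents.reboot',
    inputSchema: parseRebootInput,
    outputSchema: (value) => value as { accepted: boolean },
    ui: {
      label: 'Reboot agent',
      description: 'Reboot a Tactical RMM agent',
      category: 'Tactical RMM',
      icon: 'tacticalrmm',
    },
    sideEffectful: true, // a reboot changes vendor state
    idempotency: { mode: 'engineProvided' },
    handler: async (input) => {
      await client.rebootAgent(input.agentId);
      return { accepted: true };
    },
  };
}
```

Injecting the client is what lets handler tests run against a mock, as the Testing section calls for.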

NinjaOne — app:ninjaone (expansion of existing module)

| Action | Side-effect | Vendor surface |
|---|---|---|
| ninjaone.devices.run_script | yes | POST /v2/device/{id}/script/run; type SCRIPT or built-in ACTION, parameters, runAs |
| ninjaone.devices.scripting_options | no | device script/action discovery (verify exact path, expected GET /v2/device/{id}/scripting/options) |

Existing six actions unchanged. FetchNinjaOneWorkflowClient gains the two methods.

Tactical RMM — app:tacticalrmm (availability: rmmIntegrationAvailability('tacticalrmm'))

| Action | Side-effect | Vendor surface (verify exact paths against Tactical API during implementation) |
|---|---|---|
| tacticalrmm.agents.find | no | /api/beta/v1/agent/ (already wrapped) |
| tacticalrmm.agents.get | no | agent detail |
| tacticalrmm.scripts.list | no | /scripts/ (needed to pick a script ID) |
| tacticalrmm.agents.run_script | yes | /agents/{agent_id}/runscript/ (returns output) |
| tacticalrmm.agents.run_command | yes | /agents/{agent_id}/cmd/ (raw shell) |
| tacticalrmm.agents.reboot | yes | /agents/{agent_id}/reboot/ |

Client: reuse TacticalRmmClient (packages/integrations/src/lib/rmm/tacticalrmm/tacticalApiClient.ts) — ee/packages/workflows may depend on packages/integrations. New endpoint wrappers are added to that client (they benefit non-workflow callers too). Credentials: existing tenant secrets (tacticalrmm_api_key / Knox trio) + rmm_integrations.instance_url.

Level — app:levelio (availability: rmmIntegrationAvailability('levelio'))

| Action | Side-effect | Vendor surface |
|---|---|---|
| levelio.devices.find | no | GET /v2/devices (group filters) |
| levelio.devices.get | no | GET /v2/devices/{id} |
| levelio.alerts.list_active | no | GET /v2/alerts |
| levelio.alerts.resolve | yes | POST /v2/alerts/{id}/resolve |
| levelio.updates.list | no | GET /v2/updates (patch posture) |
| levelio.automations.list | no | automations + their webhooks (discovery for trigger) |
| levelio.automations.trigger | yes | POST /v2/automations/webhooks/{token}, optional device_ids[]; this is Level's remote-execution model. Error message must say "automation needs a webhook trigger configured in Level" when applicable |
| levelio.automations.run_status | no | show automation run |

Client: thin fetch client in the workflows package (NinjaOne-style; the ee/server Level client is not importable). Tenant secret levelio_api_key; base URL per existing convention.
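The levelio.automations.trigger handler could enforce the required error message along the lines of the sketch below. The LevelAutomation shape (in particular the webhookToken field) and the injected HttpPost function are assumptions; only the webhook path, the optional device_ids[] body field, and the error wording come from the table above.

```typescript
// Sketch only: LevelAutomation and HttpPost are hypothetical shapes for illustration.
interface LevelAutomation {
  id: string;
  name: string;
  webhookToken: string | null; // null when no webhook trigger is configured
}

type HttpPost = (path: string, body: unknown) => Promise<{ status: number }>;

async function triggerAutomation(
  post: HttpPost,
  automation: LevelAutomation,
  deviceIds?: string[],
): Promise<{ triggered: boolean }> {
  if (!automation.webhookToken) {
    // Actionable message required by the design.
    throw new Error(
      `Cannot trigger "${automation.name}": automation needs a webhook trigger configured in Level`,
    );
  }
  // device_ids is optional; omit it entirely when no devices are targeted.
  const body = deviceIds && deviceIds.length > 0 ? { device_ids: deviceIds } : {};
  const path = `/v2/automations/webhooks/${encodeURIComponent(automation.webhookToken)}`;
  const response = await post(path, body);
  if (response.status >= 400) {
    throw new Error(`Level automation trigger failed with HTTP ${response.status}`);
  }
  return { triggered: true };
}
```

Checking for the missing webhook before any HTTP call means the user sees the configuration hint rather than a generic vendor 404.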

Huntress — app:huntress (availability: rmmIntegrationAvailability('huntress'))

| Action | Side-effect | Vendor surface |
|---|---|---|
| huntress.incidents.find | no | GET /v1/incident_reports (status/severity/org filters) |
| huntress.incidents.get | no | incident report by id |
| huntress.incidents.resolve | yes | Huntress write API (resolve incident report); bidirectional close parity |
| huntress.organizations.list | no | GET /v1/organizations |
| huntress.agents.get | no | GET /v1/agents/{id} |
| huntress.account.get | no | GET /v1/account |

Client: thin fetch client in the workflows package. Tenant secrets huntress_api_key / huntress_api_secret (Basic auth); replicate the 60 req/min throttle + 429 backoff from the ee/server client. Primary use is enrichment (ticket workflows pulling incident + agent detail) plus close-the-loop resolve.
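The throttle and backoff to be replicated might be approximated as below. This is a sketch under stated assumptions, not a copy of the ee/server client: the sliding-window strategy, the exponential schedule, the 30-second cap, and honoring a Retry-After value are all illustrative choices. The clock is injected so the logic can be tested without real waiting.

```typescript
// Sketch only: a sliding-window limiter with an injectable clock (assumed strategy).
class SlidingWindowThrottle {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns 0 and records the request if a slot is free;
  // otherwise returns how many ms to wait before asking again.
  reserve(): number {
    const current = this.now();
    this.timestamps = this.timestamps.filter((ts) => current - ts < this.windowMs);
    if (this.timestamps.length < this.limit) {
      this.timestamps.push(current);
      return 0;
    }
    const oldest = this.timestamps[0] ?? current;
    return this.windowMs - (current - oldest);
  }
}

// Delay before retrying after a 429: prefer the server's Retry-After (seconds),
// else exponential backoff capped at 30 s (assumed schedule).
function backoffDelayMs(attempt: number, retryAfterSeconds?: number): number {
  if (retryAfterSeconds !== undefined && retryAfterSeconds >= 0) {
    return retryAfterSeconds * 1000;
  }
  return Math.min(30_000, 1000 * 2 ** attempt);
}
```

For the 60 req/min limit the client would construct `new SlidingWindowThrottle(60, 60_000, Date.now)` and sleep for whatever reserve() returns before each request.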

Microsoft Teams — app:teams (availability: own resolver — teams_integrations.install_status = 'active' AND Teams add-on active, mirroring the delivery path's checks)

| Action | Side-effect | Mechanism |
|---|---|---|
| teams.notify_user | yes | Graph sendActivityNotification via existing delivery path (ee/packages/microsoft-teams), generic template; target is an Alga user with a linked Microsoft account |
| teams.send_dm | yes | proactive Bot Framework message (text + optional card) via stored teams_conversation_references; explicit error if the user has never opened the bot |
| teams.post_to_channel | yes | new bot-framework createConversation support in ee/packages/microsoft-teams; works in any channel of a team where the Alga Teams app is installed. Graph app-only channel posting is a Microsoft-protected API, so the bot route is deliberate |

ee/packages/workflows depends on ee/packages/microsoft-teams (package→package is fine). post_to_channel is the implementation-heavy item of the project.
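The explicit "never opened the bot" failure for teams.send_dm could be handled as in this sketch. ConversationStore and ProactiveSender are hypothetical interfaces standing in for the teams_conversation_references lookup and the Bot Framework send; the field names and exact message text are assumptions.

```typescript
// Sketch only: hypothetical interfaces around the stored conversation references.
interface ConversationReference {
  userId: string;
  conversationId: string;
  serviceUrl: string;
}

interface ConversationStore {
  findByUser(tenantId: string, userId: string): Promise<ConversationReference | null>;
}

type ProactiveSender = (
  reference: ConversationReference,
  message: { text: string; card?: unknown },
) => Promise<void>;

async function sendTeamsDm(
  store: ConversationStore,
  send: ProactiveSender,
  tenantId: string,
  userId: string,
  message: { text: string; card?: unknown },
): Promise<{ delivered: boolean }> {
  const reference = await store.findByUser(tenantId, userId);
  if (!reference) {
    // Proactive messaging needs a prior conversation; fail with an actionable message.
    throw new Error(
      'Teams DM failed: user has no bot conversation (the user must open the Alga bot in Teams first)',
    );
  }
  await send(reference, message);
  return { delivered: true };
}
```

Resolving the reference before sending keeps the failure deterministic, so the invocation is stamped FAILED with a message the workflow author can act on.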

Core action — scheduling.create_entry (shared, CE+EE, in the scheduling.* business-operations family)

Inputs: assigned user(s), title, start/end, optional ticket/project link, optional status/notes. Creates a dispatch-board schedule entry; the existing calendar sync pushes it to the assignee's connected external calendar. No Calendar app tile.
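As a rough illustration of the input surface, the sketch below models the listed inputs and a few sanity checks. The field names, the ISO 8601 string timestamps, and the specific validation rules are assumptions; the real action would express this as a Zod schema.

```typescript
// Sketch only: field names are hypothetical; the design lists the inputs, not their shape.
interface CreateEntryInput {
  assignedUserIds: string[];
  title: string;
  start: string; // ISO 8601 timestamp (assumed format)
  end: string; // ISO 8601 timestamp (assumed format)
  ticketId?: string;
  projectId?: string;
  status?: string;
  notes?: string;
}

// Returns a list of human-readable problems; an empty list means the input is acceptable.
function validateCreateEntryInput(input: CreateEntryInput): string[] {
  const errors: string[] = [];
  if (input.assignedUserIds.length === 0) {
    errors.push('at least one assigned user is required');
  }
  if (input.title.trim() === '') {
    errors.push('title is required');
  }
  const start = Date.parse(input.start);
  const end = Date.parse(input.end);
  if (Number.isNaN(start) || Number.isNaN(end)) {
    errors.push('start and end must be valid timestamps');
  } else if (end <= start) {
    errors.push('end must be after start');
  }
  return errors;
}
```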

Error handling

NinjaOne conventions throughout: handlers throw; the engine normalizes to the runtime error payload and stamps the invocation FAILED. Vendor HTTP errors surface status + vendor message, never credentials. The two most common failures get explicit, actionable messages: "integration not connected / inactive" and (Teams) "user has no bot conversation" / "app not installed in that team". Side-effectful actions rely on engine-provided idempotency so Temporal retries are safe.
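A handler-side error that surfaces status and vendor message while keeping credentials out could be built as sketched here. VendorHttpError, redactSecrets, the 500-character truncation, and the message format are illustrative assumptions, not the existing NinjaOne code.

```typescript
// Sketch only: a hypothetical error type thrown by handlers and normalized by the engine.
class VendorHttpError extends Error {
  readonly vendor: string;
  readonly status: number;

  constructor(vendor: string, status: number, vendorMessage: string) {
    super(`${vendor} request failed (HTTP ${status}): ${vendorMessage}`);
    this.name = 'VendorHttpError';
    this.vendor = vendor;
    this.status = status;
  }
}

// Replaces every occurrence of each known secret so an echoed credential cannot leak.
function redactSecrets(message: string, secrets: string[]): string {
  return secrets
    .filter((secret) => secret.length > 0)
    .reduce((text, secret) => text.split(secret).join('[redacted]'), message);
}

function toVendorError(
  vendor: string,
  status: number,
  rawBody: string,
  secrets: string[],
): VendorHttpError {
  // Redact first, then truncate, so a secret is never cut in half and left partly visible.
  const safeMessage = redactSecrets(rawBody, secrets).slice(0, 500);
  return new VendorHttpError(vendor, status, safeMessage);
}
```

Redacting against the tenant's actual secret values covers the case where a vendor echoes the request headers or key back in its error body.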

Testing

  • Handler unit tests per integration with mocked clients, mirroring ninjaOneWorkflowActions.handlers.test.ts.
  • Availability-resolver tests; the NinjaOne migration's regression bar is palette parity before/after.
  • Catalog tests: tiles appear only for connected integrations (extend existing actionCatalog tests).
  • Manual smoke on the local-test stack: Tactical mock server round-trip for run_script/run_command; palette gating on connect/disconnect; scheduling.create_entry → entry on dispatch board.

Vendor grounding (verified 2026-06-12)

The action lists were verified against vendor APIs, not just Alga's existing adapter code — which corrected three assumptions:

Exact endpoint paths marked "verify" above must be confirmed against vendor docs during implementation.

Open items / follow-ups (not in this branch)

  • Recipe gallery: 3 to 4 prebuilt workflows ("disk-full self-remediation", "Huntress incident enrichment + bidirectional close", "escalation fan-out to Teams channel"); this is what demos and sells the feature.
  • Financial follow-up: QBO (+ Stripe) module on this framework, pending Robert's call on the financial action surface.
  • Tanium module when the integration leaves pre-release.
  • Huntress remediation approve/reject if customer demand materializes.