Hermes 284313f908
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

PRD: Integration Workflow Modules

  • Status: Draft
  • Owner: Robert Isaacs
  • Created: 2026-06-12
  • Design: ../2026-06-12-integration-workflow-modules-design.md (architecture authority)
  • Branch: feature/integration-workflow-modules off main

1. Problem statement & user value

The workflow designer can host integration-specific "app" modules — palette tiles exposing an integration's operations as typed workflow actions, shown only when that integration is connected — but exactly one exists (NinjaOne, 6 actions), its availability check is a hardcoded if-chain, and its module wiring is inlined in core.ts. Meanwhile the operations MSPs actually orchestrate across tools (run a script on an endpoint, trigger an RMM automation, resolve a security incident, post to a Teams channel, put a technician on the schedule) are absent from the palette, so the "self-healing alert" loop the RMM alert events enable cannot be finished inside Alga.

This project generalizes the module plumbing so each integration is one self-contained registration, then ships four new modules (Tactical RMM, Level, Huntress, Microsoft Teams), expands NinjaOne with script execution, and adds a core scheduling.create_entry action. Incumbent PSAs offer no user-composable canvas over third-party tools; the market's answer is a separate $500 to $1,500/mo orchestration product. This project puts that canvas inside the PSA.

2. Goals

  • Adding integration #6 requires one file plus one core.ts line — no framework edits, no availability if-chain.
  • A connected Tactical RMM tenant can run a stored script or raw command on an agent from a workflow and use its output in later steps.
  • A connected Level tenant can trigger a Level automation (optionally scoped to specific devices) and check its run status.
  • A connected Huntress tenant can enrich tickets from incident/agent/org data and resolve the Huntress incident when work completes.
  • A tenant with the Teams app active can notify a user's activity feed, DM a user via the bot, and post to a channel where the app is installed.
  • Workflows can create dispatch-board schedule entries that ride the existing calendar sync to a technician's connected external calendar.
  • Palette tiles appear if and only if the integration is available for the tenant; NinjaOne behavior is unchanged (parity regression).

3. Non-goals

  • QuickBooks Online / Stripe modules (financial follow-up, deliberately deferred), Tanium (pre-release).
  • Email app tile (email.send already covers tenant-provider send) and a direct Calendar app tile (schedule-entry approach chosen instead).
  • Huntress remediation approve/reject (sharp edge; revisit on demand).
  • DB-backed module catalog or per-tenant module curation.
  • Prebuilt workflow recipe gallery (launch follow-up, separate effort).
  • New vendor capabilities beyond those listed (e.g. Tactical service control, NinjaOne patch apply).

4. Personas & primary flows

  • Automation-minded MSP engineer: builds "disk-full self-remediation": RMM alert trigger → tacticalrmm.agents.run_script (cleanup script) → ticket note with output → levelio.alerts.resolve / alert reset → ticket closes. Builds "Huntress enrichment": ticket-created trigger → huntress.incidents.get + huntress.agents.get → ticket fields/note → on close, huntress.incidents.resolve.
  • Dispatcher: escalation workflow posts to the service-desk Teams channel and creates a schedule entry for the on-call technician.
  • Tenant admin: connects/disconnects integrations and sees palette tiles appear/disappear accordingly; no configuration beyond the integration itself.

5. Framework changes (build first)

5.1 Availability resolver registry

A new registry in shared/workflow/runtime/registries/ (sibling of integrationModuleRegistry.ts) holds availability resolvers of type (knex, tenantId) => Promise<boolean>, each registered under a module's availabilityKey. loadAvailableFirstPartyIntegrationAppKeys (ee/packages/workflows/src/actions/workflow-runtime-v2-actions.ts) replaces its if-chain with registry lookups. The registry fails closed: a module whose key has no registered resolver is not available, and a resolver that throws is caught and treated as unavailable, so the palette listing cannot return a 500 because one vendor table is missing.
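The fail-closed lookup described in 5.1 could look roughly like the sketch below. The class name, method names, and the generic Db parameter (standing in for the knex handle) are assumptions for illustration, not the shipped API; only the resolver shape and the availabilityKey concept come from this document.

```typescript
// Hypothetical sketch. Db stands in for the knex handle named in the text.
type AvailabilityResolver<Db> = (knex: Db, tenantId: string) => Promise<boolean>;

class AvailabilityResolverRegistry<Db> {
  private readonly resolvers = new Map<string, AvailabilityResolver<Db>>();

  register(availabilityKey: string, resolver: AvailabilityResolver<Db>): void {
    this.resolvers.set(availabilityKey, resolver);
  }

  // Fail closed: a missing resolver or a throwing resolver means "not available".
  async isAvailable(availabilityKey: string, knex: Db, tenantId: string): Promise<boolean> {
    const resolver = this.resolvers.get(availabilityKey);
    if (!resolver) return false;
    try {
      return (await resolver(knex, tenantId)) === true;
    } catch {
      return false;
    }
  }

  // Returns the subset of keys whose resolvers report true, in input order.
  async availableKeys(keys: string[], knex: Db, tenantId: string): Promise<string[]> {
    const flags = await Promise.all(keys.map((key) => this.isAvailable(key, knex, tenantId)));
    return keys.filter((_, index) => flags[index]);
  }
}
```

Swallowing resolver errors keeps the palette listing responsive, at the cost of hiding a genuinely broken resolver; logging inside the catch branch would be a reasonable addition.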

5.2 RMM availability factory

rmmIntegrationAvailability(provider) returns a resolver checking rmm_integrations for (tenant, provider) with is_active = true and connected_at IS NOT NULL — used by all four RMM modules.
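As a minimal sketch of the factory, the example below replaces the real knex query with a hypothetical lookup function (FindRmmIntegration) so that it stays self-contained. The row shape models only the columns named above; the lookup seam and the provider string used in the comments are assumptions.

```typescript
// Hypothetical row shape: only the columns named in the text are modelled.
interface RmmIntegrationRow {
  tenant: string;
  provider: string;
  is_active: boolean;
  connected_at: Date | null;
}

// Hypothetical data-access seam standing in for the knex query on rmm_integrations.
type FindRmmIntegration = (
  tenantId: string,
  provider: string,
) => Promise<RmmIntegrationRow | undefined>;

type RmmAvailabilityResolver = (find: FindRmmIntegration, tenantId: string) => Promise<boolean>;

// Builds a resolver that is true only for an active, connected (tenant, provider) row.
function rmmIntegrationAvailability(provider: string): RmmAvailabilityResolver {
  return async (find, tenantId) => {
    const row = await find(tenantId, provider);
    return row !== undefined && row.is_active === true && row.connected_at !== null;
  };
}
```

In the real resolver the first argument would be the knex handle, and the lookup would presumably be a single filtered query rather than a callback.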

5.3 One-call module registration + NinjaOne migration

registerIntegrationWorkflowModule({ module, availability, registerActions }) in ee/packages/workflows: registers actions, the module tile, and the availability resolver, idempotently. core.ts becomes one call per integration. NinjaOne migrates onto the helper and the RMM factory; the hardcoded 'rmm:ninjaone' branch is deleted. Regression bar: identical palette/catalog output for connected and disconnected NinjaOne tenants.
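One way the idempotent helper might be structured is sketched below. The registries are passed in explicitly and modelled as plain Maps, which is an assumption made to keep the example runnable; the field names on IntegrationModule other than availabilityKey and allowedActionIds are also illustrative.

```typescript
type Resolver = (db: unknown, tenantId: string) => Promise<boolean>;

interface IntegrationModule {
  key: string;              // e.g. "app:ninjaone"
  availabilityKey: string;  // e.g. "rmm:ninjaone"
  allowedActionIds: string[];
}

// Hypothetical stand-ins for the action, module, and availability registries.
interface Registries {
  actions: Map<string, { id: string }>;
  modules: Map<string, IntegrationModule>;
  availability: Map<string, Resolver>;
}

interface Registration {
  module: IntegrationModule;
  availability: Resolver;
  registerActions: (actions: Map<string, { id: string }>) => void;
}

// Registers actions, the module tile, and the resolver once per module key.
// Returns false (and does nothing) when the module is already registered.
function registerIntegrationWorkflowModule(registries: Registries, reg: Registration): boolean {
  if (registries.modules.has(reg.module.key)) return false;
  reg.registerActions(registries.actions);
  registries.modules.set(reg.module.key, reg.module);
  registries.availability.set(reg.module.availabilityKey, reg.availability);
  return true;
}
```

Keying idempotency on the module key means a repeated call from core.ts during hot reload or test setup is a no-op rather than a duplicate registration.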

6. Modules

Conventions for every action: a provider.noun.verb ID; Zod input and output schemas; ui: { label, description, category, icon }; a sideEffectful flag that matches the action's real behavior; idempotency: { mode: 'engineProvided' }; and handlers that throw on error (the engine normalizes the result). Vendor endpoint paths marked verify must be confirmed against vendor docs during implementation.

6.1 NinjaOne expansion (app:ninjaone)

FetchNinjaOneWorkflowClient gains runScript (POST /v2/device/{id}/script/run; type SCRIPT|ACTION, parameters, runAs) and a scripting-options discovery call (verify, expected GET /v2/device/{id}/scripting/options). New actions ninjaone.devices.run_script (side-effectful) and ninjaone.devices.scripting_options (read) join the existing six in allowedActionIds.

6.2 Tactical RMM (app:tacticalrmm)

Client: extend TacticalRmmClient (packages/integrations/src/lib/rmm/tacticalrmm/tacticalApiClient.ts) with agent-detail, script-list, run-script, run-command, reboot wrappers (verify paths: /scripts/, /agents/{agent_id}/runscript/, /agents/{agent_id}/cmd/, /agents/{agent_id}/reboot/). The workflows package takes a dependency on packages/integrations; runtime support resolves the integration row + tenant secrets (tacticalrmm_api_key or Knox trio) exactly as the existing sync path does.

Actions: agents.find, agents.get, scripts.list, agents.run_script (returns output), agents.run_command, agents.reboot (last three side-effectful).

6.3 Level (app:levelio)

Client: thin fetch client inside the workflows package (NinjaOne pattern — ee/server client not importable). Tenant secret levelio_api_key; cursor pagination as in the existing client.

Actions: devices.find, devices.get, alerts.list_active, alerts.resolve (side-effectful, POST /v2/alerts/{id}/resolve), updates.list, automations.list (automations + their webhook tokens), automations.trigger (side-effectful, POST /v2/automations/webhooks/{token} with optional device_ids[]; actionable error when the automation has no webhook trigger configured in Level), automations.run_status.
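The automations.trigger request could be assembled along these lines. The LevelAutomation shape, the function name, and the error wording are hypothetical; the path and the optional device_ids field are the ones listed above and still need confirming against Level's documentation.

```typescript
// Hypothetical shape for an entry returned by automations.list.
interface LevelAutomation {
  id: string;
  name: string;
  webhookToken?: string; // absent when no webhook trigger is configured in Level
}

interface TriggerRequest {
  method: "POST";
  path: string;
  body: { device_ids?: string[] };
}

// Builds the webhook call, or throws an actionable error when no token exists.
function buildAutomationTriggerRequest(
  automation: LevelAutomation,
  deviceIds?: string[],
): TriggerRequest {
  if (!automation.webhookToken) {
    throw new Error(
      `Level automation "${automation.name}" has no webhook trigger configured in Level; ` +
        "add a webhook trigger there before calling automations.trigger.",
    );
  }
  const body: { device_ids?: string[] } = {};
  // Omit device_ids entirely when the caller does not scope the run.
  if (deviceIds && deviceIds.length > 0) body.device_ids = deviceIds;
  return {
    method: "POST",
    path: `/v2/automations/webhooks/${encodeURIComponent(automation.webhookToken)}`,
    body,
  };
}
```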

6.4 Huntress (app:huntress)

Client: thin fetch client in the workflows package; Basic auth from tenant secrets huntress_api_key/huntress_api_secret; replicate the 60 req/min throttle and 429 backoff of the ee/server client.
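A sliding-window throttle combined with exponential backoff on 429 is one plausible way to reproduce that behavior; the sketch below is not the ee/server client's code. The class, the Clock seam (injected so the logic can be tested without real delays), and the retry count and base delay are all assumptions.

```typescript
// Injected time source so the throttle can be exercised without real waiting.
interface Clock {
  now(): number; // milliseconds
  sleep(ms: number): Promise<void>;
}

interface HttpResult {
  status: number;
  body?: unknown;
}

class ThrottledRequester {
  private readonly clock: Clock;
  private readonly maxPerWindow: number;
  private readonly windowMs: number;
  private readonly maxRetries: number;
  private readonly baseBackoffMs: number;
  private readonly sentAt: number[] = [];

  constructor(clock: Clock, maxPerWindow = 60, windowMs = 60000, maxRetries = 3, baseBackoffMs = 1000) {
    this.clock = clock;
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
    this.maxRetries = maxRetries;
    this.baseBackoffMs = baseBackoffMs;
  }

  // Blocks until fewer than maxPerWindow requests fall inside the sliding window.
  private async waitForSlot(): Promise<void> {
    for (;;) {
      const cutoff = this.clock.now() - this.windowMs;
      while (this.sentAt.length > 0 && this.sentAt[0] <= cutoff) this.sentAt.shift();
      if (this.sentAt.length < this.maxPerWindow) break;
      await this.clock.sleep(this.sentAt[0] + this.windowMs - this.clock.now());
    }
    this.sentAt.push(this.clock.now());
  }

  // Sends through the throttle; on 429, waits 1x, 2x, 4x the base delay and retries.
  async request(send: () => Promise<HttpResult>): Promise<HttpResult> {
    for (let attempt = 0; ; attempt++) {
      await this.waitForSlot();
      const result = await send();
      if (result.status !== 429 || attempt >= this.maxRetries) return result;
      await this.clock.sleep(this.baseBackoffMs * Math.pow(2, attempt));
    }
  }
}
```

Retries pass through the throttle again, so a burst of 429 responses cannot push the client over the per-minute budget.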

Actions: incidents.find, incidents.get, incidents.resolve (side-effectful; Huntress write API, verify exact endpoint), organizations.list, agents.get, account.get.

6.5 Microsoft Teams (app:teams)

Availability: own resolver — teams_integrations.install_status = 'active' AND the Teams add-on active for the tenant (mirror the delivery path's checks). The workflows package depends on ee/packages/microsoft-teams.

Actions (all side-effectful):

  • teams.notify_user — Graph sendActivityNotification through the existing delivery path with a generic template; target is an Alga user with a linked Microsoft account; actionable error otherwise.
  • teams.send_dm — proactive Bot Framework message (text + optional card) via stored teams_conversation_references; actionable error if the user has never opened the bot.
  • teams.post_to_channel — requires new createConversation support in ee/packages/microsoft-teams (proactive channel conversation via Bot Framework; Graph app-only channel posting is Microsoft-protected, the bot route is deliberate). Works in any channel of a team where the Alga Teams app is installed; actionable error otherwise.

7. Core action: scheduling.create_entry

Shared (CE+EE) action in the scheduling.* business-operations family. Inputs: assigned user(s), title, start/end, optional ticket/project link, optional status/notes. Creates a dispatch-board schedule entry through the same model layer the UI uses; the existing calendar sync pushes it outward to a connected user calendar. Validation errors (unknown user, end before start) are explicit.
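The explicit validation errors might be produced by a check like the one sketched here. The input field names and the error strings are hypothetical, and the set of known user IDs stands in for whatever lookup the model layer actually performs.

```typescript
// Hypothetical input shape covering the inputs listed in the text.
interface CreateEntryInput {
  assignedUserIds: string[];
  title: string;
  start: string; // ISO 8601 timestamp
  end: string;   // ISO 8601 timestamp
  ticketId?: string;
  notes?: string;
}

// Returns one message per problem; an empty array means the input is valid.
function validateCreateEntryInput(input: CreateEntryInput, knownUserIds: Set<string>): string[] {
  const errors: string[] = [];
  if (input.assignedUserIds.length === 0) {
    errors.push("At least one assigned user is required.");
  }
  for (const userId of input.assignedUserIds) {
    if (!knownUserIds.has(userId)) errors.push(`Unknown user: ${userId}`);
  }
  const start = Date.parse(input.start);
  const end = Date.parse(input.end);
  if (Number.isNaN(start) || Number.isNaN(end)) {
    errors.push("start and end must be valid ISO 8601 timestamps.");
  } else if (end < start) {
    errors.push("end must not be before start.");
  }
  return errors;
}
```

Collecting all problems before failing lets a workflow author fix a step in one pass instead of one error per run.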

8. Designer surface

Icon tokens for tacticalrmm, levelio, huntress, teams added to the designer icon set, reusing the integration logos already shipped in the settings UI. Palette grouping/ordering verified for the new app:* groups (extend PALETTE_CATEGORY_ORDER handling only if required).

9. Error handling conventions

Handlers throw; the Temporal activity layer normalizes to the runtime error payload and stamps workflow_action_invocations FAILED. Vendor HTTP errors surface status + vendor message, never credentials. Most-common failures get explicit messages: integration not connected/inactive; Level automation lacks webhook trigger; Teams user unlinked / no bot conversation / app not in team. Side-effectful actions rely on engine-provided idempotency so retries are safe.
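As an illustration of the "status + vendor message, never credentials" rule, a handler could throw an error like the one below. VendorHttpError and redact are hypothetical names, and scrubbing known secret values out of the vendor message is one assumed way to enforce the rule, not necessarily how the engine does it.

```typescript
// Replaces every occurrence of each known secret value with a placeholder.
function redact(text: string, secrets: string[]): string {
  return secrets
    .filter((secret) => secret.length > 0)
    .reduce((acc, secret) => acc.split(secret).join("[redacted]"), text);
}

// Hypothetical error type thrown by integration action handlers.
class VendorHttpError extends Error {
  readonly vendor: string;
  readonly status: number;

  constructor(vendor: string, status: number, vendorMessage: string, secrets: string[] = []) {
    super(`${vendor} request failed (HTTP ${status}): ${redact(vendorMessage, secrets)}`);
    this.name = "VendorHttpError";
    this.vendor = vendor;
    this.status = status;
  }
}
```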

10. Testing & rollout

Unit: handler tests per integration with mocked clients (NinjaOne handler test pattern); availability-registry and resolver tests; helper idempotency; catalog gating matrix. Regression: NinjaOne palette parity. Manual smoke on the local-test stack: palette gating on connect/disconnect, Tactical mock run-script round-trip, Teams notify/DM/channel in a test tenant, schedule entry on the dispatch board. No migrations and no new tables; all gating is read-path, so rollout is inert until an integration is connected.

11. Editor behavior for disconnected integrations

Availability gates ADDING, not VIEWING. The designer catalog keeps first-party integration records when the integration is disconnected, flagged available: false; extension-app filtering is unchanged. The palette excludes unavailable records (the original hide-when-not-enabled decision stands). Existing steps referencing a disconnected integration keep full group context: the grouped action config section renders with an amber "not connected" banner naming the integration, the step card shows a "Disconnected" badge with an explanatory tooltip, and the input-mapping editor remains fully usable (it always resolved schemas from the unfiltered registry). Publish stays allowed; runs fail with the handlers' actionable INTEGRATION_INACTIVE errors. Publish-time and disconnect-time warnings were considered and deferred.
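The split between what the catalog keeps and what the palette shows can be sketched as follows. The record shape and function names are hypothetical; only the available: false flag and the "Disconnected" badge come from the behavior described above.

```typescript
interface ModuleInfo {
  key: string;   // e.g. "app:huntress"
  label: string;
}

interface CatalogRecord extends ModuleInfo {
  available: boolean;
}

// The catalog keeps every first-party record and flags disconnected ones.
function buildDesignerCatalog(modules: ModuleInfo[], availableKeys: Set<string>): CatalogRecord[] {
  return modules.map((m) => ({ key: m.key, label: m.label, available: availableKeys.has(m.key) }));
}

// The palette only offers records that can be added right now.
function paletteRecords(catalog: CatalogRecord[]): CatalogRecord[] {
  return catalog.filter((record) => record.available);
}

// Existing steps still resolve their group; disconnected ones get a badge.
function stepBadge(catalog: CatalogRecord[], moduleKey: string): "Disconnected" | null {
  const record = catalog.filter((r) => r.key === moduleKey)[0];
  return record !== undefined && !record.available ? "Disconnected" : null;
}
```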