# Scratchpad — NinjaOne Proactive Token Refresh
- Plan slug: `ninjaone-proactive-token-refresh`
- Created: `2026-03-26`
## What This Is
Rolling notes for implementing per-integration proactive NinjaOne OAuth token refresh scheduling through Temporal.
## Decisions
- (2026-03-26) Use per-integration delayed Temporal refresh scheduling rather than a global scanner.
- (2026-03-26) Keep existing lazy refresh in `NinjaOneClient` as fallback; proactive refresh reduces user-facing failures but does not replace request-time safety checks.
- (2026-03-26) Treat terminal provider responses like `invalid_token` as reconnect-required lifecycle failures, not ordinary sync failures.
- (2026-03-26) Store proactive lifecycle state in `rmm_integrations.settings.tokenLifecycle` (non-secret metadata only) with a monotonic `scheduleNonce` and `activeWorkflowId` so stale scheduled workflows safely no-op.
- (2026-03-26) Use one-off delayed workflows (`startDelay`) on the existing `TEMPORAL_JOB_TASK_QUEUE` (`alga-jobs` default) and terminate prior active workflow handles during reschedule to keep at most one active future refresh.
- (2026-03-26) Consider NinjaOne refresh failures terminal when provider response indicates invalid refresh token (`400 invalid_token` / `invalid_grant`), mark reconnect-required, and stop rescheduling until reconnect.
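
The nonce/ownership decision can be sketched as two pure functions: a guard that a delayed workflow would run when it wakes, and the bookkeeping that a reschedule would persist. This is a minimal TypeScript sketch under stated assumptions, not the code in `proactiveRefresh.ts`. Only `scheduleNonce` and `activeWorkflowId` come from these notes. `reconnectRequired`, `shouldNoOp`, `reschedule`, and the workflow ID format are hypothetical names chosen for illustration.

```typescript
// Hypothetical shape of the non-secret metadata in
// rmm_integrations.settings.tokenLifecycle. Only scheduleNonce and
// activeWorkflowId come from the notes; reconnectRequired is an assumed name.
interface TokenLifecycle {
  scheduleNonce: number; // monotonic; bumped on every (re)schedule
  activeWorkflowId: string | null; // the one workflow allowed to refresh
  reconnectRequired: boolean; // set after a terminal refresh failure
}

// Values a delayed workflow captures when it is started.
interface ScheduledRun {
  workflowId: string;
  scheduleNonce: number;
}

// Called when a delayed workflow wakes up. If anything has superseded it
// (a newer schedule, a different owner, or a reconnect-required state),
// it should exit without touching credentials.
function shouldNoOp(lifecycle: TokenLifecycle, run: ScheduledRun): boolean {
  return (
    lifecycle.reconnectRequired ||
    lifecycle.activeWorkflowId !== run.workflowId ||
    lifecycle.scheduleNonce !== run.scheduleNonce
  );
}

interface RescheduleResult {
  next: TokenLifecycle; // metadata to persist
  run: ScheduledRun; // arguments for the new delayed workflow
  terminateWorkflowId: string | null; // previous owner to terminate, if any
}

// Bumps the nonce and transfers ownership to a new workflow ID so that at
// most one future refresh stays active. Returns null while a reconnect is
// required, mirroring the "stop rescheduling until reconnect" decision.
function reschedule(lifecycle: TokenLifecycle, integrationId: string): RescheduleResult | null {
  if (lifecycle.reconnectRequired) return null;
  const scheduleNonce = lifecycle.scheduleNonce + 1;
  // Hypothetical, illustrative ID format (not the one used in proactiveRefresh.ts).
  const workflowId = `ninjaone-token-refresh-${integrationId}-${scheduleNonce}`;
  return {
    next: { ...lifecycle, scheduleNonce, activeWorkflowId: workflowId },
    run: { workflowId, scheduleNonce },
    terminateWorkflowId: lifecycle.activeWorkflowId,
  };
}
```

Terminating `terminateWorkflowId` keeps at most one pending workflow per integration; the nonce and owner check in `shouldNoOp` is the backstop if that termination fails or races with the old run waking up.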
## Discoveries / Constraints
- (2026-03-26) Current NinjaOne credentials are stored only in tenant secret storage under `ninjaone_credentials`; `rmm_integrations` does not currently store OAuth expiry.
- (2026-03-26) Production logs from `msp/temporal-worker-868df5f5fb-g2744` showed a new organization sync starting at `2026-03-26T19:00:19.652Z`.
- (2026-03-26) Production logs from `msp/temporal-worker-868df5f5fb-jbrx9` showed the sync failing during token refresh at `2026-03-26T19:00:20.071Z` with `400 Bad Request`, `ERR_BAD_REQUEST`, and `data: { error: 'invalid_token' }` from `https://ca.ninjarmm.com/oauth/token`.
- (2026-03-26) That proves NinjaOne refresh is already attempted inside the Temporal worker during sync execution, but only on demand.
- (2026-03-26) The current code already emits `INTEGRATION_TOKEN_EXPIRING` and `INTEGRATION_TOKEN_REFRESH_FAILED`, but no code currently schedules a future refresh off those signals.
- (2026-03-26) `ee/temporal-workflows` imports `ee/server` NinjaOne integration code at runtime; avoid `@/` aliases in shared NinjaOne modules and prefer relative imports so the Temporal workspace build can resolve modules consistently.
- (2026-03-26) Local test and typecheck runs are currently blocked by missing workspace dependencies in this checkout (e.g. `pathe`, `knex`, and `@temporalio/*` do not resolve), so verification in this run is limited to static inspection and writing targeted test files.
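
The logged refresh failure (`400 Bad Request`, `ERR_BAD_REQUEST`, `data: { error: 'invalid_token' }`) suggests a classifier like the following for the terminal-failure decision. It is a sketch that assumes an axios-style error object (`response.status`, `response.data`), which the `ERR_BAD_REQUEST` code hints at but these notes do not confirm; `classifyRefreshFailure` and its return labels are hypothetical.

```typescript
// Minimal structural type for the error shape seen in the worker logs.
// It mirrors fields an axios error carries (code, response.status,
// response.data) without depending on axios; this shape is an assumption.
interface HttpLikeError {
  code?: string;
  response?: { status?: number; data?: unknown };
}

// Provider error codes treated as "refresh token is unusable".
const TERMINAL_OAUTH_ERRORS = new Set(["invalid_token", "invalid_grant"]);

type RefreshFailureKind = "reconnect_required" | "non_terminal";

// Hypothetical helper: 400 + invalid_token/invalid_grant means the
// integration must be reconnected; anything else is non-terminal.
function classifyRefreshFailure(err: unknown): RefreshFailureKind {
  if (typeof err !== "object" || err === null) return "non_terminal";
  const { response } = err as HttpLikeError;
  if (!response || response.status !== 400) return "non_terminal";

  const data = response.data;
  const oauthError =
    typeof data === "object" && data !== null && "error" in data
      ? (data as { error: unknown }).error
      : undefined;

  return typeof oauthError === "string" && TERMINAL_OAUTH_ERRORS.has(oauthError)
    ? "reconnect_required"
    : "non_terminal";
}
```

RFC 6749 (section 5.2) defines `invalid_grant` as the token-endpoint error for an invalid, expired, or revoked refresh token, while NinjaOne returned `invalid_token` in the production log; that is why this sketch treats both as terminal and classifies everything else as non-terminal.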
## Commands / Runbooks
- (2026-03-26) Confirm current NinjaOne refresh implementation:
  - `rg -n "refreshAccessToken\\(|grant_type: 'refresh_token'|oauth/token" ee/server/src/lib/integrations/ninjaone/ninjaOneClient.ts`
- (2026-03-26) Confirm NinjaOne sync workflow entrypoints:
  - `rg -n "ninjaOneSyncWorkflow|syncOrganizations" ee/server/src/lib/integrations/ninjaone/sync ee/temporal-workflows/src/workflows ee/temporal-workflows/src/activities`
- (2026-03-26) Inspect recent MSP Temporal worker logs:
  - `kubectl -n msp logs temporal-worker-868df5f5fb-g2744 -c temporal-worker --since=10m --timestamps`
  - `kubectl -n msp logs temporal-worker-868df5f5fb-jbrx9 -c temporal-worker --since=10m --timestamps`
- (2026-03-26) Validate plan artifacts:
  - `python3 /Users/roberisaacs/.codex/skills/alga-plan/scripts/validate_plan.py ee/docs/plans/2026-03-26-ninjaone-proactive-token-refresh`
- (2026-03-26) Validate new proactive scheduling implementation references:
  - `rg -n "ninjaone-token-refresh|ninjaOneProactiveTokenRefreshWorkflow|proactiveNinjaOneTokenRefreshActivity|scheduleNinjaOneProactiveRefresh" ee/server/src ee/temporal-workflows/src`
- (2026-03-26) Attempt targeted tests (blocked by missing local dependency graph):
  - `cd ee/server && npm run test:unit -- src/__tests__/unit/ninjaoneProactiveRefresh.schedule.test.ts src/__tests__/unit/ninjaOneClient.baseUrl.test.ts`
  - `cd ee/temporal-workflows && npm run test -- src/__tests__/worker-registration.test.ts`
- (2026-03-26) Attempt type checks (blocked by pre-existing missing modules in workspace bootstrap):
  - `cd ee/server && npm run typecheck`
  - `cd ee/temporal-workflows && npm run type-check`
## Implementation Notes
- (2026-03-26) Added new server lifecycle module: `ee/server/src/lib/integrations/ninjaone/proactiveRefresh.ts`
  - Computes the refresh target using a configurable buffer and minimum delay.
  - Starts a delayed Temporal workflow per integration and records lifecycle metadata in `settings.tokenLifecycle`.
  - On execution, reloads the latest credentials, refreshes the token, persists the rotated credentials, updates lifecycle metadata, and reschedules the next refresh.
  - Marks terminal failures reconnect-required and prevents further scheduling until reconnect.
  - Marks unreadable or missing credentials unschedulable, with explicit lifecycle failure metadata.
- (2026-03-26) Added dedicated Temporal pair:
  - Workflow: `ee/temporal-workflows/src/workflows/ninjaone-token-refresh-workflow.ts`
  - Activity: `ee/temporal-workflows/src/activities/ninjaone-token-refresh-activities.ts`
  - Exported from the workflow/activity indexes and added to worker-registration test coverage.
- (2026-03-26) Wired scheduling hooks:
  - The OAuth callback now clears reconnect-required state and seeds the proactive refresh schedule after a successful connect or reconnect.
  - A successful lazy refresh in `NinjaOneClient.refreshAccessToken` now reschedules the proactive workflow.
  - The disconnect action now cancels the pending proactive refresh workflow and marks its lifecycle inactive.
- (2026-03-26) Added unit test coverage:
  - `ee/server/src/__tests__/unit/ninjaoneProactiveRefresh.schedule.test.ts` verifies delayed scheduling on the connect path and termination of the previous workflow during reschedule.
- (2026-03-26) Added proactive execution/backfill tests:
  - `ee/server/src/__tests__/unit/ninjaoneProactiveRefresh.execution.test.ts`
    - verifies runtime credential reload from secret storage;
    - verifies rotated credential persistence;
    - verifies successful proactive refresh reschedules next workflow and increments lifecycle nonce;
    - verifies terminal `invalid_token` marks reconnect-required and avoids further scheduling;
    - verifies missing credentials become unschedulable without reschedule loop;
    - verifies integration settings lifecycle metadata excludes raw token material.
  - `ee/temporal-workflows/src/schedules/__tests__/setupSchedules.ninjaone-backfill.test.ts`
    - verifies rollout backfill seeding runs for active integrations lacking lifecycle ownership and skips reconnect-required/already-owned lifecycle rows.
- (2026-03-26) Added lazy fallback schedule handoff test:
  - `ee/server/src/__tests__/unit/ninjaOneClient.proactiveSchedule.test.ts`
    - verifies successful lazy NinjaOne token refresh triggers proactive reschedule (`source: lazy_refresh_success`) for the same integration.
- (2026-03-26) Expanded proactive suite coverage:
  - `ee/server/src/__tests__/unit/ninjaoneProactiveRefresh.execution.test.ts`
    - includes assertions that inactive integrations no-op, protecting disconnect semantics.
  - `ee/server/src/__tests__/unit/ninjaoneProactiveRefresh.schedule.test.ts`
    - includes assertions that reconnect resets lifecycle state and seeds a fresh schedule.
  - `ee/temporal-workflows/src/workflows/__tests__/ninjaone-token-refresh-workflow.test.ts`
    - verifies structured workflow start/success logs contain tenant/integration/schedule context.
- (2026-03-26) Added rollout backfill implementation in Temporal startup schedule bootstrap (`setupSchedules`) that seeds proactive NinjaOne refresh for active integrations without active lifecycle ownership.
- (2026-03-26) Added proactive refresh failure event publication (`INTEGRATION_TOKEN_REFRESH_FAILED`) in proactive path so failures are visible outside worker logs.
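
The delay computation described for `proactiveRefresh.ts` (a refresh target derived from a configurable buffer and minimum delay) might look like the sketch below. The config field names and the five-minute/one-minute defaults are assumptions for illustration; these notes do not record the actual names or values, and `computeRefreshDelayMs` is a hypothetical helper.

```typescript
// Assumed configuration shape; the notes say the buffer and minimum delay
// are configurable but do not give names or default values.
interface RefreshTimingConfig {
  bufferMs: number; // how long before access-token expiry to refresh
  minDelayMs: number; // floor for the scheduled delay
}

// Illustrative defaults only.
const DEFAULT_TIMING: RefreshTimingConfig = { bufferMs: 5 * 60_000, minDelayMs: 60_000 };

// Returns how long to wait (in ms) before the proactive refresh should run.
// The target is bufferMs before expiry; if that moment is already close or
// in the past, the floor keeps the workflow from being scheduled immediately.
function computeRefreshDelayMs(
  expiresAtMs: number,
  nowMs: number,
  config: RefreshTimingConfig = DEFAULT_TIMING,
): number {
  const refreshAtMs = expiresAtMs - config.bufferMs;
  return Math.max(refreshAtMs - nowMs, config.minDelayMs);
}

const now = Date.parse("2026-03-26T19:00:00Z");
const healthy = computeRefreshDelayMs(now + 60 * 60_000, now); // 3_300_000 (55 minutes)
const nearExpiry = computeRefreshDelayMs(now + 2 * 60_000, now); // 60_000 (clamped to minDelayMs)
```

The returned number could be passed as the workflow's `startDelay`, which the Temporal TypeScript SDK accepts as milliseconds or as a duration string. Clamping to a floor instead of refreshing immediately presumably guards against scheduling churn when a token is already near or past expiry.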
## Links / References
- NinjaOne client refresh implementation: `ee/server/src/lib/integrations/ninjaone/ninjaOneClient.ts`
- NinjaOne sync strategy: `ee/server/src/lib/integrations/ninjaone/sync/syncStrategy.ts`
- NinjaOne OAuth callback: `ee/server/src/app/api/integrations/ninjaone/callback/route.ts`
- Temporal schedules bootstrap: `ee/temporal-workflows/src/schedules/setupSchedules.ts`
- Temporal delayed scheduling patterns: `ee/server/src/lib/jobs/runners/TemporalJobRunner.ts`
- NinjaOne proactive refresh plan: `ee/docs/plans/2026-03-26-ninjaone-proactive-token-refresh/PRD.md`
- Official NinjaOne public API docs: `https://www.ninjaone.com/docs/application-programming-interface-api/public-api-operations/`
- Official NinjaOne OAuth configuration docs: `https://www.ninjaone.com/docs/application-programming-interface-api/oauth-token-configuration/`
## Open Questions
- Should lifecycle metadata live in `rmm_integrations.settings` or a dedicated table if we need stronger scheduling introspection later?
- Should a reconnect-required token failure also write `sync_error`, or should it be tracked separately to avoid conflating token lifecycle state with sync state?