PSA/docs/superpowers/specs/2026-06-05-appliance-registration-install-flow-design.md
Hermes 284313f908
Some checks are pending
Bidi Control Character Guard / bidi-control-guard (push) Waiting to run
Circular Dependency Check / Check for new circular dependencies (push) Waiting to run
Citus Migration Smoke / Combined migrations on single-node Citus (push) Waiting to run
E2E Fresh Install Tests / fresh-install-e2e (push) Waiting to run
ext-v2 guardrails / Run ext-v2 guard and ESLint (push) Waiting to run
Integration Tests / Check for relevant changes (push) Waiting to run
Integration Tests / ${{ (github.event_name == 'schedule' || github.event.inputs.suite == 'full') && 'Full integration suite' || 'Tier-1 integration subset' }} (push) Blocked by required conditions
Mobile checks / Mobile lint + typecheck (push) Waiting to run
Mobile checks / Mobile unit tests (push) Waiting to run
Mobile checks / Mobile dependency audit (report) (push) Waiting to run
Mobile checks / Mobile reproducibility checks (push) Waiting to run
Secrets guard (env backups) / Ensure no tracked env backup files (push) Waiting to run
Temporal Readiness / fast-readiness (push) Waiting to run
Temporal Readiness / docker-parity (push) Waiting to run
TypeScript Type Check / Nx affected typecheck (push) Waiting to run
Unit Tests / Skipped-test budget (push) Waiting to run
Unit Tests / Nx affected unit tests (push) Waiting to run
Unit Tests / Server unit coverage (informational) (push) Waiting to run
Validate Tenant Management Schema / Check for relevant changes (push) Waiting to run
Validate Tenant Management Schema / Validate Tenant Management Schema (push) Blocked by required conditions
EE Workflows Build Guard / ee-workflows-build-guard (push) Waiting to run
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

282 lines
16 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Appliance registration → download → install flow
**Status:** Design • **Date:** 2026-06-05
## Goal
Let a customer register for an on-prem (appliance) AlgaPSA install, download a
generic ISO, and have the installed appliance come up already bound to a
**tenant identity that was minted upstream at registration** — so licenses are
bound from first boot and even free (essentials) installs get a real
`tenant_registry` record.
The single organizing idea: the appliance image is **generic** (we don't build a
per-customer ISO), so per-tenant identity travels separately from the download,
carried by a short, one-time **install code** that the appliance redeems at
setup. The code is the per-customer artifact; the ISO is not.
## What exists today (grounded)
The pieces this flow reuses already exist:
- **`tenant_registry`** (`alga-license/migrations/01_tenant_registry.cjs`) — the
global directory: `tenant_id uuid` (minted here, `gen_random_uuid()`),
`edition`, `deployment_type`, `status` (`registered → installed → active …`),
`company_name`/`contact_email`, `stripe_customer_id`. This is the source of
truth for "a tenant exists, who owns it, what edition."
- **`/register`** (`alga-license/src/routes/register.ts`) — redeems a
`claim_code`, mints a per-appliance license JWT bound to a tenant via the `aud`
claim (`signLicense({ …, aud })`), issues an appliance credential, returns
`{ appliance_credential, first_jwt, check_in_url }`. Today it **requires an
entitlement** (`getEntitlementById(row.entitlement_id)` → 500 if none) and
takes `tenant_id` as **caller-supplied input**.
- **`/claim-codes`** (`alga-license/src/routes/claimCodes.ts`) — mints a code for
an entitlement, keyed by `stripe_sub_id` (paid only). Revokes prior unclaimed
codes for that entitlement first (rebind path).
- **`/check-in`** (`alga-license/src/routes/checkIn.ts`) — refreshes the bound
license daily, **preserving `appliance.tenant_id` as `aud`** across re-signs.
No change needed.
- **`claim_codes`** (`alga-license/migrations/03_claim_codes.cjs`) —
`entitlement_id` is `NOT NULL` with an FK to `entitlements`.
- **Appliance tenant creation** (`ee/server/src/lib/testing/tenant-creation.ts`,
used by `server/scripts/create-tenant.ts`) — `createTenant` inserts into
`tenants` **without** a `tenant` value, so the DB generates the UUID
(`gen_random_uuid()`) and returns it (`.returning('tenant')`, line 8084).
- **Appliance consumer** (`server/src/lib/actions/licenseManagementActions.ts`) —
`connectAppliance` POSTs `{ claim_code, appliance_id }` to `/register` and
seeds `license_state` with `first_jwt` / `appliance_credential` /
`check_in_url`. Runs **after** the tenant already exists.
### The gap
Today the appliance **generates its own tenant UUID** at setup, then *optionally*
connects a license later. The tenant identity is never known upstream, so a
license issued before install can't be bound, and essentials installs leave no
registry record. The flow below inverts the direction: **mint `tenant_id`
upstream at registration; the install code carries it downstream; the appliance
adopts it.**
## The four surfaces
```
nm-store alga-license (C4) object store appliance
──────── ───────────────── ──────────── ─────────
register ──POST───► /register-tenant
form • tenant_registry row (tenant_id)
• (paid) entitlement
• mint install code ──carries──► tenant_id
• presign ISO URL ◄──────────────┐
◄──{code, tenant_id, download_url} │
confirm page │
+ email │
download ──────────────────────presigned GET────────────┘──────────► ISO
install: setup UI "enter install code" ▼
──POST {claim_code, appliance_id}──► /register
• look up code → tenant_id (registry-minted)
• (paid) mint license bound aud=tenant_id
◄──{ tenant_id, edition, first_jwt?, credential?, check_in_url? }
create-tenant INITIAL_TENANT_ID=tenant_id ◄─────────────────┘
seed license_state (edition; + token/credential if paid)
registry status → installed
```
1. **Registration (nm-store → alga-license).** A new service-authed endpoint
`POST /register-tenant` on alga-license creates the `tenant_registry` row
(`deployment_type=appliance`, `status=registered`), and for **paid** tiers
creates the entitlement (Stripe checkout drives this, as the hosted flow does
today). It mints an **install code** carrying that `tenant_id` (and the
`entitlement_id` for paid), and returns `{ tenant_id, install_code,
download_url }`. nm-store shows the code on the confirmation page and emails
it. nm-store stays DB-less — alga-license owns the registry write, the code,
and the presigned URL.
2. **Download (presigned, all tiers).** `download_url` is a time-boxed presigned
URL to the **current generic appliance ISO** in the in-cluster object store
(MinIO). Presigned for everyone — essentials included — so downloads are
gated by registration, not public. The ISO carries no identity.
3. **Install (appliance redeems the code).** The first-boot setup UI gains an
**"enter install code"** step. The host-service redeems it via the existing
`/register`, which now returns the registry-minted `tenant_id` and `edition`
(plus, for paid, `first_jwt` / `appliance_credential` / `check_in_url`).
Bootstrap then runs `create-tenant` with a new **`INITIAL_TENANT_ID`** so the
local `tenants` row is created under the pre-minted UUID, and seeds
`license_state` (edition always; license token + credential + check-in URL for
paid). alga-license flips the registry row to `status=installed`.
4. **Re-issue / recovery (in scope).** Install codes are one-time. A new
service-authed `POST /install-codes/reissue` on alga-license resolves an
existing registry tenant (by `contact_email`, or `tenant_id`), revokes any
unconsumed codes for it, and mints a **fresh install code for the same
`tenant_id`** (+ current entitlement) with a fresh presigned `download_url`.
nm-store exposes this as a portal "re-issue install code" action. This is what
makes a wipe-and-reinstall recover the *same* tenant identity rather than
stranding it.
## Data model changes (alga-license)
A single migration extending `claim_codes` so one code type serves both paid and
essentials, carrying the registry tenant:
```
ALTER claim_codes:
entitlement_id → NULLABLE -- essentials codes carry no entitlement
+ tenant_id uuid NOT NULL FK → tenant_registry(tenant_id)
+ index on tenant_id
```
- **Paid code:** `entitlement_id` set (license path unchanged) **and** `tenant_id`
set (so `aud` comes from the registry, not appliance input).
- **Essentials code:** `entitlement_id` null, `tenant_id` set. No license minted.
`tenant_registry` already has everything else needed (`edition`, `company_name`,
`contact_email`, `status`); no change there beyond writing `status=installed` at
install.
## API changes (alga-license)
**`/register` (extend, don't replace).** Today it 500s when the code has no
entitlement and binds `aud` from caller-supplied `tenant_id`. New behavior:
- Resolve `tenant_id` from **the code** (`row.tenant_id`), not the request body —
this is the registry-minted identity. (Keep accepting a body `tenant_id` only
as a legacy fallback for pre-registry appliances; the registry path ignores
it.)
- If `row.entitlement_id` is null → **essentials**: skip license minting; still
upsert the appliance row (credential, `tenant_id`, no token).
- Look up `tenant_registry` by `tenant_id` for `edition` + `company_name` /
`contact_email`.
- **Response gains** `tenant_id`, `edition`, and (for essentials) omits
`first_jwt`. So `RegisterResponse` becomes:
`{ tenant_id, edition, company_name, contact_email, appliance_credential?,
first_jwt?, check_in_url? }` — credential/token/url present only when there's a
license (paid).
**`POST /register-tenant` (new, service-authed).** Body: company/contact,
edition, deployment_type, optional Stripe linkage. Creates the registry row,
(paid) entitlement, mints the install code carrying `tenant_id`, presigns the
ISO, returns `{ tenant_id, install_code, download_url }`. Reuses
`createRegistryTenant` + the claim-code minting (generalized to accept a
`tenant_id` and optional `entitlement_id`).
**`POST /install-codes/reissue` (new, service-authed).** Body:
`{ contact_email | tenant_id }`. Resolves the registry tenant, revokes its
unconsumed codes, mints a fresh code + presigned `download_url`. Returns
`{ install_code, download_url }`.
## The `INITIAL_TENANT_ID` seam (appliance)
The change is small and localized at the one place the UUID is born:
- `createTenant` (`tenant-creation.ts`) gains an optional `tenantId` on its input;
when present, set `tenantInsert.tenant = input.tenantId` instead of letting the
DB default it. Thread the option through `createTenantComplete`.
- `server/scripts/create-tenant.ts` reads `INITIAL_TENANT_ID` (env, alongside the
existing `INITIAL_ADMIN_PASSWORD`) and passes it down.
- The appliance bootstrap (the configmap/script that runs `create-tenant` at
first boot) sets `INITIAL_TENANT_ID` to the `tenant_id` the redeem returned.
When `INITIAL_TENANT_ID` is unset, behavior is unchanged (DB-generated UUID) — so
this is additive and safe for non-registry installs.
## The install-time redeem consumer (appliance)
`connectAppliance` already knows how to call `/register` and seed `license_state`.
The install path generalizes it:
1. Setup UI collects the install code + admin password (the password is set here,
at install — it never travels through registration or email).
2. Host-service POSTs `{ claim_code, appliance_id }` to `/register`, receives
`tenant_id` + `edition` (+ license bits if paid).
3. Bootstrap runs `create-tenant` with `INITIAL_TENANT_ID = tenant_id`.
4. Seed `license_state`: `edition` always; for paid also `license_token`
(`first_jwt`), `appliance_credential`, `check_in_url` (the existing
`connectAppliance` write, minus the assumption that a token is always present).
Essentials installs simply run at `edition=essentials` with a `license_state` row
and no token — consistent with "always seed `license_state`" (the appliance reads
edition from `license_state`, not from the presence of a token).
## Error handling
- **Code invalid / expired / already consumed** — `/register` already returns
`invalid_claim_code` / `expired_claim_code` / `consumed_claim_code`; the setup
UI surfaces these and points the user to portal **re-issue**.
- **Consumed code on reinstall** — expected; the user re-issues. Re-issue revokes
stragglers so only the newest code is live.
- **Presigned URL expired** — re-issue mints a fresh one; the download link and
the code travel together.
- **Registry/license-service unreachable at install** — setup blocks with a clear
"can't reach license service" error (the appliance can't self-mint a registry
identity; that's by design).
- **Idempotent install** — if `create-tenant` reruns with the same
`INITIAL_TENANT_ID`, it must no-op/short-circuit rather than duplicate
(the existing admin-user create already guards on `(email, tenant)`).
## Security considerations
- The install code is **single-use** (`consumeClaimCode` is atomic under
concurrency) and short-lived (`claimCodeTtlSeconds`); re-issue is the only way
to get another, and it's behind portal auth on nm-store.
- Per-tenant binding is preserved end-to-end: the registry mints `tenant_id`,
the code carries it, `/register` stamps it as `aud`, `/check-in` keeps it across
refreshes — so a leaked license still can't activate on another tenant
(the binding work already shipped; this flow just sources `aud` from the
registry instead of appliance input).
- Presigned-for-all means the ISO isn't a public URL, but note the ISO itself is
**not** a secret — the install code is the gate. Presigning is registration
gating + link expiry, not ISO confidentiality.
- The admin password is set at install, never in registration/email.
## Testing
Following the repo's light-automated-then-smoke convention:
- **Automated (alga-license):** the `claim_codes` extension + `/register`
essentials path + `/register-tenant` + `/install-codes/reissue` get db-layer
and route tests against a throwaway Postgres (the existing pattern: gated on
`DB_HOST`, `DELETE` not `TRUNCATE` for the least-priv role). Cover: essentials
code (no entitlement) redeems and returns `tenant_id`/`edition` with no token;
paid code returns a license bound to the registry `tenant_id`; reissue revokes
prior codes and reuses the same `tenant_id`.
- **Automated (appliance):** `createTenant` honors `INITIAL_TENANT_ID`
(creates the `tenants` row under the supplied UUID; unchanged when unset).
- **Smoke (live, on the VM):** full loop — register on nm-store → download →
enter code in setup UI → confirm the appliance comes up under the minted
`tenant_id`, `license_state` seeded, paid tier licensed/bound, then a
wipe-and-reinstall via **re-issue** recovers the same tenant.
## Phasing
The four decisions collapsed the original two-phase split: re-issue is **in
scope now**, so this is one coherent build rather than essentials-first then
paid. Suggested build order (each independently verifiable):
1. **alga-license schema + `/register` extension** (essentials path + tenant_id
from the code + richer response). The seam everything hangs off.
2. **`INITIAL_TENANT_ID`** in `createTenant` / `create-tenant.ts` / bootstrap.
3. **`/register-tenant`** + **`/install-codes/reissue`** + presigned-URL minting.
4. **Setup-UI install-code step** + install-time redeem/seed consumer.
5. **nm-store** registration + confirmation/email + portal re-issue.
## Open questions (for spec review)
1. **Presigned URL owner.** Lean: alga-license mints the presigned URL (holds the
MinIO creds; nm-store stays DB-less and just relays). Confirm MinIO is the ISO
store and the appliance release process publishes the "current" ISO to a known
key. Current appliance release metadata is published by the `~/nm-kube-config`
Argo workflow to OCI, not from local files in `alga-psa`.
2. **`/register-tenant` vs Stripe ordering for paid.** Does nm-store call
`/register-tenant` before or after Stripe checkout completes? Lean: create the
registry row at form submit (`status=registered`), attach the entitlement on
`checkout.session.completed`, mint the code at that point.
3. **Edition source for essentials at `/register`.** Confirmed via
`tenant_registry.edition` (the code → tenant_id → registry lookup); no edition
on the code itself. OK?
4. **Code format.** Reuse `generateClaimCode()` as-is (same friendly format), or a
distinct visual format for install codes? Lean: reuse — one machinery.
```