PSA/ee/docs/appliance/architecture.md
Hermes 284313f908
Initial import of AlgaPSA codebase from PSA server
Excluded: .git, node_modules, secrets/, compose.env, assemblyscript tgz

Source: /opt/alga-psa on psa.joliet.tech
2026-06-22 16:12:17 -05:00

Appliance Architecture & Codebase Map

Use this to get oriented in the on-prem appliance and find where each part lives: the components and how they fit together, what happens during an install, and the file to open for any given piece. Every section links the real files.

This describes the current Ubuntu + k3s + Kubernetes-hosted control plane + registry-metadata implementation. The user-facing operator docs in this directory (quick-start.md, operators-manual.md, technical-reference.md, kubernetes-hosted-setup.md) and the generic Talos platform references under ee/docs/premise/ carry more detail and some historical (Talos) narrative. The design rationale for the registry-metadata release model is in ee/appliance/docs/registry-metadata-design.md.

All paths are relative to the repo root. Everything for the appliance lives under ee/appliance/; its own README.md is the package-level companion to this doc.


1. The big picture

The appliance is a single-node, self-contained on-prem deployment of Alga PSA. It ships as an Ubuntu ISO that boots into k3s and runs its entire control plane as Kubernetes workloads. The only host-level responsibility is a thin systemd bootstrap that starts k3s and hands off to an in-cluster setup UI.

 Ubuntu ISO (autoinstall)
   └─ systemd: alga-appliance-bootstrap.service
        ├─ starts k3s (the substrate)
        ├─ imports the baked control-plane image
        └─ applies the control-plane Deployment, then prints the setup banner
              │
              ▼
 Kubernetes-hosted control plane  (ns: alga-appliance-control-plane)
   Deployment appliance-control-plane  →  host-service/server.mjs on :8080
        ├─ serves the setup wizard + status UI
        ├─ runs the setup workflow (setup-engine.mjs)
        └─ writes Flux resources that install the app
              │
              ▼
 Flux (GitOps) reconciles from the OCI registry (ghcr.io/nine-minds)
   OCIRepository (config bundle) + HelmRepository (charts)
        └─ Kustomizations: alga-platform → alga-core → alga-background
              │
              ▼
 PSA application stack (Helm releases)
   ns msp         : alga-core (sebastian), db, redis
   ns alga-system : pgbouncer, temporal, temporal-worker, workflow-worker, email-service

Four namespaces carry the whole system:

| Namespace | Holds |
| --- | --- |
| alga-appliance-control-plane | the setup/status control-plane pod |
| flux-system | Flux controllers + the child Kustomizations + the OCI/Helm sources |
| alga-system | platform HelmReleases (pgbouncer, temporal, workers, email) |
| msp | the PSA app (alga-core / sebastian), database, redis, initial tenant |

2. Components & where they live

Host bootstrap (systemd)

The host does as little as possible. See ee/appliance/systemd/:

  • alga-appliance-bootstrap.service — oneshot. ExecStartPre runs init-token.mjs + init-admin-credential.mjs; ExecStart runs scripts/bootstrap-control-plane.sh; ExecStartPost runs console.mjs --emit-runtime-banner (prints the setup URL + token on tty1).
  • alga-host-agent.service — long-running. Runs host-service/host-agent.mjs, a diagnostics Unix socket the in-pod control plane calls for host journals.
  • alga-appliance.sysusers — reserves the alga group (GID 10001) used for the host-agent socket.

Note: the old script-driven Talos bootstrap/upgrade path has been removed from ee/appliance; supported installs use the Ubuntu host setup/status service.

The Kubernetes-hosted control plane

The setup/status brain. See ee/appliance/control-plane/ and ee/appliance/host-service/:

  • control-plane/Dockerfile — multi-stage: builds the Next.js status UI from status-ui/, then a runtime image with bash/curl/kubectl/flux that bundles the host-service/*.mjs, manifests/, and flux/ trees. Runs as UID 10001. (It intentionally does not bake release metadata — setup/update resolve selected channels from the OCI artifact registry at runtime.)
  • control-plane/manifests/namespace.yaml, rbac.yaml, workload.yaml (the appliance-control-plane Deployment: 1 replica, hostNetwork, hostPort 8080, mounts the host state dir + setup-token secret), and kustomization.yaml.
  • scripts/control-plane-entrypoint.sh — the pod's entrypoint: builds an in-cluster kubeconfig from the service-account token, then runs server.mjs.

host-service — the runtime (ee/appliance/host-service/)

Plain Node ESM (.mjs), no build step. The pieces:

| File | Role |
| --- | --- |
| server.mjs | HTTP server on :8080. Routes: /healthz, /setup (UI), /api/setup/config, /api/setup (POST), /api/status. Serves the static status UI, validates input, and dispatches the setup workflow. |
| setup-engine.mjs | The setup workflow (see §3). Resolves the release from the registry, renders runtime values, writes the initial-tenant Secret + release-selection ConfigMap, and applies the Flux source. |
| update-engine.mjs | Channel upgrades: re-resolve the channel, update release selection, flux reconcile. |
| status-engine.mjs | Cluster health: HelmReleases, pods, readiness tiers (platform / core / bootstrap / login). Backs /api/status. |
| console.mjs | Console (tty1) setup banner + fallback flow; writes /etc/issue and /etc/motd. |
| host-agent.mjs | The diagnostics socket daemon (host side of alga-host-agent.service). |
| kubectl-queue.mjs | Serializes kubectl calls (one at a time, with timeout/output caps). |
| metadata-engine.mjs | Persists maintenance metadata (OS info, bootstrap/upgrade history). |
| init-token.mjs / init-admin-credential.mjs | Generate the setup token and the temporary console admin password on first boot. |
| resolve-control-plane-image.mjs | Prints the channel-pinned control-plane image ref from the release manifest (used by bootstrap-control-plane.sh to roll the control plane from the registry). |
| bootstrap-boundary.mjs | Phase constants for the host bootstrap. |
| support-bundle.mjs | Builds a redacted diagnostics tarball. |
| tests/ | Node test runner specs for the engines (workflow, update, status, control-plane packaging, first-boot smoke, etc.). |
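
To make the "one at a time, with timeout" behaviour attributed to kubectl-queue.mjs concrete, here is a minimal sketch of a serial task queue. It is an illustration under assumptions, not the code in that file: `createSerialQueue`, `enqueue`, and `timeoutMs` are hypothetical names, and the output cap is left out.

```javascript
// Hypothetical sketch of a serial queue; none of these names are taken
// from kubectl-queue.mjs.
function createSerialQueue({ timeoutMs = 30_000 } = {}) {
  // `tail` settles when the most recently queued task has finished.
  let tail = Promise.resolve();

  // Reject if `promise` has not settled within `ms` milliseconds.
  // This only stops waiting; it does not cancel the underlying work.
  function withTimeout(promise, ms) {
    let timer;
    const timeout = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    });
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
  }

  return function enqueue(task) {
    // Start this task only after the previous one has settled.
    const run = tail.then(() => withTimeout(task(), timeoutMs));
    // Swallow errors on the chain so one failure does not block later tasks.
    tail = run.catch(() => {});
    return run;
  };
}
```

A caller would wrap each kubectl invocation in a function returning a promise and pass it to `enqueue`. In a real implementation the timeout would also need to terminate the child process, which this sketch does not attempt.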

The setup UI (ee/appliance/status-ui/)

A small Next.js app. app/setup/page.tsx is the wizard (channel, app hostname, DNS, tenant + admin); app/page.tsx is the status view. It's built into static dist/ and served by the control-plane pod (the Dockerfile copies it to ALGA_APPLIANCE_STATUS_UI_DIR).

Flux / Helm layer (ee/appliance/flux/)

GitOps definitions, published to the registry as the config bundle:

  • flux/base/kustomization.yaml — top of the bundle: namespaces + the charts HelmRepository + the child Kustomizations.
  • flux/base/charts/helm-repository.yaml → HelmRepository alga-charts (type: oci, oci://ghcr.io/nine-minds/charts).
  • flux/base/flux/kustomizations.yaml — the dependency chain alga-platform → alga-core → alga-background, all sourced from the OCIRepository alga-appliance that the setup workflow creates.
  • flux/base/releases/*.yaml — one HelmRelease each: alga-core (chart sebastian, ns msp), pgbouncer, temporal, temporal-worker, workflow-worker, email-service (ns alga-system). Each pulls values from a generated ConfigMap.
  • flux/base/platform/appliance-status.yaml — in-cluster RBAC + status helpers the control plane queries (uses the Flux v1 APIs).
  • flux/profiles/single-node/values/*.single-node.yaml — the per-service value overrides baked into the release manifest's profileValues.
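
The dependency chain between the child Kustomizations can be pictured as a small ordering problem. The sketch below assumes the chain is expressed through Flux's `spec.dependsOn` and simplifies each entry to a bare name; the objects and the `applyOrder` helper are hypothetical and not copied from flux/base/flux/kustomizations.yaml.

```javascript
// Hypothetical sketch: derive an apply order from dependsOn lists.
// Real Flux dependsOn entries are objects with a name (and optional
// namespace); plain names are used here to keep the example short.
function applyOrder(kustomizations) {
  const byName = new Map(kustomizations.map((k) => [k.name, k]));
  const state = new Map(); // name -> 'visiting' | 'done'
  const order = [];

  function visit(name) {
    if (state.get(name) === 'done') return;
    if (state.get(name) === 'visiting') {
      throw new Error(`dependency cycle at ${name}`);
    }
    const k = byName.get(name);
    if (!k) throw new Error(`unknown Kustomization: ${name}`);
    state.set(name, 'visiting');
    // Dependencies are placed before the Kustomization that needs them.
    for (const dep of k.dependsOn ?? []) visit(dep);
    state.set(name, 'done');
    order.push(name);
  }

  for (const k of kustomizations) visit(k.name);
  return order;
}

const chain = [
  { name: 'alga-background', dependsOn: ['alga-core'] },
  { name: 'alga-core', dependsOn: ['alga-platform'] },
  { name: 'alga-platform' },
];
console.log(applyOrder(chain).join(' -> '));
// prints "alga-platform -> alga-core -> alga-background"
```

Flux itself enforces this ordering during reconciliation; the helper only shows why alga-platform must be ready before alga-core, and alga-core before alga-background.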

Flux apiVersions: the appliance's Flux serves source CRDs (OCIRepository/HelmRepository/GitRepository) + Kustomization at v1 and HelmRelease at v2. Don't reintroduce v1beta2.
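
One way to keep older apiVersions from creeping back in is a small check over parsed manifests. The following is a sketch under assumptions: the API group names are the upstream Flux ones, while `EXPECTED_API_VERSIONS` and `findApiVersionDrift` are hypothetical and do not exist in the repo as far as this document shows.

```javascript
// Expected apiVersion per Flux kind, following the note above.
// The helper and constant names are illustrative only.
const EXPECTED_API_VERSIONS = {
  OCIRepository: 'source.toolkit.fluxcd.io/v1',
  HelmRepository: 'source.toolkit.fluxcd.io/v1',
  GitRepository: 'source.toolkit.fluxcd.io/v1',
  Kustomization: 'kustomize.toolkit.fluxcd.io/v1',
  HelmRelease: 'helm.toolkit.fluxcd.io/v2',
};

function findApiVersionDrift(resources) {
  const problems = [];
  for (const { kind, apiVersion, metadata } of resources) {
    const version = String(apiVersion ?? '');
    const group = version.split('/')[0];
    // Skip non-Flux resources, including kustomize.config.k8s.io
    // Kustomization files, which share a kind name with Flux.
    if (!group.endsWith('.toolkit.fluxcd.io')) continue;
    const expected = EXPECTED_API_VERSIONS[kind];
    if (expected && version !== expected) {
      problems.push(`${kind}/${metadata?.name ?? '?'}: ${version} (expected ${expected})`);
    }
  }
  return problems;
}
```

Fed with the parsed documents from flux/base/, an empty result would mean every Flux resource is on the expected version.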

Release metadata and publishing

  • Release metadata is an OCI artifact at ghcr.io/nine-minds/alga-appliance-release.
  • The release manifest JSON is the artifact config blob: app image tags, chart versions, Flux config-bundle digest, control-plane image ref, values profile, and profile values.
  • Channels such as stable and nightly are OCI tags on that release artifact.
  • Publishing is owned by the Argo workflow in ~/nm-kube-config/alga-psa/workflows/composite/alga-psa-build-migrate-deploy.yaml. For stable releases use promote-release=true, publish-appliance-release=true, and appliance-release-channel=stable.
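
As a rough picture of what the release manifest carries, the sketch below checks an object for the six pieces listed above. Every field name (`images`, `charts`, `configBundleDigest`, `controlPlaneImage`, `valuesProfile`, `profileValues`) is an assumption made for illustration; the real schema is whatever validateReleaseManifest in setup-engine.mjs enforces.

```javascript
// Hypothetical shape check for a release manifest. Field names are
// assumptions, not the actual schema.
function checkReleaseManifest(manifest) {
  const errors = [];
  const isNonEmptyObject = (v) =>
    v !== null && typeof v === 'object' && !Array.isArray(v) && Object.keys(v).length > 0;
  const isNonEmptyString = (v) => typeof v === 'string' && v.length > 0;

  if (!isNonEmptyObject(manifest.images)) errors.push('images: expected a map of app image tags');
  if (!isNonEmptyObject(manifest.charts)) errors.push('charts: expected a map of chart versions');
  if (!/^sha256:[0-9a-f]{64}$/.test(manifest.configBundleDigest ?? '')) {
    errors.push('configBundleDigest: expected a sha256 digest');
  }
  if (!isNonEmptyString(manifest.controlPlaneImage)) errors.push('controlPlaneImage: expected an image ref');
  if (!isNonEmptyString(manifest.valuesProfile)) errors.push('valuesProfile: expected a profile name');
  if (!isNonEmptyObject(manifest.profileValues)) errors.push('profileValues: expected per-service values');
  return errors;
}
```

Requiring a full digest for the config bundle reflects the idea that a resolved release should not be able to change underneath the appliance.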

Host/control-plane scripts (ee/appliance/scripts/)

  • bootstrap-control-plane.sh — the host bootstrap: start k3s, import the baked image, resolve + pull the channel control-plane image, apply the Deployment, report handoff.
  • stage-control-plane-bundle.sh — builds + stages the baked control-plane image archive into the ISO overlay.
  • install-storage.sh / manifests/local-path-storage.yaml — local-path provisioner.
  • bin/alga-control-plane-reapply — non-destructive recovery: re-import + re-apply the control plane.

ISO builder (ee/appliance/ubuntu-iso/)

  • config/nocloud/user-data — the autoinstall seed. Identity alga-admin, interactive-sections: [network, storage] (the installer pauses on those two), late-commands copy the overlay into /target and enable the systemd units, then reboot.
  • overlay/ — the host filesystem injected into the install: everything lands at /opt/alga-appliance/ (scripts, host-service, flux, manifests, status UI), plus the systemd units and /etc/alga-appliance/.
  • scripts/build-ubuntu-appliance-iso.sh — entrypoint (--base-iso, --release-version). Calls stage-host-artifacts.sh (which builds the control-plane image + stages the overlay), then remasters with xorriso.
  • output/ — the built alga-appliance-ubuntu-<version>.iso + .sha256.

Operator CLI (ee/appliance/operator/)

ee/appliance/appliance → operator/lib/cli.mjs: a TUI/CLI for status, support bundles, repair helpers, and destructive reset. It no longer exposes the removed bootstrap/upgrade commands; installs and app-channel updates are handled by the host/control-plane setup and update engines.


3. How an install happens (end to end)

  1. Boot. The ISO autoinstalls Ubuntu (operator confirms the network + storage screens), copies the overlay, enables the systemd units, and reboots. → ubuntu-iso/config/nocloud/user-data
  2. Host bootstrap. alga-appliance-bootstrap.service generates the setup token + temp admin password, runs bootstrap-control-plane.sh (k3s up, import baked image, resolve the channel control-plane image from the registry and roll to it, apply the Deployment), then prints the setup URL + token on the console. → systemd/, scripts/bootstrap-control-plane.sh, host-service/resolve-control-plane-image.mjs
  3. Operator opens the setup wizard at http://<ip>:8080/setup?token=… and submits channel (stable), app hostname, DNS, tenant + admin. → status-ui/app/setup/page.tsx, host-service/server.mjs
  4. Setup workflow (setup-engine.mjs, orchestrated by runSetupWorkflow):
    • validateSetupInputs → runSetupPreflight / runNetworkChecks
    • resolveChannelMetadata → resolveReleaseManifest resolves the channel tag on ghcr.io/nine-minds/alga-appliance-release to an immutable manifest (image tags + chart versions + config-bundle digest + control-plane ref + profile values), validated by validateReleaseManifest.
    • applyReleaseSelectionConfiguration + applyRuntimeValuesAndReleaseSelection write the per-service values ConfigMaps, the appliance-release-selection ConfigMap, and the initial-tenant Secret (tenant + admin, PBKDF2-hashed).
    • applyFluxSource creates the OCIRepository (config bundle, pinned by digest) + the parent Kustomization.
  5. Flux reconciles. alga-platform → alga-core → alga-background apply the HelmReleases; charts come from the OCI HelmRepository. The app comes up in msp + alga-system. → flux/base/
  6. Tenant bootstrap. The app's bootstrap job runs create-tenant.ts against the initial-tenant Secret, creating the tenant + admin and seeding roles/permissions/statuses. Login then works. → helm/templates/appliance-bootstrap-configmap.yaml, server/scripts/create-tenant.ts
  7. Status is reported throughout by status-engine.mjs via /api/status (readiness tiers: platform / core / bootstrap / login).
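
The readiness tiers in step 7 can be read as an ordered ladder. The sketch below assumes the tiers are cumulative (a tier counts only if every earlier tier is also ready), which this document does not state; `summarizeReadiness` and its return shape are hypothetical and not the status-engine.mjs API.

```javascript
// Tier names come from the document; the ordering rule is an assumption.
const TIERS = ['platform', 'core', 'bootstrap', 'login'];

function summarizeReadiness(checks) {
  let reached = null;
  for (const tier of TIERS) {
    // Stop at the first tier that is not ready.
    if (!checks[tier]) return { ready: false, reached, blockedOn: tier };
    reached = tier;
  }
  return { ready: true, reached, blockedOn: null };
}

console.log(summarizeReadiness({ platform: true, core: true, bootstrap: false, login: false }));
// prints { ready: false, reached: 'core', blockedOn: 'bootstrap' }
```

Reporting the first blocked tier gives an operator one place to look, for example the tenant bootstrap job when `blockedOn` is `bootstrap`.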

4. The registry-metadata model (why no git, no ISO re-burn)

A channel (stable) is an OCI tag on a release manifest; everything it references is pinned by digest/version, so a resolved release is immutable. The appliance pulls all of it anonymously from ghcr.io/nine-minds (public read).
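
The difference between a movable channel tag and a pinned reference shows up directly in the shape of an OCI reference. The parser below is a simplified sketch for illustration: `parseOciRef` is a hypothetical helper, and it does not cover every corner of the reference grammar.

```javascript
// Split an OCI reference into repository, tag, and digest.
// A reference counts as pinned when it carries a digest.
function parseOciRef(ref) {
  let rest = ref;
  let digest = null;
  const at = rest.indexOf('@');
  if (at !== -1) {
    digest = rest.slice(at + 1);
    rest = rest.slice(0, at);
  }
  // A tag separator can only appear after the last slash; an earlier
  // colon belongs to a registry port such as localhost:5000.
  const lastSlash = rest.lastIndexOf('/');
  const colon = rest.indexOf(':', lastSlash + 1);
  const tag = colon === -1 ? null : rest.slice(colon + 1);
  const repository = colon === -1 ? rest : rest.slice(0, colon);
  return { repository, tag, digest, pinned: digest !== null };
}

console.log(parseOciRef('ghcr.io/nine-minds/alga-appliance-release:stable').pinned);
// prints false
```

A channel reference such as `:stable` is intentionally not pinned, since it moves when a new release is published. The references inside a resolved manifest are the ones expected to be fixed.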

Published artifacts (all under ghcr.io/nine-minds):

| Artifact | What |
| --- | --- |
| charts/<name>:<ver> | the 6 Helm charts (OCI) |
| alga-appliance-config:<ver> | the Flux config bundle (the rendered flux/base tree) |
| alga-appliance-control-plane:<ver> | the control-plane image |
| alga-appliance-release:<ver> + :<channel> | the release manifest (its JSON is the artifact's config blob) |

Consequences for what needs rebuilding:

  • App image change (e.g. a fix in server/) → build alga-psa-ee, then run the ~/nm-kube-config Argo workflow with publish-appliance-release=true and the desired channel (for stable: appliance-release-channel=stable). No ISO, no control-plane rebuild.
  • Control-plane / setup-UI / engine change → rebuild + publish the control-plane image; the next boot rolls to it from the registry. No ISO.
  • ISO re-burn only for: k3s / base OS / Flux controllers / systemd units / the autoinstall seed / the baked baseline control-plane image.
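
The three rules above amount to a lookup from "what changed" to "what must be rebuilt". The sketch below encodes only what the list states; the area labels and the `rebuildPlan` helper are hypothetical names chosen for the example.

```javascript
// Area labels are illustrative; the grouping follows the list above.
const CONTROL_PLANE_AREAS = new Set(['control-plane', 'setup-ui', 'engine']);
const ISO_AREAS = new Set([
  'k3s',
  'base-os',
  'flux-controllers',
  'systemd-units',
  'autoinstall-seed',
  'baked-control-plane-image',
]);

function rebuildPlan(changedAreas) {
  const plan = { appRelease: false, controlPlaneImage: false, iso: false };
  for (const area of changedAreas) {
    if (area === 'app-image') plan.appRelease = true;
    else if (CONTROL_PLANE_AREAS.has(area)) plan.controlPlaneImage = true;
    else if (ISO_AREAS.has(area)) plan.iso = true;
    else throw new Error(`unknown area: ${area}`);
  }
  return plan;
}

console.log(rebuildPlan(['app-image', 'setup-ui']));
// prints { appRelease: true, controlPlaneImage: true, iso: false }
```

Rejecting unknown areas means a change that fits none of the three rules fails loudly and gets a deliberate decision.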

See ee/appliance/docs/registry-metadata-design.md for the full contract (publish side, consume side, decisions).


5. Quick file index

| I need to… | Look at |
| --- | --- |
| Understand the package | ee/appliance/README.md |
| See the boot path | ee/appliance/systemd/, scripts/bootstrap-control-plane.sh |
| Change the setup wizard | ee/appliance/status-ui/app/setup/page.tsx |
| Change the setup API / flow | ee/appliance/host-service/server.mjs, setup-engine.mjs |
| Change what the app stack is / Helm values | ee/appliance/flux/base/, flux/profiles/single-node/values/ |
| Pin a new app image | ~/nm-kube-config/alga-psa/workflows/composite/alga-psa-build-migrate-deploy.yaml (appliance-release-source-ref / appliance-release-version) |
| Publish a release to ghcr | Argo workflow with publish-appliance-release=true |
| Rebuild the control-plane image | ee/appliance/control-plane/Dockerfile |
| Build the ISO | ee/appliance/ubuntu-iso/scripts/build-ubuntu-appliance-iso.sh |
| Diagnose a running box | ee/appliance/host-service/status-engine.mjs, operator/, bin/alga-control-plane-reapply |
| Understand the release model | ee/appliance/docs/registry-metadata-design.md |