PSA/ee/docs/plans/2026-03-10-talos-appliance-gitops-alga-deployment-design.md
Hermes 284313f908

Talos Appliance GitOps Alga Deployment Design

  • Date: 2026-03-10
  • Status: Approved

Summary

Deploy Alga PSA on the Talos appliance through Flux-managed GitOps instead of direct first-boot Helm commands. The appliance should reconcile a single-node on-prem stack that includes the Alga server, Postgres, PgBouncer, Redis, Hocuspocus, email-service, workflow-worker, Temporal, and temporal-worker. Initial startup must bootstrap the database and run seeds once. Later restarts or Flux reconciliations must reuse existing PVC-backed state and must not reseed the database.

Architecture

Talos first boot owns cluster bootstrap and Flux bootstrap only. Application installation is owned by Flux from an appliance-specific path in this repository.
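As a rough sketch of that hand-off, the first-boot step could leave behind a Flux source plus one Kustomization that points at the appliance profile. The object names, the URL, the branch, and the intervals below are placeholders chosen for illustration; only the profile path follows the layout proposed under Implementation Shape.

```yaml
# Hypothetical source object; url and branch are placeholders.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: alga-appliance
  namespace: flux-system
spec:
  interval: 5m
  url: https://git.example.invalid/alga-psa.git
  ref:
    branch: main
---
# Reconciles everything under the appliance-specific profile path.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: alga-appliance          # hypothetical name
  namespace: flux-system
spec:
  interval: 10m
  path: ./ee/appliance/flux/profiles/talos-single-node
  prune: true                   # removes objects that were deleted from Git
  sourceRef:
    kind: GitRepository
    name: alga-appliance
```

With this split, first-boot logic never runs Helm itself; it only has to create these two objects and let Flux take over.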

Namespaces:

  • flux-system: Flux controllers and bootstrap objects.
  • alga-system: appliance coordination objects when needed.
  • msp: Alga PSA runtime workloads, keeping compatibility with the existing Helm defaults.
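A minimal sketch of the namespace manifests the profile might carry. flux-system is normally created by the Flux installation itself, so this assumes only the other two need to be declared in Git.

```yaml
# Appliance coordination objects, when needed.
apiVersion: v1
kind: Namespace
metadata:
  name: alga-system
---
# Alga PSA runtime workloads; matches the existing Helm defaults.
apiVersion: v1
kind: Namespace
metadata:
  name: msp
```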

Release boundaries:

  • temporal: Temporal server/frontend persistence stack.
  • alga-core: root helm/ chart, owning server, Postgres, Redis, Hocuspocus, and bootstrap/migration behavior.
  • pgbouncer: new Kubernetes deployment path for PgBouncer.
  • workflow-worker: existing ee/helm/workflow-worker chart.
  • email-service: existing ee/helm/email-service chart.
  • temporal-worker: existing ee/helm/temporal-worker chart.

Ordering:

  1. Talos boots Kubernetes.
  2. First-boot logic installs Flux and points it at the appliance profile.
  3. Flux reconciles namespaces, repositories, and values ConfigMaps.
  4. Flux reconciles temporal and alga-core.
  5. Flux reconciles pgbouncer, workflow-worker, email-service, and temporal-worker.
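Steps 4 and 5 map naturally onto Flux's dependsOn field, which holds a HelmRelease back until the releases it names report Ready. The sketch below assumes the HelmRelease objects live in the msp namespace, that the charts are served from a GitRepository named alga-appliance, and that a values ConfigMap named alga-core-values exists; those three are illustrative assumptions. The API version shown is the one used by recent Flux releases; older installations use a beta version of the same kind.

```yaml
# Step 4: alga-core from the root helm/ chart. It has no dependsOn, so it
# reconciles alongside the temporal release.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: alga-core
  namespace: msp
spec:
  interval: 10m
  chart:
    spec:
      chart: ./helm
      sourceRef:
        kind: GitRepository
        name: alga-appliance      # hypothetical source name
        namespace: flux-system
  valuesFrom:
    - kind: ConfigMap
      name: alga-core-values      # hypothetical values ConfigMap from step 3
---
# Step 5: a dependent release waits until the releases it names are Ready.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: temporal-worker
  namespace: msp
spec:
  interval: 10m
  dependsOn:
    - name: temporal
    - name: alga-core
  chart:
    spec:
      chart: ./ee/helm/temporal-worker
      sourceRef:
        kind: GitRepository
        name: alga-appliance
        namespace: flux-system
```

pgbouncer, workflow-worker, and email-service would follow the same shape as temporal-worker, each listing only the releases it actually needs.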

Bootstrap And Idempotency

First-time setup must be triggered by "database not initialized," not by "Helm release install": bootstrap decisions key off database state rather than Helm release history. The current Compose setup already follows that model through setup/entrypoint.sh.

Desired behavior:

  1. Postgres PVC comes up.
  2. A pre-install/pre-upgrade bootstrap job runs from the setup image.
  3. The bootstrap job creates databases/users idempotently, runs migrations, checks whether seed data already exists, and only runs seeds when the database is still empty.
  4. Application pods start only after the bootstrap job succeeds.

Implications:

  • Migrations are safe to run on upgrades.
  • Seeds are guarded by database state and do not rerun on ordinary restart or release reconciliation.
  • Recreating a Helm release against an existing PVC-backed database is safe because the job rechecks DB state before seeding.
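The hook wiring for such a job can be sketched as below. The Job name, image reference, and script path are placeholders, and the connection settings are omitted; only the hook annotations are standard Helm. How Postgres becomes reachable before the hook fires is outside this sketch.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: alga-bootstrap                      # hypothetical name
  annotations:
    # Run before resources are installed and before every upgrade.
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"
    # Replace any previous hook Job and clean up after success.
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: bootstrap
          # Hypothetical reference to the setup image.
          image: example.invalid/alga-setup:placeholder
          # Hypothetical path; the script is expected to create databases and
          # users idempotently, run migrations, and seed only an empty database.
          command: ["/bin/sh", "/app/setup/entrypoint.sh"]
```

Helm waits for a hook Job to complete before continuing with the release, which is what gates application pods on bootstrap success. Because the seeding decision lives inside the script and depends on database state, running the same hook again on upgrade or reinstall stays safe.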

Implementation Shape

Repository additions:

  • ee/appliance/flux/base/
  • ee/appliance/flux/profiles/talos-single-node/
  • ee/appliance/flux/profiles/talos-single-node/values/*.yaml
  • historical removed bootstrap wrapper
  • ee/appliance/scripts/deploy-app.sh
  • ee/helm/pgbouncer/

Key chart changes:

  • Replace the current split migration/seed Helm hook behavior in the root chart with one idempotent bootstrap hook that reuses the setup image and setup script semantics.
  • Preserve generated DB credentials across Helm reconciliations instead of rotating them on reinstall.
  • Add root-chart support for routing the server through PgBouncer while bootstrap still targets direct Postgres.
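One common Helm pattern for keeping generated credentials stable is to read the existing Secret with lookup and generate a value only when none is found. The following is a sketch of that pattern, not necessarily how the root chart will implement it; the Secret name and key are hypothetical.

```yaml
# Helm template sketch; the Secret name and key are hypothetical.
{{- $name := "alga-db-credentials" }}
{{- $existing := lookup "v1" "Secret" .Release.Namespace $name }}
apiVersion: v1
kind: Secret
metadata:
  name: {{ $name }}
  annotations:
    # Ask Helm not to delete the Secret on uninstall.
    "helm.sh/resource-policy": keep
type: Opaque
data:
  {{- if $existing }}
  # Reuse the stored value (already base64-encoded in .data).
  postgres-password: {{ index $existing.data "postgres-password" | quote }}
  {{- else }}
  # First install: generate a value once.
  postgres-password: {{ randAlphaNum 32 | b64enc | quote }}
  {{- end }}
```

Note that lookup returns an empty result during helm template and dry runs, so rendered output there always shows a freshly generated value. The reuse branch only takes effect when Helm is talking to a live cluster.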

Validation

Required validation:

  • helm template succeeds for alga-core with the Talos single-node overlay.
  • helm template succeeds for pgbouncer, workflow-worker, email-service, and temporal-worker.
  • Fresh install runs bootstrap once, seeds once, and all core workloads become Ready.
  • Restart or no-op Flux reconciliation does not trigger reseeding and continues to use existing PVC-backed state.