Ubuntu 24.04 k3s VM Appliance Automation PRD
Metadata
- Status: Draft
- Date: 2026-02-23
- Scope Owner: EE Platform / Deployment
- Plan Folder: ee/docs/plans/2026-02-23-ubuntu-k3s-vm-appliance/
Problem Statement
Alga PSA needs an enterprise-ready on-prem appliance delivery model that is repeatable, secure, and low-touch for MSP customers. Today there is no standardized automated pipeline that builds and ships VM images with the product pre-integrated into a supported Kubernetes runtime. This causes inconsistent installs, slower onboarding, and risky upgrades.
User Value
- New customers can deploy a known-good appliance image quickly with minimal manual steps.
- Existing customers get a predictable, low-risk upgrade path.
- Alga support can operate a single standardized deployment model across customer sites.
- Enterprise customers can start single-node and later expand to a 3-node HA topology without replatforming.
Goals
- Produce automated, versioned Ubuntu 24.04 LTS VM images for OVA and QCOW2.
- Bootstrap a single-node k3s host and deploy Alga PSA using existing helm/ chart assets.
- Establish a GitOps pull-based app deployment and upgrade flow.
- Maintain an always-current image line for new installs (scheduled rebuilds + release channels).
- Define and support a migration path from single-node to 3-node HA.
Non-Goals
- Supporting multiple Linux base distributions in v1.
- Supporting both k3s and microk8s in v1.
- Implementing customer-specific host customization workflows.
- Implementing all HA automation details in v1; v1 requires only a validated migration path and scripts.
Target Users / Personas
- Alga Release Engineer
  - Builds and publishes appliance artifacts and release bundles.
- MSP Deployment Engineer
  - Imports appliance VM, performs site bootstrap, validates app availability.
- Alga Support Engineer
  - Executes scripted upgrade and rollback runbooks.
- Enterprise MSP Ops Team
  - Starts with single-node deployment and later migrates to 3-node HA.
Current State
- Primary Helm chart exists in helm/ with base values in helm/values.yaml.
- Enterprise assets and automation typically live under ee/.
- No existing ee/appliance pipeline for Packer image builds or lifecycle runbooks.
Proposed Solution Overview
Create an ee/appliance/ subsystem that owns appliance build, bootstrap, release metadata, and lifecycle scripts.
Core architecture:
- Image Build Layer
  - Packer with Ubuntu 24.04 autoinstall/cloud-init to produce OVA and QCOW2.
- Bootstrap Layer
  - First-boot automation installs/pins k3s, installs GitOps controller, and applies cluster base config.
- App Delivery Layer
  - GitOps points to release bundle that deploys existing helm/ chart with appliance-specific values overlay.
- Release Layer
  - Versioned release manifest with pinned image digests/checksums/signatures for reproducible rollout and rollback.
Decision Defaults (Locked)
- GitOps controller: Flux.
- k3s HA datastore model: embedded etcd.
- Artifact distribution model: hybrid (vendor-hosted default plus customer mirror/offline bundle option).
- Supported release window: N, N-1, N-2.
- Upgrade jump policy: sequential only (N -> N+1).
- Air-gapped support in v1: supported via signed offline release bundle import.
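The support window above can be read as a simple check against the latest release. The following is a minimal sketch, assuming that N counts minor releases within a single major version and that versions are written as MAJOR.MINOR[.PATCH]; neither assumption is stated in this PRD, and the function names are hypothetical.

```python
from typing import NamedTuple


class Release(NamedTuple):
    major: int
    minor: int


def parse_release(text: str) -> Release:
    """Parse 'MAJOR.MINOR' or 'MAJOR.MINOR.PATCH'; the patch part is ignored."""
    parts = text.lstrip("v").split(".")
    return Release(int(parts[0]), int(parts[1]))


def is_supported(candidate: str, latest: str, window: int = 2) -> bool:
    """Return True if candidate falls within N .. N-window of latest.

    Assumption: the window is measured in minor releases of the same major.
    """
    cand, newest = parse_release(candidate), parse_release(latest)
    if cand.major != newest.major:
        return False
    return 0 <= newest.minor - cand.minor <= window


if __name__ == "__main__":
    for version in ("1.6.0", "1.4.2", "1.3.9"):
        print(version, is_supported(version, latest="1.6.0"))
```

With the latest release at 1.6.0, this sketch treats 1.6.x, 1.5.x, and 1.4.x as supported and 1.3.x as out of window.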
Functional Requirements
FR-1 Image Build and Publication
- Build pipeline outputs OVA and QCOW2 artifacts from a single source template.
- Artifacts include release metadata (sha256, version, build time, component versions).
- Artifacts are published to a canonical artifact location and release channel aliases (stable, candidate).
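The metadata listed in FR-1 could be generated next to each artifact roughly as follows. This is an illustrative sketch only: the JSON key names (artifact, buildTime, componentVersions) and the function names are assumptions, not a format defined by this PRD.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so multi-gigabyte images are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_metadata(artifact: Path, version: str, components: dict[str, str]) -> dict:
    """Assemble release metadata for one artifact (key names are hypothetical)."""
    return {
        "artifact": artifact.name,
        "sha256": sha256_of(artifact),
        "version": version,
        "buildTime": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "componentVersions": components,
    }
```

Hashing in fixed-size chunks keeps memory use flat regardless of image size, which matters for OVA and QCOW2 files.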
FR-2 Single-Node k3s Appliance Bootstrap
- Appliance first boot initializes host prerequisites and installs pinned k3s.
- Kubernetes reaches Ready state without manual package-level host changes.
- Appliance installs Flux and begins sync to configured release source.
FR-3 Application Deployment
- Deploy application from root helm/ chart.
- Keep environment-specific overrides in appliance-owned values overlays.
- Deploy command path is non-interactive and scriptable for support.
FR-4 Upgrade and Rollback
- Support app-only upgrades through GitOps desired-state update.
- Support planned k3s and host image upgrade tracks with compatibility gates.
- Provide documented rollback paths for app layer and image layer.
- Enforce sequential application upgrade progression (N -> N+1) with no skipped minors.
- Publish and enforce support policy for N, N-1, and N-2.
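The sequential-only rule implies that a site several minors behind must step through every intermediate release. A minimal sketch of that planning step follows, assuming releases can be identified by an integer minor number; the function name is hypothetical.

```python
def plan_upgrade_hops(current_minor: int, target_minor: int) -> list[tuple[int, int]]:
    """Return the ordered (from, to) hops needed to reach target_minor.

    Each hop advances exactly one minor release, so no minor is skipped.
    Downgrades are rejected; rollback is a separate path in this PRD.
    """
    if target_minor < current_minor:
        raise ValueError("target is older than current; use the rollback path instead")
    return [(minor, minor + 1) for minor in range(current_minor, target_minor)]


if __name__ == "__main__":
    # A site on minor 3 moving to minor 6 needs three separate upgrades.
    print(plan_upgrade_hops(3, 6))  # [(3, 4), (4, 5), (5, 6)]
```

An upgrade script could refuse to proceed whenever the plan contains more than one hop, and instead print the intermediate releases the operator has to apply first.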
FR-5 Always-Current New-Customer Image
- Scheduled rebuild cadence produces a refreshed image line with security updates.
- The latest patch image is the install default for new sites.
- Build pipeline prevents stale base image usage beyond policy threshold.
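The stale-base-image gate can be expressed as a date comparison that fails the build once the base image is older than the policy allows. The sketch below uses a 30-day threshold purely as a placeholder; this PRD does not fix the value, and the function name is hypothetical.

```python
from datetime import date, timedelta


def is_base_image_stale(base_built_on: date, today: date, max_age_days: int = 30) -> bool:
    """Return True if the base image is older than the policy threshold.

    max_age_days=30 is a placeholder, not a value taken from this PRD.
    """
    return today - base_built_on > timedelta(days=max_age_days)


if __name__ == "__main__":
    built = date(2026, 1, 10)
    print(is_base_image_stale(built, today=date(2026, 2, 23)))  # True: 44 days old
```

Passing today explicitly rather than reading the clock inside the function keeps the check deterministic in CI tests.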
FR-6 Path to 3-Node HA
- Single-node install baseline must preserve compatibility with planned 3-node topology.
- Provide scripted process to add nodes and transition control plane to HA mode using embedded etcd.
- App values and scheduling policies support HA profile.
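The choice of three nodes follows from etcd's majority quorum: a cluster of n members needs floor(n/2) + 1 of them to agree, so it tolerates n minus that many failures. The arithmetic can be sketched as:

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with the given member count."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while the cluster keeps quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    for size in (1, 2, 3):
        print(size, quorum(size), fault_tolerance(size))
```

A single node and a two-node cluster both tolerate zero failures, while three nodes tolerate one, which is why the migration path goes from one control-plane node directly to three.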
Public Interfaces / Contracts
- New script entrypoints under ee/appliance/scripts/:
  - historical removed image-build script
  - publish-release.sh
  - historical removed bootstrap wrapper
  - upgrade-site.sh
- Release manifest contract under historical local release metadata (removed):
  - releaseVersion
  - os.base
  - os.artifacts[]
  - k8s.distribution
  - k8s.version
  - app.chartPath
  - app.valuesProfile
  - images[] (digest pinned)
  - upgradeFrom[]
- Appliance values overlays:
  - ee/appliance/gitops/values/single-node.yaml
  - ee/appliance/gitops/values/ha-3node.yaml
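A publish or upgrade script could reject a release manifest that omits any of the contract fields listed above. The sketch below assumes the manifest has already been parsed into nested dictionaries (the on-disk format is not specified here); the function name is hypothetical.

```python
# Dotted paths taken from the manifest contract; "[]" suffixes are dropped
# because only the presence of each key is checked.
REQUIRED_PATHS = [
    "releaseVersion",
    "os.base",
    "os.artifacts",
    "k8s.distribution",
    "k8s.version",
    "app.chartPath",
    "app.valuesProfile",
    "images",
    "upgradeFrom",
]


def missing_fields(manifest: dict) -> list[str]:
    """Return the contract paths that are absent from a parsed manifest."""
    missing = []
    for dotted in REQUIRED_PATHS:
        node = manifest
        for key in dotted.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(dotted)
                break
            node = node[key]
    return missing
```

This checks presence only; type checks (for example, that images and upgradeFrom are lists) would be a natural next step.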
Data / Integration Notes
- Helm source chart remains at helm/.
- helm/values.yaml remains base defaults; appliance overlays only include necessary deltas.
- GitOps repository layout must support per-channel desired state references.
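The "necessary deltas" rule works because Helm layers values files: maps are merged key by key, and a later file overrides scalars and replaces lists from an earlier one. The sketch below approximates that behavior to show why an overlay can stay small; it is not Helm's implementation (for instance, it does not treat null as "delete this key"), and the value keys in the usage example are invented for illustration.

```python
import copy


def merge_values(base: dict, overlay: dict) -> dict:
    """Recursively merge overlay onto base without mutating either input.

    Nested dicts are merged; scalars and lists from the overlay replace
    the base value outright.
    """
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    base = {"server": {"replicas": 1, "resources": {"cpu": "500m"}}}  # hypothetical keys
    ha_overlay = {"server": {"replicas": 3}}                          # hypothetical delta
    print(merge_values(base, ha_overlay))
```

In this example the overlay states only the replica count, and the resource settings are inherited from the base values unchanged.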
Security and Compliance Requirements
- Do not bake customer secrets into VM image.
- Release artifacts must include checksums and signature verification metadata.
- Runtime image references in Kubernetes manifests must be digest-pinned.
- Upgrade flow must include preflight checks and explicit maintenance windows.
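The digest-pinning requirement lends itself to an automated preflight or CI check. A minimal sketch follows, assuming image references are available as plain strings and that only sha256 digests are used; the function name and the registry host in the usage example are hypothetical.

```python
import re

# A pinned reference ends in "@sha256:" followed by 64 lowercase hex characters.
# Assumption: sha256 is the only digest algorithm in use.
DIGEST_PINNED = re.compile(r"^[^@\s]+@sha256:[0-9a-f]{64}$")


def unpinned_images(image_refs: list[str]) -> list[str]:
    """Return the image references that are not pinned to a sha256 digest."""
    return [ref for ref in image_refs if not DIGEST_PINNED.match(ref)]


if __name__ == "__main__":
    refs = [
        "registry.example.com/alga/server@sha256:" + "0" * 64,  # placeholder digest
        "registry.example.com/alga/server:latest",
    ]
    print(unpinned_images(refs))
```

A release gate could fail whenever this list is non-empty, so that tag-only references never reach a published bundle.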
Rollout Plan
- Phase 1: Internal-only build and smoke test.
- Phase 2: Pilot with friendly customer(s), single-node only.
- Phase 3: General availability for single-node appliance.
- Phase 4: Enterprise HA migration support GA.
Risks and Mitigations
- Risk: Image drift or unreproducible builds.
  - Mitigation: pinned versions + deterministic build metadata + periodic rebuild policy.
- Risk: Customer site connectivity constraints for GitOps pulls.
  - Mitigation: support mirror/offline artifact import mode in release workflow.
- Risk: Upgrade failures at customer sites.
  - Mitigation: preflight checks + backup hooks + rollback runbook.
- Risk: HA path complexity from initial single-node choices.
  - Mitigation: enforce HA-compatible defaults from v1 and test migration path early.
Open Questions
- Select concrete implementation endpoint names for vendor-hosted artifact storage and mirror import tooling.
- Define exact release cadence per channel (candidate and stable) and promotion approval gates.
Acceptance Criteria / Definition of Done
- CI can produce OVA and QCOW2 artifacts from mainline code without manual edits.
- New VM boots to Ready k3s and deploys app via GitOps from helm/.
- App upgrade from release N to N+1 succeeds via supported command path.
- Rollback to previous app release is documented and validated.
- Single-node to 3-node migration runbook is documented and validated in test environment.
- Plan artifacts (PRD.md, features.json, tests.json, SCRATCHPAD.md) remain synchronized.