# Ubuntu 24.04 k3s VM Appliance Automation PRD

## Metadata
- Status: Draft
- Date: 2026-02-23
- Scope Owner: EE Platform / Deployment
- Plan Folder: `ee/docs/plans/2026-02-23-ubuntu-k3s-vm-appliance/`

## Problem Statement
Alga PSA needs an enterprise-ready on-prem appliance delivery model that is repeatable, secure, and low-touch for MSP customers. Today there is no standardized automated pipeline that builds and ships VM images with the product pre-integrated into a supported Kubernetes runtime. This causes inconsistent installs, slower onboarding, and risky upgrades.

## User Value
1. New customers can deploy a known-good appliance image quickly with minimal manual steps.
2. Existing customers get a predictable, low-risk upgrade path.
3. Alga support can operate a single standardized deployment model across customer sites.
4. Enterprise customers can start single-node and later expand to a 3-node HA topology without replatforming.

## Goals
1. Produce automated, versioned Ubuntu 24.04 LTS VM images for `OVA` and `QCOW2`.
2. Bootstrap a single-node `k3s` host and deploy Alga PSA using existing `helm/` chart assets.
3. Establish a GitOps pull-based app deployment and upgrade flow.
4. Maintain an always-current image line for new installs (scheduled rebuilds + release channels).
5. Define and support a migration path from single-node to 3-node HA.

## Non-Goals
1. Supporting multiple Linux base distributions in v1.
2. Supporting both `k3s` and `microk8s` in v1.
3. Implementing customer-specific host customization workflows.
4. Implementing all HA automation details in v1; v1 requires only a validated migration path and scripts.

## Target Users / Personas
1. Alga Release Engineer
   - Builds and publishes appliance artifacts and release bundles.
2. MSP Deployment Engineer
   - Imports appliance VM, performs site bootstrap, validates app availability.
3. Alga Support Engineer
   - Executes scripted upgrade and rollback runbooks.
4. Enterprise MSP Ops Team
   - Starts with single-node deployment and later migrates to 3-node HA.

## Current State
1. Primary Helm chart exists in `helm/` with base values in `helm/values.yaml`.
2. Enterprise assets and automation typically live under `ee/`.
3. No `ee/` appliance pipeline exists yet for Packer image builds or lifecycle runbooks.

## Proposed Solution Overview
Create an `ee/appliance/` subsystem that owns appliance build, bootstrap, release metadata, and lifecycle scripts.

Core architecture:
1. Image Build Layer
   - `Packer` with Ubuntu 24.04 autoinstall/cloud-init to produce `OVA` and `QCOW2`.
2. Bootstrap Layer
   - First-boot automation installs/pins `k3s`, installs GitOps controller, and applies cluster base config.
3. App Delivery Layer
   - GitOps points to release bundle that deploys existing `helm/` chart with appliance-specific values overlay.
4. Release Layer
   - Versioned release manifest with pinned image digests/checksums/signatures for reproducible rollout and rollback.

## Decision Defaults (Locked)
1. GitOps controller: Flux.
2. k3s HA datastore model: embedded etcd.
3. Artifact distribution model: hybrid (vendor-hosted default plus customer mirror/offline bundle option).
4. Supported release window: `N`, `N-1`, `N-2`.
5. Upgrade jump policy: sequential only (`N -> N+1`).
6. Air-gapped support in v1: supported via signed offline release bundle import.

## Functional Requirements

### FR-1 Image Build and Publication
1. Build pipeline outputs `OVA` and `QCOW2` artifacts from a single source template.
2. Artifacts include release metadata (`sha256`, version, build time, component versions).
3. Artifacts are published to a canonical artifact location and release channel aliases (`stable`, `candidate`).

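As an illustration of the metadata in FR-1 item 2, the sketch below computes a `sha256` for an artifact file and assembles a metadata record. The function names and the JSON key names (`file`, `buildTime`, `componentVersions`) are assumptions made for this example, not a defined contract.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


def sha256_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large VM images are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_artifact_metadata(
    path: Path, version: str, component_versions: Dict[str, str]
) -> dict:
    """Assemble the FR-1 metadata fields; key names here are hypothetical."""
    return {
        "file": path.name,
        "sha256": sha256_file(path),
        "version": version,
        "buildTime": datetime.now(timezone.utc).isoformat(),
        "componentVersions": dict(component_versions),
    }
```

A publish step could serialize this record with `json.dumps` next to each `OVA` and `QCOW2` file so that importers can verify the checksum before use.
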
### FR-2 Single-Node k3s Appliance Bootstrap
1. Appliance first boot initializes host prerequisites and installs pinned `k3s`.
2. Kubernetes reaches Ready state without manual package-level host changes.
3. Appliance installs Flux and begins sync to configured release source.

### FR-3 Application Deployment
1. Deploy the application from the root `helm/` chart.
2. Keep environment-specific overrides in appliance-owned values overlays.
3. Deploy command path is non-interactive and scriptable for support.

### FR-4 Upgrade and Rollback
1. Support app-only upgrades through GitOps desired-state update.
2. Support planned k3s and host image upgrade tracks with compatibility gates.
3. Provide documented rollback paths for app layer and image layer.
4. Enforce sequential application upgrade progression (`N -> N+1`) with no skipped minors.
5. Publish and enforce support policy for `N`, `N-1`, and `N-2`.

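The two policies in items 4 and 5 can be expressed as simple checks over an ordered list of published release lines. This is a minimal sketch: it assumes releases are tracked as a list ordered oldest to newest, and the function names and version strings are hypothetical.

```python
from typing import List

SUPPORT_WINDOW = 3  # N, N-1, N-2


def supported_releases(published: List[str]) -> List[str]:
    """Return the newest SUPPORT_WINDOW release lines (published is oldest first)."""
    return published[-SUPPORT_WINDOW:]


def is_sequential_upgrade(published: List[str], current: str, target: str) -> bool:
    """Allow only a step from a release to its immediate successor (N -> N+1)."""
    if current not in published or target not in published:
        return False
    return published.index(target) == published.index(current) + 1


# Hypothetical release history used only for illustration.
published = ["1.4", "1.5", "1.6", "1.7"]
print(supported_releases(published))                   # ['1.5', '1.6', '1.7']
print(is_sequential_upgrade(published, "1.5", "1.6"))  # True
print(is_sequential_upgrade(published, "1.5", "1.7"))  # False
```

A preflight step in an upgrade script could call a check like `is_sequential_upgrade` and refuse to proceed when the requested target skips a release.
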
### FR-5 Always-Current New-Customer Image
1. Scheduled rebuild cadence produces a refreshed image line with security updates.
2. The latest patch image is the install default for new sites.
3. Build pipeline prevents stale base image usage beyond policy threshold.

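One way to enforce item 3 is a build-time gate that compares the base image build time with a maximum allowed age. The sketch below assumes a 30-day threshold purely for illustration; the PRD does not define the actual policy value, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy threshold; the real value is not defined in this PRD.
MAX_BASE_IMAGE_AGE = timedelta(days=30)


def is_base_image_stale(
    base_build_time: datetime,
    now: datetime,
    max_age: timedelta = MAX_BASE_IMAGE_AGE,
) -> bool:
    """Return True when the base image is older than the allowed age."""
    return now - base_build_time > max_age


built = datetime(2026, 1, 1, tzinfo=timezone.utc)
checked = datetime(2026, 2, 23, tzinfo=timezone.utc)
print(is_base_image_stale(built, checked))  # True: 53 days exceeds 30
```
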
### FR-6 Path to 3-Node HA
1. Single-node install baseline must preserve compatibility with planned 3-node topology.
2. Provide scripted process to add nodes and transition control plane to HA mode using embedded etcd.
3. App values and scheduling policies support HA profile.

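For context on the 3-node target: etcd needs a majority of members (a quorum) to accept writes, so the number of member failures a cluster tolerates follows directly from its size. The short sketch below shows that arithmetic; the function names are illustrative only.

```python
def etcd_quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with the given member count."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """Members that can be lost while a quorum is still reachable."""
    return members - etcd_quorum(members)


for size in (1, 2, 3):
    print(size, etcd_quorum(size), failures_tolerated(size))
# 1 1 0
# 2 2 0
# 3 2 1
```

A single node tolerates no failures and two nodes add no tolerance, which is why three control-plane nodes are the smallest HA topology with embedded etcd.
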
## Public Interfaces / Contracts
1. New script entrypoints under `ee/appliance/scripts/`:
   - `historical removed image-build script`
   - `publish-release.sh`
   - `historical removed bootstrap wrapper`
   - `upgrade-site.sh`
2. Release manifest contract under `historical local release metadata (removed)`:
   - `releaseVersion`
   - `os.base`
   - `os.artifacts[]`
   - `k8s.distribution`
   - `k8s.version`
   - `app.chartPath`
   - `app.valuesProfile`
   - `images[]` (digest pinned)
   - `upgradeFrom[]`
3. Appliance values overlays:
   - `ee/appliance/gitops/values/single-node.yaml`
   - `ee/appliance/gitops/values/ha-3node.yaml`

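To make the release manifest contract concrete, the sketch below checks that a parsed manifest contains every field listed in item 2 and that the `[]` fields are lists. The field paths come from the list above; the example values, the nesting of dotted names into objects, and the function names are assumptions for illustration.

```python
from typing import Any, List

REQUIRED_FIELDS = (
    "releaseVersion",
    "os.base",
    "os.artifacts",
    "k8s.distribution",
    "k8s.version",
    "app.chartPath",
    "app.valuesProfile",
    "images",
    "upgradeFrom",
)
LIST_FIELDS = {"os.artifacts", "images", "upgradeFrom"}
_MISSING = object()


def lookup(manifest: Any, dotted: str) -> Any:
    """Walk a dotted path such as 'k8s.version'; return _MISSING if absent."""
    node = manifest
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def manifest_problems(manifest: dict) -> List[str]:
    """List contract violations; an empty list means the shape is acceptable."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = lookup(manifest, field)
        if value is _MISSING:
            problems.append(f"missing field: {field}")
        elif field in LIST_FIELDS and not isinstance(value, list):
            problems.append(f"field must be a list: {field}")
    return problems


# Hypothetical manifest; every value below is a placeholder.
example = {
    "releaseVersion": "1.6.0",
    "os": {"base": "ubuntu-24.04", "artifacts": []},
    "k8s": {"distribution": "k3s", "version": "placeholder"},
    "app": {"chartPath": "helm/", "valuesProfile": "single-node"},
    "images": [],
    "upgradeFrom": ["1.5.0"],
}
print(manifest_problems(example))  # []
```
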
## Data / Integration Notes
1. Helm source chart remains at `helm/`.
2. `helm/values.yaml` remains base defaults; appliance overlays only include necessary deltas.
3. GitOps repository layout must support per-channel desired state references.

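The "overlays only include necessary deltas" rule in item 2 relies on layered values merging: nested maps are merged key by key, while scalars and lists in the overlay replace the base value. The sketch below approximates that behavior in plain Python; it is not Helm's implementation, and the keys shown are hypothetical rather than taken from `helm/values.yaml`.

```python
import copy


def merge_values(base: dict, overlay: dict) -> dict:
    """Return base with overlay applied; nested dicts merge, other values replace."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base defaults and an HA overlay containing only the deltas.
base = {"replicaCount": 1, "resources": {"cpu": "500m", "memory": "1Gi"}}
ha_overlay = {"replicaCount": 3, "resources": {"memory": "2Gi"}}

print(merge_values(base, ha_overlay))
# {'replicaCount': 3, 'resources': {'cpu': '500m', 'memory': '2Gi'}}
```

Because untouched keys such as `cpu` fall through from the base, an overlay stays small and changes to base defaults propagate to every profile.
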
## Security and Compliance Requirements
1. Do not bake customer secrets into VM image.
2. Release artifacts must include checksums and signature verification metadata.
3. Runtime image references in Kubernetes manifests must be digest-pinned.
4. Upgrade flow must include preflight checks and explicit maintenance windows.

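Item 3 can be checked mechanically: a digest-pinned image reference ends in `@sha256:` followed by 64 lowercase hex characters, whereas a tag-only reference such as `repo:1.0` is mutable. The sketch below is one possible lint check; the function names and the registry host are made up for the example.

```python
import re
from typing import Iterable, List

# A pinned reference ends with '@sha256:' plus a 64-character hex digest.
_DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}\Z")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True when the reference is pinned to a sha256 digest."""
    return bool(_DIGEST_SUFFIX.search(image_ref))


def unpinned_images(image_refs: Iterable[str]) -> List[str]:
    """Collect references that would violate the digest-pinning rule."""
    return [ref for ref in image_refs if not is_digest_pinned(ref)]


# Hypothetical references; the digest is a dummy value.
refs = [
    "registry.example.com/alga/server@sha256:" + "a" * 64,
    "registry.example.com/alga/server:1.6.0",
]
print(unpinned_images(refs))  # ['registry.example.com/alga/server:1.6.0']
```
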
## Rollout Plan
1. Phase 1: Internal-only build and smoke test.
2. Phase 2: Pilot with friendly customer(s), single-node only.
3. Phase 3: General availability for single-node appliance.
4. Phase 4: Enterprise HA migration support GA.

## Risks and Mitigations
1. Risk: Image drift or unreproducible builds.
   - Mitigation: pinned versions + deterministic build metadata + periodic rebuild policy.
2. Risk: Customer site connectivity constraints for GitOps pulls.
   - Mitigation: support mirror/offline artifact import mode in release workflow.
3. Risk: Upgrade failures at customer sites.
   - Mitigation: preflight checks + backup hooks + rollback runbook.
4. Risk: HA path complexity from initial single-node choices.
   - Mitigation: enforce HA-compatible defaults from v1 and test migration path early.

## Open Questions
1. Select concrete implementation endpoint names for vendor-hosted artifact storage and mirror import tooling.
2. Define exact release cadence per channel (`candidate` and `stable`) and promotion approval gates.

## Acceptance Criteria / Definition of Done
1. CI can produce `OVA` and `QCOW2` artifacts from mainline code without manual edits.
2. A new VM boots to a Ready `k3s` node and deploys the app via GitOps from `helm/`.
3. App upgrade from release N to N+1 succeeds via supported command path.
4. Rollback to previous app release is documented and validated.
5. Single-node to 3-node migration runbook is documented and validated in a test environment.
6. Plan artifacts (`PRD.md`, `features.json`, `tests.json`, `SCRATCHPAD.md`) remain synchronized.