# Talos Host Configuration

## Purpose

Talos should be treated as an immutable OS with one durable configuration boundary: machine configuration. If a change must survive reboot, it belongs there.

## Persistence Boundary

Temporary fixes are not enough for appliance behavior. The following must be expressed in machine configuration rather than one-off boot changes:

- network interface selection
- DHCP or static addressing
- DNS resolver selection
- host naming behavior
- single-node control-plane scheduling policy
- installer image selection

If a setting exists only in maintenance mode, on boot media, or in an ad hoc runtime patch and is not written into machine configuration, assume it can be lost on reboot or reinstall.

## Single-Node Appliance Scheduling

For a single-node appliance, workloads must be allowed to run on the control-plane node. The durable setting is:

```yaml
cluster:
  allowSchedulingOnControlPlanes: true
```

This is preferable to repeatedly removing the `node-role.kubernetes.io/control-plane:NoSchedule` taint by hand. Manual untainting is a recovery step, not the desired steady state.

## Network Configuration

Persistent networking should be expressed with Talos network config documents rather than relying on ephemeral interface choices.

Typical durable pieces are:

- `DHCPv4Config` or static address config for the intended NIC
- `ResolverConfig` for non-DHCP resolvers when appliance DNS must be fixed
- `HostnameConfig` when a stable Talos hostname policy is desired

Where possible, prefer selectors or deterministic device identification (for example, a MAC address or bus path) over brittle assumptions such as interface enumeration order. If the appliance depends on a specific interface, make that explicit in machine configuration.

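
As a rough illustration, the pieces listed above might be written as separate config documents along these lines. This is a sketch, not the appliance's actual configuration: the link name `enp0s2`, the resolver addresses, and the hostname `appliance-01` are hypothetical placeholders, and the field names should be checked against the config reference for the Talos version in use.

```yaml
# Hypothetical values throughout; verify field names against the Talos
# config reference for your version before applying.
---
apiVersion: v1alpha1
kind: DHCPv4Config
name: enp0s2              # assumed link name of the intended NIC
---
apiVersion: v1alpha1
kind: ResolverConfig
nameservers:
  - address: 10.0.0.53    # assumed fixed appliance resolver
  - address: 10.0.0.54    # assumed secondary resolver
---
apiVersion: v1alpha1
kind: HostnameConfig
hostname: appliance-01    # assumed static hostname
```

Keeping each concern in its own document makes it easier to confirm, during recovery, which of the durable pieces is actually present in the applied machine configuration.
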
## Installer Configuration

This Talos-era guidance is historical. Supported Ubuntu/k3s appliances no longer use Talos installer image selection, and release metadata is resolved from OCI artifacts rather than local repository files.

Do not hand-edit installer image references without also re-establishing which published appliance release the node now represents.

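
For historical context, an installer image reference in a Talos machine configuration typically sits under `machine.install`, roughly as sketched here. The disk path and the image tag are hypothetical placeholders, not values taken from any published appliance release.

```yaml
# Historical illustration only; both values are placeholders.
machine:
  install:
    disk: /dev/sda                                          # assumed target disk
    image: "ghcr.io/siderolabs/installer:<talos-version>"   # placeholder tag, not a real release
```

The `image` line is the reference that should not be changed in isolation, because it determines which release the node reinstalls or upgrades to.
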
## Boot Media Rule

After Talos is installed to disk, subsequent boots must come from the installed disk, not the installer ISO.

If the machine is started from installer media again, Talos may halt with the equivalent of:

- Talos is already installed to disk
- the machine booted from another media
- reboot from disk

That is expected behavior. For appliance operations, the steady-state rule is:

1. boot from ISO for installation
2. install Talos to disk
3. remove or detach the ISO
4. boot from disk from then on

## Storage Assumption

The current single-node appliance profile assumes node-local persistent storage. In practice, that means the Kubernetes cluster must have a working `StorageClass` suitable for local PVCs before the application stack is expected to settle.

This is a cluster-level dependency, but on a Talos appliance it is effectively part of host bring-up because the application stack relies on it for:

- Postgres
- Redis
- local file storage
- optionally Temporal persistence

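
A node-local `StorageClass` of the kind described above could look something like the following. This is a minimal sketch that assumes the Rancher local-path provisioner is already installed in the cluster; the class name, the default-class annotation, and the reclaim policy are illustrative choices, not the appliance's confirmed settings.

```yaml
# Sketch only: assumes the Rancher local-path provisioner is installed.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path                                        # hypothetical class name
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"   # PVCs without a class use this one
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer                   # bind only once a pod is scheduled
reclaimPolicy: Retain                                     # keep data if a claim is deleted
```

With `WaitForFirstConsumer`, a PVC stays `Pending` until a pod that uses it is scheduled, so pending claims combined with unschedulable pods usually point back to the scheduling setting rather than to storage itself.
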
## Operational Guidance

When recovering a Talos appliance node, follow this order:

1. confirm the node is booting from disk rather than installer media
2. confirm the intended machine configuration is still applied
3. confirm network and resolver configuration are present in machine config
4. confirm single-node scheduling is enabled in config
5. only then move up to Kubernetes and Flux diagnosis

That order avoids spending time on higher-level symptoms caused by a lost host configuration.