
Scratchpad — Appliance Operator TUI

  • Plan slug: appliance-operator-tui
  • Created: 2026-03-25

What This Is

Keep a lightweight, continuously updated log of discoveries and decisions made while implementing this plan.

Prefer short bullets. Append new entries as you learn things, and also update earlier notes when a decision changes or an open question is resolved.

Decisions

  • (2026-03-25) First version is terminal-first, not a browser-based operator console. Reason: it matches the current appliance operator workflow, works over SSH, and avoids building a second management surface before the appliance lifecycle is stable.
  • (2026-03-25) The new tool should live under ee/appliance, not inside the existing Nushell developer CLI. Reason: the current CLI is developer-oriented and appliance operations need a clearer product boundary.
  • (2026-03-25) The tool should be structured for both repo-based use and future standalone packaging. Reason: v1 can ship from the repo, but packaging constraints should not be baked into the lifecycle logic.
  • (2026-03-25) The TUI should wrap the current appliance shell scripts and release manifests instead of replacing bootstrap, upgrade, or reset logic.
  • (2026-03-25) Implemented the operator as Node ESM modules under ee/appliance/operator with a thin shell wrapper, ee/appliance/appliance. Reason: no appliance package or workspace existed yet, and ESM keeps packaging and standalone embedding simple.
  • (2026-03-25) Added runtime path abstraction with repo auto-discovery and ALGA_APPLIANCE_ASSET_ROOT override. Reason: required for future standalone packaging while preserving repo-hosted workflow.
  • (2026-03-25) Implemented one normalized status model (collectStatus) used by both TUI and non-interactive commands. Reason: avoids divergent status logic and supports consistent blocker guidance across command surfaces.
  • (2026-03-25) Kept lifecycle actions script-driven (historical removed bootstrap script, historical removed upgrade script, reset-appliance-data.sh, collect-support-bundle.sh) with phase-aware progress wrappers. Reason: minimizes operational drift and honors existing script contracts.
  • (2026-03-25) The current readline shell is not the accepted final UX. Reason: it is operationally useful, but it does not meet the product bar for a real operator TUI.
  • (2026-03-25) Ink is the intended runtime for the interactive layer. Reason: it supports the full-screen, persistent-layout, keyboard-driven interface we actually want while allowing the existing Node operator core to remain intact.
  • (2026-03-25) Replaced the interactive readline/promises loop with a stateful Ink app while keeping lifecycle/status modules untouched. Reason: satisfies the UX acceptance bar (F026-F031) without destabilizing non-interactive commands.
  • (2026-03-25) Added Vim-style j/k/h/l bindings alongside arrows in the Ink shell. Reason: improves SSH/operator ergonomics and makes headless TUI tests deterministic.
  • (2026-03-25) Appliance pod inspection belongs inside the same Ink operator rather than a separate CLI/tool. Reason: operators should stay in one surface for lifecycle, status, and debugging.
  • (2026-03-25) Workload scope should default to appliance-relevant namespaces only (msp, alga-system, flux-system). Reason: operators asked for appliance-focused visibility, not a generic cluster browser.
  • (2026-03-25) Pod logs should use a full-screen viewer with bounded scrollback and Escape-to-return behavior. Reason: this matches operator expectations better than a cramped split view and avoids unbounded memory growth.
  • (2026-03-25) Implemented F032 by adding Workloads as a first-class Ink action and dedicated main-pane route. Reason: keeps pod inspection in the same operator surface as lifecycle and status actions.
  • (2026-03-25) Implemented F033 with a default namespace allowlist (msp, alga-system, flux-system) inside the workload adapter. Reason: appliance operators need focused inventory, not cluster-wide noise.
  • (2026-03-25) Implemented F034 with a refreshable pod table showing pod, namespace, status, ready, restarts, and age. Reason: aligns the workload pane with PRD operator-at-a-glance requirements.
  • (2026-03-25) Implemented F035 using timed workload polling with selection preservation by stable pod key. Reason: refreshing state must not disrupt active operator focus.
  • (2026-03-25) Implemented F036 by adding a full-screen log view that opens from the workload list and returns to it on Esc. Reason: this mirrors the required drill-down behavior without losing the layout.
  • (2026-03-25) Implemented F037 by using chunked tail expansion plus fixed-cap line windows. Reason: kubectl logs is append-oriented, so chunked reload with a cap is the practical bounded-memory strategy.
  • (2026-03-25) Implemented F038 with follow mode tied to the bottom scroll position and paused automatically when the operator scrolls up. Reason: operators want live tailing only when they are deliberately at the bottom of the stream.
  • (2026-03-25) Implemented F039 with keyboard controls for workloads/logs (j/k, arrows, Enter, Esc, page scroll). Reason: parity with SSH-friendly keyboard workflows.
  • (2026-03-25) Implemented F040 via new lib/workloads.mjs adapter that encapsulates kubectl get pods and kubectl logs calls behind normalized APIs. Reason: keeps raw command details out of TUI view logic.
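A minimal sketch of the runtime path abstraction described above, assuming the ALGA_APPLIANCE_ASSET_ROOT override points at a directory containing the appliance `scripts/` tree and that repo auto-discovery walks upward from a start directory looking for `ee/appliance/scripts`. `resolveAssetRoot`, the `scripts/` marker, and the injected `exists` callback are hypothetical names for illustration, not the operator's actual API; POSIX paths are assumed to keep it short.

```javascript
// Sketch only: resolveAssetRoot, the `scripts/` marker directory, and the
// injected `exists` callback are hypothetical, not the operator's real API.
// POSIX-style absolute paths are assumed to keep the example short.
function resolveAssetRoot({ env, startDir, exists }) {
  // 1. An explicit override wins (the standalone-packaging case).
  const override = env.ALGA_APPLIANCE_ASSET_ROOT;
  if (override) {
    if (!exists(`${override}/scripts`)) {
      throw new Error(`ALGA_APPLIANCE_ASSET_ROOT has no scripts/ directory: ${override}`);
    }
    return { assetRoot: override, mode: "override" };
  }

  // 2. Repo auto-discovery: walk up from startDir until <dir>/ee/appliance/scripts exists.
  let dir = startDir.length > 1 ? startDir.replace(/\/+$/, "") : startDir;
  for (;;) {
    const candidate = dir === "/" ? "/ee/appliance" : `${dir}/ee/appliance`;
    if (exists(`${candidate}/scripts`)) {
      return { assetRoot: candidate, mode: "repo" };
    }
    if (dir === "/") break;
    dir = dir.slice(0, dir.lastIndexOf("/")) || "/";
  }
  throw new Error(`No ee/appliance found above ${startDir}; set ALGA_APPLIANCE_ASSET_ROOT`);
}
```

In a real entrypoint the callback could be `fs.existsSync` and the start directory `process.cwd()`. Injecting `exists` keeps the resolver testable without touching the filesystem.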
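The single normalized status model (collectStatus) shared by the TUI and the non-interactive commands could take a shape like the following. This is an assumed design, not the real collectStatus: per-layer probe results for the v1 summary scope (Talos, Kubernetes, Flux, Helm, workloads) are folded into one object, the first non-ready layer in dependency order is reported as the blocker, and a plain-text renderer shows how the `status` command could consume the same model the TUI renders. The probe shape, state strings, and hint text are hypothetical.

```javascript
// Sketch only: layer names follow the v1 status scope in this plan, but the
// probe shape, state strings, and hint text are hypothetical.
const LAYER_ORDER = ["talos", "kubernetes", "flux", "helm", "workloads"];

const NEXT_STEP_HINTS = {
  talos: "Check Talos node reachability before anything else.",
  kubernetes: "Wait for the Kubernetes API or inspect control-plane health.",
  flux: "Inspect Flux reconciliation in flux-system.",
  helm: "Inspect the failing Helm release.",
  workloads: "Open Workloads and check pod status and logs.",
};

// Normalize raw per-layer probe results into one model for every surface.
function summarizeStatus(probes) {
  const layers = LAYER_ORDER.map((name) => {
    const probe = probes[name] ?? { state: "unknown", detail: "not probed" };
    return { name, state: probe.state, detail: probe.detail ?? "" };
  });
  // Layers are listed in dependency order, so the first non-ready one blocks the rest.
  const blocker = layers.find((layer) => layer.state !== "ready") ?? null;
  return {
    overall: blocker === null ? "ready" : blocker.state === "failed" ? "failed" : "degraded",
    blocker: blocker && { ...blocker, hint: NEXT_STEP_HINTS[blocker.name] },
    layers,
  };
}

// Plain-text rendering for the non-interactive command surface.
function formatStatusText(model) {
  const lines = model.layers.map(
    (l) => `${l.name.padEnd(11)} ${l.state}${l.detail ? `  (${l.detail})` : ""}`
  );
  if (model.blocker) lines.push(`Blocked at ${model.blocker.name}: ${model.blocker.hint}`);
  return [`overall: ${model.overall}`, ...lines].join("\n");
}
```

Keeping blocker selection inside the model, rather than in each renderer, is what lets both surfaces show the same guidance.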
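One way to build phase-aware progress wrappers around the existing scripts without modifying them is to match known marker lines in their output. In this sketch the phase list and regex markers are hypothetical placeholders (the real script output would need to be inspected to choose stable markers), and `createPhaseTracker` is an illustrative name.

```javascript
// Sketch only: phase ids, labels, and regex markers are hypothetical.
const BOOTSTRAP_PHASES = [
  { id: "talos-config", label: "Generating Talos config", marker: /talosctl gen config/i },
  { id: "talos-apply", label: "Applying Talos config", marker: /apply-config/i },
  { id: "kubernetes", label: "Waiting for Kubernetes", marker: /waiting for kubernetes/i },
  { id: "flux", label: "Installing Flux", marker: /flux install/i },
  { id: "release", label: "Rendering release values", marker: /release values/i },
];

// Returns a per-line consumer. Only phases after the current one are checked,
// so repeated log text cannot move progress backward.
function createPhaseTracker(phases, onPhase) {
  let current = -1;
  return function consumeLine(line) {
    for (let i = current + 1; i < phases.length; i += 1) {
      if (phases[i].marker.test(line)) {
        current = i;
        onPhase({ ...phases[i], index: i + 1, total: phases.length });
        break;
      }
    }
    return current;
  };
}
```

A wrapper would feed each stdout/stderr line of the spawned script into the consumer and render `index/total` in the progress panel.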
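The Vim-style bindings (j/k/h/l alongside arrows, plus Enter, Esc, and page scrolling) can be centralized in a pure mapping function. The `key` argument mirrors the object Ink passes to a `useInput((input, key) => ...)` handler; `mapKey` and the action names are hypothetical.

```javascript
// Sketch only: mapKey and its action names are hypothetical. `key` mirrors the
// flags Ink's useInput handler receives (upArrow, downArrow, leftArrow,
// rightArrow, pageUp, pageDown, return, escape).
function mapKey(input, key) {
  if (key.upArrow || input === "k") return "up";
  if (key.downArrow || input === "j") return "down";
  if (key.leftArrow || input === "h") return "left";
  if (key.rightArrow || input === "l") return "right";
  if (key.pageUp) return "pageUp";
  if (key.pageDown) return "pageDown";
  if (key.return) return "select";
  if (key.escape) return "back";
  return null; // unbound keys are ignored
}
```

Inside the Ink app, a `useInput` handler would call `mapKey(input, key)` and dispatch the result. Because the function is pure, headless tests can cover j/k/Enter/Esc directly, which sidesteps the terminal-dependent page-up/page-down input noted under Discoveries.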
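A sketch of how the workloads adapter could normalize `kubectl get pods -o json` output into the pod table columns (pod, namespace, status, ready, restarts, age) while staying inside the default namespace allowlist. `listAppliancePods`, `runKubectl`, and the row shape are assumptions; `runKubectl` is expected to resolve with kubectl's stdout (for example a thin wrapper over `child_process.execFile`). The status, ready, and age values only approximate kubectl's own table output.

```javascript
// Sketch only: listAppliancePods, runKubectl, and the row shape are hypothetical.
const APPLIANCE_NAMESPACES = ["msp", "alga-system", "flux-system"];

function formatAge(createdIso, now) {
  const created = Date.parse(createdIso);
  if (Number.isNaN(created)) return "?";
  const seconds = Math.max(0, Math.floor((now - created) / 1000));
  if (seconds < 120) return `${seconds}s`;
  if (seconds < 7200) return `${Math.floor(seconds / 60)}m`;
  if (seconds < 172800) return `${Math.floor(seconds / 3600)}h`;
  return `${Math.floor(seconds / 86400)}d`;
}

// Approximates kubectl's STATUS column: a container waiting reason such as
// CrashLoopBackOff takes precedence over the pod phase.
function podStatus(pod) {
  if (pod.metadata.deletionTimestamp) return "Terminating";
  const waiting = (pod.status.containerStatuses ?? []).find((c) => c.state?.waiting?.reason);
  return waiting ? waiting.state.waiting.reason : pod.status.phase ?? "Unknown";
}

function toPodRow(pod, now) {
  // READY is computed from containerStatuses, an approximation of kubectl's column.
  const containers = pod.status.containerStatuses ?? [];
  return {
    key: `${pod.metadata.namespace}/${pod.metadata.name}`,
    pod: pod.metadata.name,
    namespace: pod.metadata.namespace,
    status: podStatus(pod),
    ready: `${containers.filter((c) => c.ready).length}/${containers.length}`,
    restarts: containers.reduce((sum, c) => sum + (c.restartCount ?? 0), 0),
    age: formatAge(pod.metadata.creationTimestamp, now),
  };
}

// Queries only the allowlisted namespaces and returns rows sorted by stable key.
async function listAppliancePods(runKubectl, { namespaces = APPLIANCE_NAMESPACES, now = Date.now() } = {}) {
  const rows = [];
  for (const ns of namespaces) {
    const stdout = await runKubectl(["get", "pods", "-n", ns, "-o", "json"]);
    for (const pod of JSON.parse(stdout).items) rows.push(toPodRow(pod, now));
  }
  return rows.sort((a, b) => a.key.localeCompare(b.key));
}
```

Injecting the command runner keeps raw kubectl details out of the view layer and lets adapter tests use canned JSON.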
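Selection preservation across polling refreshes (F035) can be expressed as a small pure function keyed on a stable pod key. `reconcileSelection` is hypothetical, and rows are assumed to carry a `key` of the form `<namespace>/<pod>`.

```javascript
// Sketch only: reconcileSelection is a hypothetical helper. Rows are assumed
// to carry a stable `key` such as "<namespace>/<pod>".
function reconcileSelection(previousRows, previousIndex, nextRows) {
  if (nextRows.length === 0) return -1; // nothing to select
  const selectedKey = previousRows[previousIndex]?.key;
  const sameKey = nextRows.findIndex((row) => row.key === selectedKey);
  if (sameKey !== -1) return sameKey;
  // The selected pod disappeared (e.g. replaced by a rollout): keep the cursor
  // near its old position instead of jumping back to the top.
  return Math.min(Math.max(previousIndex, 0), nextRows.length - 1);
}
```

Because the result depends only on the old rows, old index, and new rows, the refresh timer can run while a log view is open without touching log-view state.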
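Follow mode tied to the bottom position (F038) can be modeled as a reducer in which follow is on exactly when the viewport is pinned to the last line. The state shape (`lines`, `height`, `offset`, `follow`) and the action names are assumptions for illustration.

```javascript
// Sketch only: the state shape and action names are hypothetical. `offset` is
// the index of the first visible line and `height` the number of visible rows.
function maxOffset(state) {
  return Math.max(0, state.lines.length - state.height);
}

function logViewReducer(state, action) {
  switch (action.type) {
    case "append": {
      const next = { ...state, lines: [...state.lines, ...action.lines] };
      // Only auto-scroll while following; a paused viewer keeps its position.
      return state.follow ? { ...next, offset: maxOffset(next) } : next;
    }
    case "scroll": {
      const offset = Math.min(maxOffset(state), Math.max(0, state.offset + action.delta));
      // Scrolling up pauses follow; scrolling back to the bottom resumes it.
      return { ...state, offset, follow: offset === maxOffset(state) };
    }
    default:
      return state;
  }
}
```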

Discoveries / Constraints

  • (2026-03-25) The repo already has operator-facing appliance scripts: historical removed bootstrap script, historical removed upgrade script, reset-appliance-data.sh, and collect-support-bundle.sh.
  • (2026-03-25) The current bootstrap and upgrade flows are already release-manifest driven under historical local release metadata (removed).
  • (2026-03-25) The operator problem is not missing capability; it is poor usability and path/command discoverability.
  • (2026-03-25) The existing shell scripts already own sensitive logic like Talos config generation, Flux install, release value rendering, and destructive reset semantics. Reimplementing them in v1 would create drift risk.
  • (2026-03-25) The existing developer CLI is Nushell-based and heavily focused on dev/build/test workflows, which makes it a poor default home for a customer-facing appliance operator surface.
  • (2026-03-25) ee/appliance had no existing app package or command framework, so the operator needed to bootstrap its own CLI/TUI modules and tests from scratch.
  • (2026-03-25) Bootstrap stdout/stderr can contain output from multiple layers (for example Talos and Kubernetes) in one run; the failure classifier's precedence must favor explicit Kubernetes timeout strings even when Talos log lines are also present.
  • (2026-03-25) The current operator core and non-interactive commands are still the right foundation; the main change is swapping the interactive shell, not rewriting lifecycle or status logic.
  • (2026-03-25) ink@5.x was incompatible with this repo's React 19 runtime (ReactCurrentOwner crash during module load). ink@6.8.0 resolves the compatibility issue.
  • (2026-03-25) The new UI keeps a persistent layout with dedicated header, action navigator, status dashboard panel, main content pane, progress panel, and contextual help strip.
  • (2026-03-25) kubectl logs is not a true random-access log API, so "scrollback pagination" must be approximated by chunked reloads and bounded windows rather than arbitrary seek.
  • (2026-03-25) Auto-refreshing workload state must preserve selection and avoid clobbering active log-view state during operator inspection.
  • (2026-03-25) Whether Ink receives page-up/page-down keys depends on the terminal's input handling; tests are more deterministic when they drive the j/k and Enter paths.
  • (2026-03-25) kubectl logs --since-time can be used for live append polling once the latest seen timestamp is tracked in viewer state.
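The classifier-precedence constraint above can be sketched as an ordered rule list checked against the whole output, so an explicit Kubernetes timeout wins even when Talos error lines appear earlier in the same run. The categories and regex patterns are hypothetical placeholders, not the operator's actual rules.

```javascript
// Sketch only: categories, patterns, and ordering are hypothetical. Rules are
// evaluated in priority order against the full output, not line by line.
const FAILURE_RULES = [
  { category: "kubernetes-timeout", pattern: /timed out waiting for .*(kubernetes|api server|node)/i },
  { category: "flux", pattern: /flux .*(failed|error)/i },
  { category: "talos", pattern: /talos.*(error|failed|unreachable)/i },
];

function classifyFailure(output) {
  for (const rule of FAILURE_RULES) {
    const match = output.match(rule.pattern);
    if (match) return { category: rule.category, evidence: match[0] };
  }
  return { category: "unknown", evidence: null };
}
```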
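Because kubectl logs cannot seek, "load older" can be approximated by re-requesting a larger `--tail` in fixed chunks up to a cap, replacing the buffer, and shifting the scroll offset by the number of newly prepended lines. The chunk size, cap, and helper names are hypothetical; `--tail` and `--timestamps` are standard kubectl logs flags.

```javascript
// Sketch only: constants and helper names are hypothetical.
const TAIL_CHUNK = 500;
const MAX_LINES = 5000;

function logArgs(namespace, pod, tail) {
  return ["logs", "-n", namespace, pod, `--tail=${tail}`, "--timestamps"];
}

// Next --tail value for a "load older" request, or null once the cap is reached.
function nextTailSize(currentTail) {
  if (currentTail >= MAX_LINES) return null;
  return Math.min(currentTail + TAIL_CHUNK, MAX_LINES);
}

// Keep only the newest `cap` lines so memory stays bounded.
function boundLines(lines, cap = MAX_LINES) {
  return lines.length > cap ? lines.slice(lines.length - cap) : lines;
}

// After a larger reload, older lines are prepended; shift the offset so the
// operator keeps looking at the same text. Assumes no new lines arrived
// between the two requests.
function applyOlderReload(state, reloadedLines) {
  const lines = boundLines(reloadedLines);
  const added = Math.max(0, lines.length - state.lines.length);
  return { ...state, lines, offset: state.offset + added };
}
```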
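Live append polling with `--since-time` could track the last seen timestamp from `--timestamps` output and drop the overlap. This sketch assumes each line starts with an RFC3339 timestamp followed by a space, and that a poll may re-deliver lines already shown (for instance if the since time is applied at coarser precision than the per-line timestamps). The helper names and the initial `--tail=200` are hypothetical.

```javascript
// Sketch only: helper names and the initial --tail value are hypothetical.
// Assumes lines come from `kubectl logs --timestamps` ("<RFC3339> <text>").
function lineTimestamp(line) {
  const space = line.indexOf(" ");
  return space === -1 ? null : line.slice(0, space);
}

function pollArgs(namespace, pod, lastTimestamp) {
  const args = ["logs", "-n", namespace, pod, "--timestamps"];
  return lastTimestamp ? [...args, `--since-time=${lastTimestamp}`] : [...args, "--tail=200"];
}

// A poll may re-deliver recent lines, so drop any fetched line that is
// byte-identical (timestamp included) to one of the recently seen lines.
function mergeAppended(state, fetchedLines, recentWindow = 1000) {
  const recent = new Set(state.lines.slice(-recentWindow));
  const fresh = fetchedLines.filter((line) => line.length > 0 && !recent.has(line));
  const lines = [...state.lines, ...fresh];
  const last = lines.length > 0 ? lineTimestamp(lines[lines.length - 1]) : null;
  return { ...state, lines, lastTimestamp: last ?? state.lastTimestamp };
}
```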

Commands / Runbooks

  • (2026-03-25) Plan scaffold command:
    • python3 /Users/roberisaacs/.codex/skills/alga-plan/scripts/scaffold_plan.py "Appliance Operator TUI"
  • (2026-03-25) Existing appliance script entrypoints:
    • historical removed bootstrap script
    • historical removed upgrade script
    • ee/appliance/scripts/reset-appliance-data.sh
    • ee/appliance/scripts/collect-support-bundle.sh
  • (2026-03-25) New operator entrypoints:
    • ee/appliance/appliance --help
    • ee/appliance/appliance tui
    • ee/appliance/appliance status
  • (2026-03-25) New test runbook:
    • node --test ee/appliance/operator/tests/*.test.mjs
  • (2026-03-25) Future workload/log implementation will likely need dedicated adapter tests separate from the existing lifecycle/status tests.
  • (2026-03-25) Workload/log implementation test run:
    • node --test ee/appliance/operator/tests/*.test.mjs
  • (2026-03-25) Ink dependency updates:
    • npm install ink@^6.8.0
    • npm install --save-dev ink-testing-library@^4.0.0
  • ee/appliance/README.md
  • historical removed bootstrap script
  • historical removed upgrade script
  • ee/appliance/appliance
  • ee/appliance/operator/appliance.mjs
  • ee/appliance/operator/lib/cli.mjs
  • ee/appliance/operator/lib/tui.mjs
  • ee/appliance/operator/lib/status.mjs
  • ee/appliance/operator/lib/lifecycle.mjs
  • ee/appliance/operator/tests/lifecycle-cli.test.mjs
  • ee/appliance/operator/tests/status.test.mjs
  • ee/appliance/operator/tests/runtime-paths.test.mjs
  • ee/appliance/operator/tests/tui-ink.test.mjs
  • ee/appliance/operator/lib/workloads.mjs
  • ee/appliance/operator/tests/workloads.test.mjs
  • ee/docs/premise/README.md
  • ee/docs/premise/talos-gitops-bootstrap.md
  • docs/plans/2026-03-10-talos-appliance-gitops-alga-deployment-design.md
  • docs/plans/2026-03-10-talos-image-factory-scaffolding-design.md

Open Questions

  • (Resolved 2026-03-25) TUI runtime/library: use Ink for the real interactive shell. The earlier readline/promises shell was interim scaffolding, not the accepted end state.
  • (Resolved 2026-03-25) v1 status scope: summary-first (Talos/Kubernetes/Flux/Helm/workloads + release/config paths) without embedded log/event drill-down. Pod log viewing was later added through the separate Workloads view rather than the status panel.
  • (Resolved 2026-03-25) Ship TUI and mirrored non-interactive command surface together in v1.
  • (Resolved 2026-03-25) Expanded v1 operator scope now includes appliance-relevant workload inventory and full-screen pod log viewing inside the Ink UI.
  • (Resolved 2026-03-25) T010 completed: workload console now validates appliance-only pod inventory, required columns, and selection preservation across refresh.
  • (Resolved 2026-03-25) T011 completed: selecting a pod opens full-screen logs and Esc restores workload layout/selection.
  • (Resolved 2026-03-25) T012 completed: log viewer validates chunked older-load behavior, follow/pause transitions, and bounded in-memory line caps.