# Scratchpad — Appliance Operator TUI
- Plan slug: `appliance-operator-tui`
- Created: `2026-03-25`
## What This Is
Keep a lightweight, continuously-updated log of discoveries and decisions made while implementing this plan.
Prefer short bullets. Append new entries as you learn things, and also *update earlier notes* when a decision changes or an open question is resolved.
## Decisions
- (2026-03-25) First version is terminal-first, not a browser-based operator console. Reason: it matches the current appliance operator workflow, works over SSH, and avoids building a second management surface before the appliance lifecycle is stable.
- (2026-03-25) The new tool should live under `ee/appliance`, not inside the existing Nushell developer CLI. Reason: the current CLI is developer-oriented and appliance operations need a clearer product boundary.
- (2026-03-25) The tool should be structured for both repo-based use and future standalone packaging. Reason: v1 can ship from the repo, but packaging constraints should not be baked into the lifecycle logic.
- (2026-03-25) The TUI should wrap the current appliance shell scripts and release manifests instead of replacing bootstrap, upgrade, or reset logic.
- (2026-03-25) Implemented the operator as Node ESM modules under `ee/appliance/operator` with a thin shell wrapper `ee/appliance/appliance`. Reason: no appliance package/workspace existed, and ESM keeps packaging and standalone embedding simple.
- (2026-03-25) Added runtime path abstraction with repo auto-discovery and `ALGA_APPLIANCE_ASSET_ROOT` override. Reason: required for future standalone packaging while preserving repo-hosted workflow.
- (2026-03-25) Implemented one normalized status model (`collectStatus`) used by both TUI and non-interactive commands. Reason: avoids divergent status logic and supports consistent blocker guidance across command surfaces.
- (2026-03-25) Kept lifecycle actions script-driven (`historical removed bootstrap script`, `historical removed upgrade script`, `reset-appliance-data.sh`, `collect-support-bundle.sh`) with phase-aware progress wrappers. Reason: minimizes operational drift and honors existing script contracts.
- (2026-03-25) The current `readline` shell is not the accepted final UX. Reason: it is operationally useful, but it does not meet the product bar for a real operator TUI.
- (2026-03-25) Ink is the intended runtime for the interactive layer. Reason: it supports the full-screen, persistent-layout, keyboard-driven interface we actually want while allowing the existing Node operator core to remain intact.
- (2026-03-25) Replaced the interactive `readline/promises` loop with a stateful Ink app while keeping lifecycle/status modules untouched. Reason: satisfies the UX acceptance bar (`F026`-`F031`) without destabilizing non-interactive commands.
- (2026-03-25) Added Vim-style `j/k/h/l` bindings alongside arrows in the Ink shell. Reason: improves SSH/operator ergonomics and makes headless TUI tests deterministic.
- (2026-03-25) Appliance pod inspection belongs inside the same Ink operator rather than a separate CLI/tool. Reason: operators should stay in one surface for lifecycle, status, and debugging.
- (2026-03-25) Workload scope should default to appliance-relevant namespaces only (`msp`, `alga-system`, `flux-system`). Reason: operators asked for appliance-focused visibility, not a generic cluster browser.
- (2026-03-25) Pod logs should use a full-screen viewer with bounded scrollback and Escape-to-return behavior. Reason: this matches operator expectations better than a cramped split view and avoids unbounded memory growth.
- (2026-03-25) Implemented `F032` by adding `Workloads` as a first-class Ink action and dedicated main-pane route. Reason: keeps pod inspection in the same operator surface as lifecycle and status actions.
- (2026-03-25) Implemented `F033` with a default namespace allowlist (`msp`, `alga-system`, `flux-system`) inside the workload adapter. Reason: appliance operators need focused inventory, not cluster-wide noise.
- (2026-03-25) Implemented `F034` with a refreshable pod table showing pod, namespace, status, ready, restarts, and age. Reason: aligns the workload pane with PRD operator-at-a-glance requirements.
- (2026-03-25) Implemented `F035` using timed workload polling with selection preservation by stable pod key. Reason: refreshing state must not disrupt active operator focus.
- (2026-03-25) Implemented `F036` by adding a full-screen log view opened from the workload list and closed with `Esc` back to workloads. Reason: this mirrors required drill-down behavior without layout loss.
- (2026-03-25) Implemented `F037` by using chunked tail expansion plus fixed-cap line windows. Reason: `kubectl logs` is append-oriented, so chunked reload with a cap is the practical bounded-memory strategy.
- (2026-03-25) Implemented `F038` with follow-mode tied to bottom position and automatic pause when scrolling upward. Reason: operators need live tail only when intentionally at stream bottom.
- (2026-03-25) Implemented `F039` with keyboard controls for workloads/logs (`j/k`, arrows, `Enter`, `Esc`, page scroll). Reason: parity with SSH-friendly keyboard workflows.
- (2026-03-25) Implemented `F040` via new `lib/workloads.mjs` adapter that encapsulates `kubectl get pods` and `kubectl logs` calls behind normalized APIs. Reason: keeps raw command details out of TUI view logic.
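
The bounded log window and follow/pause behavior described for `F037` and `F038` can be sketched as a small pure state module. This is a minimal illustration, not the code in `lib/tui.mjs` or `lib/workloads.mjs`: every function name, the state shape, and the 2000-line default cap are assumptions made for the example.

```javascript
// Sketch only: function names and the default cap are hypothetical, not taken from the repo.
const DEFAULT_MAX_LINES = 2000;

// Viewer state: `offsetFromBottom` is how many lines the viewport sits above the newest line.
function createLogWindow(maxLines = DEFAULT_MAX_LINES) {
  return { lines: [], maxLines, offsetFromBottom: 0, follow: true };
}

// Append newly fetched lines and drop the oldest ones beyond the cap.
function appendLines(state, newLines) {
  const merged = state.lines.concat(newLines);
  const lines = merged.slice(Math.max(0, merged.length - state.maxLines));
  // While paused, grow the offset so the viewport stays on the same content.
  const maxOffset = Math.max(0, lines.length - 1);
  const offsetFromBottom = state.follow
    ? 0
    : Math.min(maxOffset, state.offsetFromBottom + newLines.length);
  return { ...state, lines, offsetFromBottom };
}

// Positive delta scrolls up (older lines); negative delta scrolls down (newer lines).
function scrollBy(state, delta) {
  const maxOffset = Math.max(0, state.lines.length - 1);
  const offsetFromBottom = Math.min(maxOffset, Math.max(0, state.offsetFromBottom + delta));
  // Follow mode is on only while the viewport is at the bottom of the stream.
  return { ...state, offsetFromBottom, follow: offsetFromBottom === 0 };
}

// Next `--tail` size when the operator asks for older lines, never exceeding the cap.
function nextTailSize(currentTail, chunkSize, maxLines) {
  return Math.min(maxLines, currentTail + chunkSize);
}
```

Keeping the window logic free of Ink and `kubectl` calls means it can be exercised with `node --test` without a terminal or a cluster. Scrolling up pauses follow mode, and returning to offset `0` resumes it.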
## Discoveries / Constraints
- (2026-03-25) The repo already has operator-facing appliance scripts: `historical removed bootstrap script`, `historical removed upgrade script`, `reset-appliance-data.sh`, and `collect-support-bundle.sh`.
- (2026-03-25) The current bootstrap and upgrade flows are already release-manifest driven under `historical local release metadata (removed)`.
- (2026-03-25) The operator problem is not missing capability; it is poor usability and path/command discoverability.
- (2026-03-25) The existing shell scripts already own sensitive logic like Talos config generation, Flux install, release value rendering, and destructive reset semantics. Reimplementing them in v1 would create drift risk.
- (2026-03-25) The existing developer CLI is Nushell-based and heavily focused on dev/build/test workflows, which makes it a poor default home for a customer-facing appliance operator surface.
- (2026-03-25) `ee/appliance` had no existing app package or command framework, so the operator needed to bootstrap its own CLI/TUI modules and tests from scratch.
- (2026-03-25) Bootstrap stderr/stdout can contain multiple layers in one run; classifier precedence must favor explicit Kubernetes timeout strings when Talos logs are also present.
- (2026-03-25) The current operator core and non-interactive commands are still the right foundation; the main change is swapping the interactive shell, not rewriting lifecycle or status logic.
- (2026-03-25) `ink@5.x` was incompatible with this repo's React 19 runtime (`ReactCurrentOwner` crash during module load). `ink@6.8.0` resolves the compatibility issue.
- (2026-03-25) The new UI keeps a persistent layout with dedicated header, action navigator, status dashboard panel, main content pane, progress panel, and contextual help strip.
- (2026-03-25) `kubectl logs` is not a true random-access log API, so "scrollback pagination" must be approximated by chunked reloads and bounded windows rather than arbitrary seek.
- (2026-03-25) Auto-refreshing workload state must preserve selection and avoid clobbering active log-view state during operator inspection.
- (2026-03-25) Ink page-up/page-down availability depends on terminal input; tests are more deterministic using `j/k` and `Enter` paths.
- (2026-03-25) `kubectl logs --since-time` can be used for live append polling once the latest seen timestamp is tracked in viewer state.
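
The `--since-time` append polling noted above could look roughly like the sketch below. The `kubectl logs` flags used (`--timestamps`, `--since-time`, `--tail`) are real, but the helper names and state shape are hypothetical and not taken from `lib/workloads.mjs`. The sketch assumes each line returned with `--timestamps` starts with a timestamp followed by a space, and that a poll may return lines already seen, so it drops everything up to and including the last raw line it recorded.

```javascript
// Sketch only: helper names and state shape are hypothetical.
// Build the argument list for one `kubectl logs` poll.
function buildLogArgs({ namespace, pod, sinceTime, tail = 200 }) {
  const args = ['logs', pod, '-n', namespace, '--timestamps'];
  // First poll uses a bounded tail; later polls ask only for newer lines.
  args.push(sinceTime ? `--since-time=${sinceTime}` : `--tail=${tail}`);
  return args;
}

// Split "<timestamp> <text>" as produced when --timestamps is set.
function splitTimestamp(rawLine) {
  const space = rawLine.indexOf(' ');
  if (space === -1) return { timestamp: null, text: rawLine };
  return { timestamp: rawLine.slice(0, space), text: rawLine.slice(space + 1) };
}

// Merge one poll result into the viewer state, skipping lines already seen.
function mergePoll(state, rawLines) {
  const lines = rawLines.filter((line) => line.length > 0);
  // Assumption: the last line we saw is re-served verbatim if the poll overlaps.
  const boundary = state.lastRawLine === null ? -1 : lines.lastIndexOf(state.lastRawLine);
  const fresh = lines.slice(boundary + 1);
  if (fresh.length === 0) return { state, fresh: [] };
  const lastRawLine = fresh[fresh.length - 1];
  const { timestamp } = splitTimestamp(lastRawLine);
  return {
    state: { lastRawLine, sinceTime: timestamp ?? state.sinceTime },
    fresh: fresh.map((line) => splitTimestamp(line).text),
  };
}
```

A caller would start from `{ lastRawLine: null, sinceTime: null }`, pass the output of `buildLogArgs` to whatever process runner the adapter uses, and feed the resulting lines through `mergePoll` before appending them to the bounded window.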
## Commands / Runbooks
- (2026-03-25) Plan scaffold command:
  - `python3 /Users/roberisaacs/.codex/skills/alga-plan/scripts/scaffold_plan.py "Appliance Operator TUI"`
- (2026-03-25) Existing appliance script entrypoints:
  - `historical removed bootstrap script`
  - `historical removed upgrade script`
  - `ee/appliance/scripts/reset-appliance-data.sh`
  - `ee/appliance/scripts/collect-support-bundle.sh`
- (2026-03-25) New operator entrypoints:
  - `ee/appliance/appliance --help`
  - `ee/appliance/appliance tui`
  - `ee/appliance/appliance status`
- (2026-03-25) New test runbook:
  - `node --test ee/appliance/operator/tests/*.test.mjs`
- (2026-03-25) Future workload/log implementation will likely need dedicated adapter tests separate from the existing lifecycle/status tests.
- (2026-03-25) Workload/log implementation test run:
  - `node --test ee/appliance/operator/tests/*.test.mjs`
- (2026-03-25) Ink dependency updates:
  - `npm install ink@^6.8.0`
  - `npm install --save-dev ink-testing-library@^4.0.0`
## Links / References
- `ee/appliance/README.md`
- `historical removed bootstrap script`
- `historical removed upgrade script`
- `ee/appliance/appliance`
- `ee/appliance/operator/appliance.mjs`
- `ee/appliance/operator/lib/cli.mjs`
- `ee/appliance/operator/lib/tui.mjs`
- `ee/appliance/operator/lib/status.mjs`
- `ee/appliance/operator/lib/lifecycle.mjs`
- `ee/appliance/operator/tests/lifecycle-cli.test.mjs`
- `ee/appliance/operator/tests/status.test.mjs`
- `ee/appliance/operator/tests/runtime-paths.test.mjs`
- `ee/appliance/operator/tests/tui-ink.test.mjs`
- `ee/appliance/operator/lib/workloads.mjs`
- `ee/appliance/operator/tests/workloads.test.mjs`
- `ee/docs/premise/README.md`
- `ee/docs/premise/talos-gitops-bootstrap.md`
- `docs/plans/2026-03-10-talos-appliance-gitops-alga-deployment-design.md`
- `docs/plans/2026-03-10-talos-image-factory-scaffolding-design.md`
## Open Questions
- (Resolved 2026-03-25) TUI runtime/library: use Ink for the real interactive shell. The existing `readline/promises` shell is interim scaffolding, not the accepted end state.
- (Resolved 2026-03-25) v1 status scope: summary-first (Talos/Kubernetes/Flux/Helm/workloads + release/config paths) without embedded log/event drill-down.
- (Resolved 2026-03-25) Ship TUI and mirrored non-interactive command surface together in v1.
- (Resolved 2026-03-25) Expanded v1 operator scope now includes appliance-relevant workload inventory and full-screen pod log viewing inside the Ink UI.
- (Resolved 2026-03-25) `T010` completed: workload console now validates appliance-only pod inventory, required columns, and selection preservation across refresh.
- (Resolved 2026-03-25) `T011` completed: selecting a pod opens full-screen logs and `Esc` restores workload layout/selection.
- (Resolved 2026-03-25) `T012` completed: log viewer validates chunked older-load behavior, follow/pause transitions, and bounded in-memory line caps.
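
The selection-preservation behavior validated by `T010` can be illustrated with a small pure function. This is a sketch under assumptions: the `namespace/name` key format, the function names, and the fallback rule (clamp to the previous row index when the pod has disappeared) are hypothetical and may differ from the actual workload pane.

```javascript
// Sketch only: key format, names, and fallback rule are hypothetical.
// A pod is identified by namespace plus name, which stays stable across refreshes.
function podKey(pod) {
  return `${pod.namespace}/${pod.name}`;
}

// Given the previously selected key and row index, pick the selection for a refreshed list.
function reconcileSelection(previousKey, previousIndex, pods) {
  if (pods.length === 0) return { key: null, index: -1 };
  const found = pods.findIndex((pod) => podKey(pod) === previousKey);
  // Same pod still present: follow it even if its row moved.
  if (found !== -1) return { key: previousKey, index: found };
  // Pod is gone: stay near the old row, clamped to the new list bounds.
  const index = Math.min(Math.max(previousIndex, 0), pods.length - 1);
  return { key: podKey(pods[index]), index };
}
```

Tracking the key rather than the row index is what keeps the highlight on the same pod when a refresh reorders the table or inserts new rows above it.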