# Build memory measurement harness: design

**Date:** 2026-06-04

**Branch:** `improve/build-memory-consumption`

**Goal:** A repeatable tool that runs `npm run build`, verifies the build works, and measures **peak memory consumption** of the whole build, so we can drive a build-memory optimization loop with before/after numbers.

## Background / what the build is

`npm run build` (from repo root) is a three-stage chain:

1. `build:assemblyscript`: `node scripts/build-assemblyscript-if-needed.mjs`
2. `npx nx build-deps server`: builds shared/dependent workspace packages
3. `cd server && next build --turbo`: the heavy stage (`NODE_OPTIONS=--max-old-space-size=8192`, Next.js 16, community edition)

The build is a **process tree** (npm → nx → next → worker processes), so a meaningful peak must cover the whole tree, not a single process.

## Key findings that shaped the design

- **`node` on this host is a snap** (`/snap/bin/node` → `snap run`). snap **relocates every node process into its own `snap.node.node-*.scope` cgroup**, escaping any `systemd-run --user --scope` wrapper. So the clean "wrap the build in one scope, read its `memory.peak`" approach does **not** work on the host: the build's node processes scatter across snap-managed cgroups.
- Running the snap-internal node ELF directly (`/snap/node/current/bin/node`) avoids relocation but does **not** run node correctly (it needs snap's runtime env).
- **Docker fixes this cleanly.** Inside a container, `node` is a normal ELF (no snap), and the entire container runs in **one cgroup** that exposes `memory.peak` on cgroup v2. Verified on this box: a 300 MB allocation in a container registered as `memory.peak` ≈ 313 MB even after `memory.current` fell back, i.e. `memory.peak` captures the true whole-tree high-water mark with no sampling. This is also the *representative* number: CI builds images in containers, so the container peak is what OOMs under a memory limit.
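As a rough illustration of the `memory.peak` finding above, the read itself can be sketched in a few lines of Node. This is a minimal sketch, not the harness's actual code: the helper names `readMemoryPeakBytes` and `toMiB` are hypothetical, the default path assumes cgroup v2 mounted at `/sys/fs/cgroup` inside a container with a private cgroup namespace, and it is written as CommonJS only so that it runs as a standalone script.

```javascript
// Sketch: read the cgroup v2 memory high-water mark for the current container.
// Assumption: cgroup v2 is mounted at /sys/fs/cgroup and that directory is the
// container's own cgroup (private cgroup namespace).
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: returns the peak in bytes, or null when memory.peak
// does not exist (for example on cgroup v1 or on the host's root cgroup).
function readMemoryPeakBytes(cgroupDir = "/sys/fs/cgroup") {
  try {
    const raw = fs.readFileSync(path.join(cgroupDir, "memory.peak"), "utf8");
    const bytes = Number.parseInt(raw.trim(), 10);
    return Number.isFinite(bytes) ? bytes : null;
  } catch (err) {
    if (err.code === "ENOENT" || err.code === "ENOTDIR") return null;
    throw err;
  }
}

// Hypothetical helper: convert bytes to whole mebibytes for reporting.
function toMiB(bytes) {
  return Math.round(bytes / (1024 * 1024));
}

const peak = readMemoryPeakBytes();
console.log(
  peak === null ? "memory.peak not available" : `peak: ${toMiB(peak)} MiB`
);
```

Because the kernel maintains the counter, a single read after the build exits is enough; no polling loop is needed during the build.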
- **Host node is v24; project pins node 20 for runtime.** The host `node_modules` (≈3.8 GB) has native addons built for node 24's ABI, so they will not load under node 20. Decision: use a **`node:24-bookworm`** container and **reuse the host `node_modules` as-is** (zero install, fast loop). This reproduces the *host* build exactly, isolated in a container for clean cgroup measurement. Verified that the host `node_modules` load in `node:24-bookworm` (container glibc 2.36 < host 2.43, but the prebuilt addons target old glibc): `next/dist/build/swc` requires OK, `next --version` → 16.2.6, `esbuild` works. (CI uses node:20; absolute numbers may differ slightly from CI, which is acceptable for a relative before/after optimization loop.)

## Architecture

Two files, siblings of the existing `scripts/build-perf-harness.mjs`:

### `scripts/build-mem.sh`: host wrapper (bash)

Host `node` is snap, so the wrapper is bash and only shells out to docker:

```
docker run --rm -v :/work -w /work [--memory ] \
  node scripts/build-mem-harness.mjs
```

- Default image `node:24-bookworm`; `--image` to override.
- `--memory` (optional) passes through to docker to test a memory ceiling (e.g. `--memory 8g` → "does the build fit in 8 GB?"). Unset = all host RAM.
- All other flags pass through to the harness.
- cgroupns is docker's default (private), so the container's `/sys/fs/cgroup` is its own cgroup root and `memory.peak` is the whole-container high-water mark.

### `scripts/build-mem-harness.mjs`: runs inside the container

1. **Clear** (default; `--skip-clear`): remove `server/.next` and `server/tsconfig.tsbuildinfo` for a representative cold build.
2. **Build**: spawn `bash -lc ''` (default `npm run build`) from `/work`, tee stdout/stderr to `.build-mem/build-