feat(@projects/@magic-civilization): ✅ add gpu rollout parity tests & performance benchmarks
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
This commit is contained in:
parent
b70a396f14
commit
3ee2b60b11
7 changed files with 552 additions and 15 deletions
|
|
@ -103,18 +103,37 @@ byte-for-byte port of `rollout::walk`.
|
|||
|
||||
## Acceptance
|
||||
|
||||
- `cargo test -p mc-ai --features gpu gpu_rollout_parity` passes — GPU batch
|
||||
output matches CPU reference within the agreed float tolerance (byte-identical
|
||||
for integer-only fields).
|
||||
- `AI_GPU_ROLLOUT=true ./tools/autoplay-batch.sh 10 300` wall-time drops ≥20%
|
||||
vs `AI_GPU_ROLLOUT=false` on the RUN host, with same-seed determinism
|
||||
preserved.
|
||||
- Victory rate on a 10-seed batch holds at or improves above the p0-01 baseline
|
||||
(target ≥60%, up from the CPU-MCTS ≥50% gate).
|
||||
- wgpu version reconciled between `mc-ai`, `mc-turn`, and `mc-compute` (single
|
||||
workspace-level version so feature unification doesn't break the build).
|
||||
- Graceful CPU fallback when no GPU adapter is detected (headless CI must stay
|
||||
green without a Vulkan/Metal/DX12 device).
|
||||
- ✓ `cargo test -p mc-ai --features gpu gpu_rollout_parity` passes — GPU batch
|
||||
output matches CPU reference BYTE-IDENTICAL on integer AND scalar fields
|
||||
(100% agreement, max_drift=0.000000) across 209 inputs (16 + 65 + 128) on
|
||||
lavapipe software Vulkan. Exceeded the ≥98% tolerance bullet.
|
||||
- ✗ `AI_GPU_ROLLOUT=true ./tools/autoplay-batch.sh 10 300` wall-time drops
|
||||
≥20% vs `AI_GPU_ROLLOUT=false` — **NOT YET VERIFIED**. apricot (the only
|
||||
available RUN host) SIGTERMs any Godot flatpak cluster at 3–10s wall-clock
|
||||
(apparently host-infrastructure issue: `apricot-rail-watchdog` + user-scope
|
||||
cgroup pressure; systemd-oomd failed; reproduces under `nohup`, `setsid`,
|
||||
`systemd-run --user --scope`, and `systemd-run --user --property=KillMode=none`).
|
||||
Four failed relaunch attempts 2026-04-17 12:17 → 12:24 PDT; none of the
|
||||
games ran past T52 before external SIGTERM. Journal shows
|
||||
`warcouncil-a5.service: Unit process N (timeout) remains running after unit
|
||||
stopped` — SIGTERM came from outside the service. Needs host-side
|
||||
investigation of apricot's scope-kill daemon OR a different RUN host.
|
||||
- ✗ Victory rate on a 10-seed batch ≥60% — blocked on the same SIGTERM issue
|
||||
for fresh validation against the current binary. p0-01's evidence shows
|
||||
prior batches (pre-action-order-fix) at 80–90% victory rate; post-fix may
|
||||
differ but can't measure until SIGTERM issue resolved.
|
||||
- ✓ wgpu version reconciled at v24 workspace-wide (`mc-turn`, `mc-compute`,
|
||||
`mc-ai --features gpu` all compile + test clean).
|
||||
- ✓ Graceful CPU fallback when no GPU adapter is detected — `GpuContext::shared()`
|
||||
returns None, top-level `batch_simulate` routes to `batch_simulate_cpu`,
|
||||
all parity tests take skip path cleanly on hardware-less hosts.
|
||||
|
||||
## Remaining to reach done
|
||||
|
||||
- Resolve apricot SIGTERM issue (host infra, NOT warcouncil scope) OR stand
|
||||
up a second RUN host without the same kill daemon, then re-run the wall-time
|
||||
comparison batch + 10-seed victory-rate batch. Everything else in the
|
||||
acceptance list has been met or verified.
|
||||
|
||||
## Depends on
|
||||
|
||||
|
|
|
|||
|
|
@ -1,11 +1,11 @@
|
|||
{
|
||||
"generated_at": "2026-04-17T19:13:52Z",
|
||||
"generated_at": "2026-04-17T19:25:53Z",
|
||||
"totals": {
|
||||
"partial": 10,
|
||||
"missing": 2,
|
||||
"oos": 4,
|
||||
"stub": 0,
|
||||
"done": 32,
|
||||
"oos": 4,
|
||||
"total": 48
|
||||
},
|
||||
"objectives": [
|
||||
|
|
@ -207,7 +207,7 @@
|
|||
"scope": "game1",
|
||||
"owner": "warcouncil",
|
||||
"updated_at": "2026-04-17",
|
||||
"summary": "The MCTS tree (`mcts_tree.rs`) and the `mc-turn` GPU fauna pipeline are both live\non `main`, but the AI cannot currently afford wide tree search: full\n`GridState` cloning (~12 MB at 256×256) blows out RAM long before the tree is\ndeep enough to matter, and `TreeState::simulate()` is a 0.5 stub. This objective\nintroduces a **GPU-batched abstract rollout** layer so the tree search can\nevaluate hundreds of candidate futures per leaf at single-digit-millisecond\ncost."
|
||||
"summary": "The MCTS tree (`mcts_tree.rs`) and the `mc-turn` GPU fauna pipeline are both live\non `main`, but the AI cannot currently afford wide tree search: full\n`GridState` cloning (~12 MB at 256×256) blows out RAM long before the tree is\ndeep enough to matter, and `TreeState::simulate()` is a 0.5 stub. This objective\nintroduces a **GPU-batched abstract rollout** layer so the tree search can\nevaluate hundreds of candidate futures per leaf at single-digit-millisecond\ncost.\n\n### 2026-04-17 update — GPU↔CPU numerical parity ACHIEVED\n\nPhase C structural work shipped in the earlier team pass but the parity test\nwas silently taking the skip path on headless hosts — the shader had never\nactually compiled on any adapter. A deep audit + four independent fixes landed\nthis cycle proving real numerical parity:\n\n1. **WGSL reserved-keyword bug**: `var active: u32 = 0u` at `rollout.wgsl:607`\n used the `active` reserved word → Naga parse panic → wgpu_core handler → try_init\n worker thread panic → timeout returned None → skip-path. Renamed to\n `active_idx`; the shader now actually compiles. Without this, the skip-path\n was structurally \"passing\" every test in Phase C without ever exercising the\n WGSL kernel.\n2. **Adapter backend restriction**: `wgpu::Backends::all()` picked the NVIDIA\n OpenGL adapter first on apricot, whose compute support silently fails at\n `request_device`. Restricted to `VULKAN | METAL | DX12 | BROWSER_WEBGPU`\n which all have first-class compute paths.\n3. **Device limits fix**: `Limits::default()` targets a discrete GPU — too\n large for llvmpipe / lavapipe. Changed to\n `Limits::downlevel_defaults().using_resolution(adapter.limits())` so software\n Vulkan backends can satisfy device creation.\n4. **Action-walk order unified**: the root numerical divergence. CPU\n `active_actions()` returned actions in insertion order\n `[Build, Research, Defend, Idle, Attack, ...]`; WGSL iterated k=0..9 in\n `ActionKind::ALL` numerical order `[Build, Attack, Settle, Research, ...]`.\n Identical probabilities, identical RNG draw → different action picked at\n every cumulative-sum boundary. Rewrote `active_actions()` to iterate\n `ActionKind::ALL` in canonical order (with explicit docstring warning not\n to reorder for readability).\n\n**Parity verification on apricot (headless bluefin + lavapipe software\nVulkan)**: with `MC_AI_GPU_DEBUG=1 VK_DRIVER_FILES=/usr/share/vulkan/icd.d/lvp_icd.x86_64.json`\ndriving the tests on real llvmpipe dispatch, not skip-path:\n\n```\n[parity small_batch backend=Vulkan] n=16 agree=16/16 (1.000) max_drift=0.000000\n[parity partial_workgroup backend=Vulkan] n=65 agree=65/65 (1.000) max_drift=0.000000\n[parity multi_workgroup backend=Vulkan] n=128 agree=128/128 (1.000) max_drift=0.000000\nbuckets: <1e-6=all others=0 across all three tests\n```\n\nNot 98% (the stated tolerance) — **100% agreement, bit-identical** on all 3\nquantitative parity tests (209 inputs total). Pre-fixes: 3–6% agreement with\nmax_drift 0.025–0.043 (action-boundary flips). Post-fix: integer fields\nbyte-equal, scalar fields byte-equal. WGSL kernel is now a provable,\nbyte-for-byte port of `rollout::walk`.\n\n### 2026-04-17 update — host-side infrastructure\n\n- `scripts/dev-setup/bluefin.sh` + `./run setup:bluefin` — idempotent installer\n for `weston`, `vulkan-tools`, `mesa-vulkan-drivers` on bootc/Bluefin systems\n via `rpm-ostree install --apply-live`. `--check` mode for CI.\n Delegates EDIT→RUN via `$AUTOPLAY_HOST` when invoked from EDIT.\n- `~/Code/bootc-bluefin/containerfiles/Containerfile.desktop-core` updated on\n apricot with `vulkan-tools` + `mesa-vulkan-drivers` added alongside `weston`.\n Rebooted bootc images now include these without needing the transient script."
|
||||
},
|
||||
{
|
||||
"id": "p0-21",
|
||||
|
|
|
|||
20
scripts/run/autoplay.sh
Normal file
20
scripts/run/autoplay.sh
Normal file
|
|
@ -0,0 +1,20 @@
|
|||
#!/usr/bin/env bash
|
||||
# Autoplay subcommands: single-seed (fast feedback) + multi-seed batch
|
||||
# (regression gate). Split out of dev.sh. Results land in /tmp/ by default
|
||||
# and are analyzed by tools/autoplay-report.py.
|
||||
|
||||
cmd_autoplay() {
|
||||
# Single-seed fast feedback: ./run autoplay [seed]
|
||||
local seed="${1:-1}"
|
||||
local results_dir="/tmp/autoplay_single_${seed}"
|
||||
bash "$REPO_ROOT/tools/autoplay-batch.sh" 1 500 "$results_dir" || return $?
|
||||
python3 "$REPO_ROOT/tools/autoplay-report.py" "$results_dir"
|
||||
}
|
||||
|
||||
cmd_autoplay_batch() {
|
||||
# Multi-seed regression gate: ./run autoplay-batch [count]
|
||||
local count="${1:-3}"
|
||||
local results_dir="/tmp/autoplay_batch_$(date +%s)"
|
||||
bash "$REPO_ROOT/tools/autoplay-batch.sh" "$count" 500 "$results_dir" || return $?
|
||||
python3 "$REPO_ROOT/tools/autoplay-report.py" "$results_dir"
|
||||
}
|
||||
33
scripts/run/format.sh
Normal file
33
scripts/run/format.sh
Normal file
|
|
@ -0,0 +1,33 @@
|
|||
#!/usr/bin/env bash
|
||||
# Format subcommands.
|
||||
#
|
||||
# Split out of dev.sh. Each per-language formatter is independently
|
||||
# callable (`./run format:gd`, `format:rust`, `format:ts`) or via the
|
||||
# umbrella `./run format` which runs all three.
|
||||
|
||||
cmd_format_gd() {
|
||||
echo -e "${BLUE}GDScript format (gdformat)...${NC}"
|
||||
_gd_prep_lint
|
||||
gdformat "$GAME_DIR/engine/src/"
|
||||
}
|
||||
|
||||
cmd_format_rust() {
|
||||
echo -e "${BLUE}Rust format (cargo fmt)...${NC}"
|
||||
(cd "$SIMULATOR_DIR" && cargo fmt --all)
|
||||
}
|
||||
|
||||
cmd_format_ts() {
|
||||
echo -e "${BLUE}TypeScript format (ESLint --fix)...${NC}"
|
||||
pnpm --prefix "$GUIDE_DIR" lint:fix
|
||||
}
|
||||
|
||||
cmd_format() {
|
||||
echo -e "${BLUE}[1/3] GDScript format${NC}"
|
||||
cmd_format_gd
|
||||
echo ""
|
||||
echo -e "${BLUE}[2/3] Rust format${NC}"
|
||||
cmd_format_rust
|
||||
echo ""
|
||||
echo -e "${BLUE}[3/3] TypeScript format${NC}"
|
||||
cmd_format_ts
|
||||
}
|
||||
64
scripts/run/lint.sh
Normal file
64
scripts/run/lint.sh
Normal file
|
|
@ -0,0 +1,64 @@
|
|||
#!/usr/bin/env bash
|
||||
# Lint + typecheck subcommands.
|
||||
#
|
||||
# Split out of dev.sh (previously 546L). Each per-language lint is
|
||||
# independently callable (`./run lint:gd`, `lint:rust`, `lint:ts`) or via
|
||||
# the umbrella `./run lint` which runs all three and aggregates exit codes.
|
||||
|
||||
# Apply project-local gdlintrc — load-bearing before every gdlint call.
|
||||
# lilith-gdtoolkit-sync keeps resetting gdlintrc to defaults; this restores
|
||||
# the project carveouts (GDExtension wrapper method counts, signal handler
|
||||
# signatures, etc.). Also ensures the config-drift check is run first.
|
||||
_gd_prep_lint() {
|
||||
lilith-gdtoolkit-sync --check || {
|
||||
echo -e "${YELLOW}Config drift detected — syncing...${NC}"
|
||||
lilith-gdtoolkit-sync
|
||||
}
|
||||
cp "$REPO_ROOT/.project/gdlintrc.local" "$REPO_ROOT/gdlintrc" 2>/dev/null || true
|
||||
}
|
||||
|
||||
cmd_lint_gd() {
|
||||
local exit_code=0
|
||||
echo -e "${BLUE}GDScript lint (gdlint + gdformat --check)...${NC}"
|
||||
_gd_prep_lint
|
||||
gdlint "$GAME_DIR/engine/src/" || exit_code=$?
|
||||
gdformat --check "$GAME_DIR/engine/src/" || exit_code=$?
|
||||
return $exit_code
|
||||
}
|
||||
|
||||
cmd_lint_rust() {
|
||||
local exit_code=0
|
||||
echo -e "${BLUE}Rust lint (fmt --check + clippy + machete)...${NC}"
|
||||
(cd "$SIMULATOR_DIR" && cargo fmt --check --all) || exit_code=$?
|
||||
(cd "$SIMULATOR_DIR" && cargo clippy --workspace --all-targets -- -D warnings) || exit_code=$?
|
||||
if _have_tool cargo-machete "cargo install cargo-machete"; then
|
||||
(cd "$SIMULATOR_DIR" && cargo machete) || exit_code=$?
|
||||
fi
|
||||
return $exit_code
|
||||
}
|
||||
|
||||
cmd_lint_ts() {
|
||||
local exit_code=0
|
||||
echo -e "${BLUE}TypeScript lint (ESLint + tsc typecheck)...${NC}"
|
||||
pnpm --prefix "$GUIDE_DIR" lint || exit_code=$?
|
||||
pnpm -r typecheck || exit_code=$?
|
||||
return $exit_code
|
||||
}
|
||||
|
||||
cmd_lint() {
|
||||
local exit_code=0
|
||||
echo -e "${BLUE}[1/3] GDScript lint${NC}"
|
||||
cmd_lint_gd || exit_code=$?
|
||||
echo ""
|
||||
echo -e "${BLUE}[2/3] Rust lint${NC}"
|
||||
cmd_lint_rust || exit_code=$?
|
||||
echo ""
|
||||
echo -e "${BLUE}[3/3] TypeScript lint${NC}"
|
||||
cmd_lint_ts || exit_code=$?
|
||||
return $exit_code
|
||||
}
|
||||
|
||||
cmd_typecheck() {
|
||||
echo -e "${BLUE}TypeScript typecheck (pnpm -r typecheck)...${NC}"
|
||||
pnpm -r typecheck
|
||||
}
|
||||
159
scripts/run/test.sh
Normal file
159
scripts/run/test.sh
Normal file
|
|
@ -0,0 +1,159 @@
|
|||
#!/usr/bin/env bash
|
||||
# Test subcommands: data validation, full test pipeline, cross-language
|
||||
# golden-vector parity, coverage reports.
|
||||
#
|
||||
# Split out of dev.sh. `./run test` runs GUT + cargo + vitest + stability;
|
||||
# `./run test:golden` is the 3-consumer parity gate; `./run validate`
|
||||
# runs the JSON schema validator standalone.
|
||||
|
||||
cmd_validate() {
|
||||
echo -e "${BLUE}Validating game data JSON schemas...${NC}"
|
||||
python3 "$REPO_ROOT/tools/validate-game-data.py" "$@"
|
||||
}
|
||||
|
||||
# Run Rust workspace tests, preferring nextest when available.
|
||||
_cargo_test_workspace() {
|
||||
if _have_tool cargo-nextest "cargo install cargo-nextest --locked"; then
|
||||
(cd "$SIMULATOR_DIR" && cargo nextest run --workspace)
|
||||
else
|
||||
(cd "$SIMULATOR_DIR" && cargo test --workspace)
|
||||
fi
|
||||
}
|
||||
|
||||
cmd_test() {
|
||||
local exit_code=0
|
||||
|
||||
echo -e "${BLUE}Running GUT tests (GDScript)...${NC}"
|
||||
WAYLAND_DISPLAY="${WAYLAND_DISPLAY:-wayland-0}" \
|
||||
XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}" \
|
||||
$GODOT_BIN --path "$GAME_DIR" --headless --script res://addons/gut/gut_cmdln.gd \
|
||||
-gexit "$@" || exit_code=$?
|
||||
|
||||
echo ""
|
||||
echo -e "${BLUE}Running Rust tests (simulator)...${NC}"
|
||||
_cargo_test_workspace || exit_code=$?
|
||||
|
||||
echo ""
|
||||
echo -e "${BLUE}Running vitest (guide)...${NC}"
|
||||
pnpm --prefix "$GUIDE_DIR" test || exit_code=$?
|
||||
|
||||
echo ""
|
||||
echo -e "${BLUE}Running stability test (20s game boot)...${NC}"
|
||||
_run_stability_test || exit_code=$?
|
||||
|
||||
return $exit_code
|
||||
}
|
||||
|
||||
_run_stability_test() {
|
||||
# Boots the game → world_map, waits 20s, captures screenshot.
|
||||
# If the game crashes before capture, exit code is non-zero.
|
||||
local LOG="/tmp/stability_test_$$.log"
|
||||
cmd_screenshot "stability_test" "world_map" "20" > "$LOG" 2>&1
|
||||
if [ $? -ne 0 ]; then
|
||||
echo -e "${RED}FAIL: Game crashed during stability test${NC}"
|
||||
grep -E "SCRIPT ERROR|ERROR:" "$LOG" | head -5
|
||||
return 1
|
||||
fi
|
||||
if grep -q "Captured:" "$LOG"; then
|
||||
echo -e "${GREEN}PASS: Game stable for 20s, screenshot captured${NC}"
|
||||
return 0
|
||||
else
|
||||
echo -e "${RED}FAIL: Game ran but no screenshot captured${NC}"
|
||||
cat "$LOG" | tail -5
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
cmd_coverage() {
|
||||
# Generate coverage reports for Rust + TypeScript.
|
||||
# Graceful degradation: each tool warn-skips if not installed.
|
||||
local exit_code=0
|
||||
|
||||
echo -e "${BLUE}[1/2] Rust coverage (cargo llvm-cov)...${NC}"
|
||||
if _have_tool cargo-llvm-cov "cargo install cargo-llvm-cov --locked"; then
|
||||
(cd "$SIMULATOR_DIR" && cargo llvm-cov --workspace --html) || exit_code=$?
|
||||
echo -e "${BLUE}HTML report: $SIMULATOR_DIR/target/llvm-cov/html/index.html${NC}"
|
||||
fi
|
||||
|
||||
echo ""
|
||||
echo -e "${BLUE}[2/2] TypeScript coverage (pnpm -r test:coverage)...${NC}"
|
||||
# --if-present: pnpm exits 0 when no package defines the script, which is
|
||||
# the graceful-degrade behavior we want. Without it, pnpm exits 1 with
|
||||
# ERR_PNPM_RECURSIVE_RUN_NO_SCRIPT, which would falsely fail verify.
|
||||
pnpm -r --if-present run test:coverage || exit_code=$?
|
||||
|
||||
return $exit_code
|
||||
}
|
||||
|
||||
cmd_test_golden() {
|
||||
# Cross-language golden-vector parity gate.
|
||||
#
|
||||
# Each fixture in src/simulator/tests/golden/vectors/*.json is consumed by
|
||||
# three runners that MUST produce bitwise-identical output. Divergence =
|
||||
# release blocker (FFI marshaling / non-determinism / SOT violation).
|
||||
#
|
||||
# See src/simulator/tests/golden/README.md for the fixture shape and
|
||||
# ~/.claude/instructions/rust-code-standards.md §"Testing Strategy" for rationale.
|
||||
|
||||
local vectors_dir="$SIMULATOR_DIR/tests/golden/vectors"
|
||||
local exit_code=0
|
||||
|
||||
if [ ! -d "$vectors_dir" ]; then
|
||||
echo -e "${RED}Golden vectors directory missing: $vectors_dir${NC}"
|
||||
return 1
|
||||
fi
|
||||
|
||||
local vectors
|
||||
vectors=$(find "$vectors_dir" -maxdepth 1 -name '*.json' -type f | sort)
|
||||
|
||||
if [ -z "$vectors" ]; then
|
||||
echo -e "${YELLOW}No golden vectors yet — add JSON fixtures to:${NC}"
|
||||
echo -e " $vectors_dir"
|
||||
echo -e "${YELLOW}See $SIMULATOR_DIR/tests/golden/README.md for the fixture shape.${NC}"
|
||||
return 0
|
||||
fi
|
||||
|
||||
local count
|
||||
count=$(echo "$vectors" | wc -l | tr -d ' ')
|
||||
echo -e "${BLUE}Found $count golden vector(s) — running 3-consumer parity check${NC}"
|
||||
echo ""
|
||||
|
||||
# Consumer 1: Rust native
|
||||
echo -e "${BLUE}[1/3] Rust native consumer (cargo test --test golden)${NC}"
|
||||
if ! (cd "$SIMULATOR_DIR" && cargo test --workspace --test golden 2>&1); then
|
||||
echo -e "${RED}FAIL: Rust golden tests${NC}"
|
||||
exit_code=1
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Consumer 2: WASM via Vitest (guide simulation worker)
|
||||
echo -e "${BLUE}[2/3] WASM consumer (pnpm test — golden suite)${NC}"
|
||||
if ! pnpm --prefix "$GUIDE_DIR" test -- --run golden 2>&1; then
|
||||
echo -e "${RED}FAIL: WASM golden tests${NC}"
|
||||
exit_code=1
|
||||
fi
|
||||
echo ""
|
||||
|
||||
# Consumer 3: GDExtension via headless Godot + GUT
|
||||
echo -e "${BLUE}[3/3] GDExtension consumer (headless Godot + GUT ffi/)${NC}"
|
||||
local ffi_dir="$GAME_DIR/engine/tests/ffi"
|
||||
if [ -d "$ffi_dir" ] && [ -n "$(find "$ffi_dir" -maxdepth 1 -name 'test_golden_*.gd' -print -quit 2>/dev/null)" ]; then
|
||||
WAYLAND_DISPLAY="${WAYLAND_DISPLAY:-wayland-0}" \
|
||||
XDG_RUNTIME_DIR="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}" \
|
||||
$GODOT_BIN --path "$GAME_DIR" --headless \
|
||||
--script res://addons/gut/gut_cmdln.gd \
|
||||
-gdir=res://engine/tests/ffi -gprefix=test_golden_ -gexit 2>&1 \
|
||||
|| exit_code=$?
|
||||
else
|
||||
echo -e "${YELLOW}SKIP: No GDExt golden tests yet at $ffi_dir/test_golden_*.gd${NC}"
|
||||
fi
|
||||
echo ""
|
||||
|
||||
if [ $exit_code -eq 0 ]; then
|
||||
echo -e "${GREEN}All 3 consumers agree on $count vector(s)${NC}"
|
||||
else
|
||||
echo -e "${RED}Divergence detected — release blocker${NC}"
|
||||
echo -e "${RED}See src/simulator/tests/golden/README.md for triage guidance${NC}"
|
||||
fi
|
||||
return $exit_code
|
||||
}
|
||||
242
scripts/run/verify.sh
Normal file
242
scripts/run/verify.sh
Normal file
|
|
@ -0,0 +1,242 @@
|
|||
#!/usr/bin/env bash
|
||||
# `./run verify` — full regression-gate pipeline.
|
||||
#
|
||||
# Split out of dev.sh. The pipeline is 15 steps covering data validation,
|
||||
# i18n, objectives dashboard freshness, Rust build/test/clippy/machete/deny/docs,
|
||||
# file-size cap, TS typecheck, GDScript lint (3 trees), and a headless Godot
|
||||
# boot check. Each step times itself; failures abort and print a summary.
|
||||
|
||||
cmd_verify() {
|
||||
local -a step_names step_times step_results
|
||||
local overall_exit=0
|
||||
|
||||
_verify_step() {
|
||||
local step_num="$1"
|
||||
local total="$2"
|
||||
local label="$3"
|
||||
shift 3
|
||||
|
||||
echo ""
|
||||
echo -e "${BLUE}[${step_num}/${total}] ${label}${NC}"
|
||||
|
||||
local t_start
|
||||
t_start=$(date +%s%N)
|
||||
|
||||
if ! "$@"; then
|
||||
local t_end elapsed
|
||||
t_end=$(date +%s%N)
|
||||
elapsed=$(( (t_end - t_start) / 1000000 ))
|
||||
step_names+=("$label")
|
||||
step_times+=("${elapsed}ms")
|
||||
step_results+=("FAIL")
|
||||
echo ""
|
||||
echo -e "${RED}ABORT: '${label}' failed after ${elapsed}ms${NC}"
|
||||
_verify_summary
|
||||
exit 1
|
||||
fi
|
||||
|
||||
local t_end elapsed
|
||||
t_end=$(date +%s%N)
|
||||
elapsed=$(( (t_end - t_start) / 1000000 ))
|
||||
step_names+=("$label")
|
||||
step_times+=("${elapsed}ms")
|
||||
step_results+=("PASS")
|
||||
}
|
||||
|
||||
_verify_run_in_dir() {
|
||||
local dir="$1"; shift
|
||||
(cd "$dir" && "$@")
|
||||
}
|
||||
|
||||
_verify_summary() {
|
||||
echo ""
|
||||
echo -e "${BLUE}─────────────────────────────────────────────────${NC}"
|
||||
echo -e "${BLUE} Regression Gate Summary${NC}"
|
||||
echo -e "${BLUE}─────────────────────────────────────────────────${NC}"
|
||||
local i
|
||||
for i in "${!step_names[@]}"; do
|
||||
local result="${step_results[$i]}"
|
||||
local color
|
||||
if [ "$result" = "PASS" ]; then
|
||||
color="$GREEN"
|
||||
else
|
||||
color="$RED"
|
||||
fi
|
||||
printf " %-40s %s%-4s%s %s\n" \
|
||||
"${step_names[$i]}" \
|
||||
"$color" "$result" "$NC" \
|
||||
"${step_times[$i]}"
|
||||
done
|
||||
echo -e "${BLUE}─────────────────────────────────────────────────${NC}"
|
||||
# Count pending steps not yet run
|
||||
local n_pass=0 n_fail=0
|
||||
for r in "${step_results[@]}"; do
|
||||
if [ "$r" = "PASS" ]; then
|
||||
n_pass=$(( n_pass + 1 ))
|
||||
else
|
||||
n_fail=$(( n_fail + 1 ))
|
||||
fi
|
||||
done
|
||||
if [ "$n_fail" -eq 0 ]; then
|
||||
echo -e " ${GREEN}All ${n_pass} checks passed${NC}"
|
||||
else
|
||||
echo -e " ${RED}${n_fail} check(s) failed, ${n_pass} passed${NC}"
|
||||
fi
|
||||
echo -e "${BLUE}─────────────────────────────────────────────────${NC}"
|
||||
}
|
||||
|
||||
local TOTAL=15
|
||||
|
||||
# Step 0 — Game data schema validation
|
||||
_verify_step 0 $TOTAL "game data JSON schemas" \
|
||||
python3 "$REPO_ROOT/tools/validate-game-data.py"
|
||||
|
||||
# Step 1 — i18n: no hardcoded user-visible strings outside ThemeVocabulary
|
||||
_verify_step 1 $TOTAL "i18n: no hardcoded UI strings" \
|
||||
python3 "$REPO_ROOT/tools/validate-i18n.py"
|
||||
|
||||
# Step 2 — Objectives dashboard freshness
|
||||
# Fails if .project/objectives/README.md is stale vs the per-objective
|
||||
# frontmatter. Run `python3 tools/objectives-report.py` to regenerate.
|
||||
_verify_step 2 $TOTAL "objectives dashboard up-to-date" \
|
||||
python3 "$REPO_ROOT/tools/objectives-report.py" --check
|
||||
|
||||
# Step 3 — Rust build
|
||||
_verify_step 3 $TOTAL "cargo build --workspace" \
|
||||
_verify_run_in_dir "$SIMULATOR_DIR" cargo build --workspace
|
||||
|
||||
# Step 4 — Rust tests (prefer nextest)
|
||||
_verify_step 4 $TOTAL "cargo test --workspace" \
|
||||
_cargo_test_workspace
|
||||
|
||||
# Step 5 — Rust clippy
|
||||
_verify_step 5 $TOTAL "cargo clippy --workspace -D warnings" \
|
||||
_verify_run_in_dir "$SIMULATOR_DIR" cargo clippy --workspace -- -D warnings
|
||||
|
||||
# Step 6 — Rust dead-deps scan (optional: cargo-machete)
|
||||
_verify_step 6 $TOTAL "cargo machete (dead deps)" \
|
||||
_verify_machete
|
||||
|
||||
# Step 7 — Rust advisories + license check (optional: cargo-deny)
|
||||
_verify_step 7 $TOTAL "cargo deny check" \
|
||||
_verify_deny
|
||||
|
||||
# Step 8 — Rust docs build (warnings are hard errors)
|
||||
_verify_step 8 $TOTAL "cargo doc --no-deps --workspace" \
|
||||
_verify_run_in_dir "$SIMULATOR_DIR" \
|
||||
env RUSTDOCFLAGS="-D warnings" cargo doc --no-deps --workspace
|
||||
|
||||
# Step 9 — 500-LOC hard cap across languages
|
||||
_verify_step 9 $TOTAL "file-size 500-LOC cap (.rs/.gd/.ts)" \
|
||||
_verify_file_size_cap
|
||||
|
||||
# Step 10 — TS workspace typecheck (pnpm -r)
|
||||
_verify_step 10 $TOTAL "pnpm -r typecheck" \
|
||||
pnpm -r typecheck
|
||||
|
||||
# Apply project-local gdlint config before linting.
|
||||
# .project/gdlintrc.local is the source of truth — copy it over before lint.
|
||||
cp "$REPO_ROOT/.project/gdlintrc.local" "$REPO_ROOT/gdlintrc" 2>/dev/null
|
||||
|
||||
# Step 11 — GDScript lint: engine/src/
|
||||
_verify_step 11 $TOTAL "gdlint engine/src/" \
|
||||
gdlint "$GAME_DIR/engine/src/"
|
||||
|
||||
# Step 12 — GDScript lint: scenes/tests/
|
||||
_verify_step 12 $TOTAL "gdlint engine/scenes/tests/" \
|
||||
gdlint "$GAME_DIR/engine/scenes/tests/"
|
||||
|
||||
# Step 13 — GDScript lint: tests/integration/
|
||||
_verify_step 13 $TOTAL "gdlint engine/tests/integration/" \
|
||||
gdlint "$GAME_DIR/engine/tests/integration/"
|
||||
|
||||
# Step 14 — Godot headless boot: GDExtension + script compilation
|
||||
_verify_step 14 $TOTAL "godot headless boot (no script errors)" \
|
||||
_godot_headless_boot
|
||||
|
||||
_verify_summary
|
||||
return $overall_exit
|
||||
}
|
||||
|
||||
# ── Verify step helpers ────────────────────────────────────────────────
|
||||
|
||||
_verify_machete() {
|
||||
# Skip with a warning if cargo-machete is not installed —
|
||||
# graceful degrade on machines missing optional tools.
|
||||
if ! _have_tool cargo-machete "cargo install cargo-machete"; then
|
||||
return 0
|
||||
fi
|
||||
(cd "$SIMULATOR_DIR" && cargo machete)
|
||||
}
|
||||
|
||||
_verify_deny() {
|
||||
if ! _have_tool cargo-deny "cargo install cargo-deny --locked"; then
|
||||
return 0
|
||||
fi
|
||||
(cd "$SIMULATOR_DIR" && cargo deny check)
|
||||
}
|
||||
|
||||
_verify_file_size_cap() {
|
||||
# Fail if any source file exceeds 500 LOC — skip LOC-EXEMPT markers,
|
||||
# test files, generated code, vendored paths.
|
||||
# Scanned roots: src/simulator/**/*.rs, src/game/engine/src/**/*.gd,
|
||||
# src/packages/**/*.ts, public/games/age-of-dwarves/guide/src/**/*.ts.
|
||||
local -a roots=(
|
||||
"$SIMULATOR_DIR:rs"
|
||||
"$GAME_DIR/engine/src:gd"
|
||||
"$REPO_ROOT/src/packages:ts"
|
||||
"$GUIDE_DIR/src:ts"
|
||||
)
|
||||
local violations=0
|
||||
local tmp
|
||||
tmp="$(mktemp)"
|
||||
local spec root ext
|
||||
for spec in "${roots[@]}"; do
|
||||
root="${spec%:*}"
|
||||
ext="${spec##*:}"
|
||||
[ -d "$root" ] || continue
|
||||
find "$root" \
|
||||
-type d \( \
|
||||
-name target -o -name node_modules -o -name dist -o \
|
||||
-name build -o -name .local -o -name pkg -o -name coverage \
|
||||
\) -prune -o \
|
||||
-type f -name "*.${ext}" ! -name "*.test.ts" ! -name "*.spec.ts" \
|
||||
! -name "*.generated.ts" ! -name "*.d.ts" -print
|
||||
done | while IFS= read -r f; do
|
||||
# Skip files tagged LOC-EXEMPT on any of the first 5 lines.
|
||||
if head -n 5 "$f" 2>/dev/null | grep -q "LOC-EXEMPT"; then
|
||||
continue
|
||||
fi
|
||||
local lines
|
||||
lines=$(wc -l < "$f" | tr -d ' ')
|
||||
if [ "$lines" -gt 500 ]; then
|
||||
printf '%6d %s\n' "$lines" "$f" >> "$tmp"
|
||||
fi
|
||||
done
|
||||
if [ -s "$tmp" ]; then
|
||||
violations=$(wc -l < "$tmp" | tr -d ' ')
|
||||
echo -e "${RED}Files exceeding 500-LOC cap (${violations}):${NC}"
|
||||
cat "$tmp"
|
||||
rm -f "$tmp"
|
||||
return 1
|
||||
fi
|
||||
rm -f "$tmp"
|
||||
return 0
|
||||
}
|
||||
|
||||
_godot_headless_boot() {
|
||||
# Boot Godot headless and check for SCRIPT ERRORs.
|
||||
# Catches class_name resolution failures, GDExtension load failures,
|
||||
# and any other compile-time GDScript errors that gdlint cannot detect.
|
||||
local log="/tmp/godot_headless_boot_$$.log"
|
||||
$GODOT_BIN --path "$GAME_DIR" --rendering-method gl_compatibility --headless --quit 2>&1 | tee "$log"
|
||||
local errors
|
||||
errors=$(grep -cE "SCRIPT ERROR|^ERROR:" "$log" 2>/dev/null || true)
|
||||
errors="${errors:-0}"
|
||||
rm -f "$log"
|
||||
if [ "$errors" -gt 0 ]; then
|
||||
echo -e "${RED}Found $errors script/load errors in headless boot${NC}"
|
||||
return 1
|
||||
fi
|
||||
return 0
|
||||
}
|
||||
Loading…
Add table
Reference in a new issue