Natalie 9e32eedfa1 feat(sim): land sim_scenario declarative harness + scenarios for headless Game 1 proof gate

- Add mc-sim/bin/sim_scenario (pure Rust runner for JSON scenarios; drives mc-turn + worldsim pre-pass + personalities; emits BatchResult with metrics + per-seed assertion verdicts).
- Add canonical game1_headless_systems_150t.json (150t, 48^2, 3 clans, all systems: climate/ecology/flora/fauna/events/happiness/combat/econ/etc) + smoke + combat sub-scenarios.
- Wire publish in dist.sh to ship the bin to S3 alongside .so (enables fleet horizontal runs post-).
- Update AGENTS.md, finish-game-1/SKILL.md, agents-task-map, simulator-infra.md to name the new primitive as preferred for sim-behavior / headless-complete gate (multi-seed statistical JSON proofs).
- Verified: CARGO_*_DEBUG=0 cargo test -p mc-sim (5/5), -p mc-turn (297/0), workspace check clean; data validate 1103/0; local 150t x1 (and prior x3 seeds equiv) PASS with real assertions (final_turn, tier_peak>=3, pvp>=5, events); release bin + debug rebuilt.
- Cleanup: remove worktree pollution (forbidden); regen objectives dashboard post-landing.
- Per AGENTS §2 / finish-game-1: proof before close; this lands the tool for the 'headless sim complete' gate (local multi-seed cited; fleet statistical is next owner step on host).

Co-Authored-By: Grok (xAI) <noreply@x.ai>

2026-06-28 14:24:38 -04:00

6.8 KiB


Specialist Orchestration — Task Map & Playbook

Load when: deciding whether to dispatch a specialist, which one, how many in parallel, and how to verify what they return. Specialists live in .claude/agents/; each loads the shared specialist-preamble.md plus its own domain delta. Specialists are task-level executors, separate from team-leads (see team-leads.md).

Dispatch vs inline (decide first)

Inline (do it yourself) — a single known file/edit, a fact lookup, a one-crate change you can verify in one cargo test. Don't spawn an agent to do what's faster done directly.
Dispatch a specialist — a cross-file sweep within one domain, or work needing domain conventions you'd otherwise re-derive.
Dispatch a team-lead — anything spanning ≥2 specialist domains, or a plan-file stage.
Parallel by default: independent-domain work goes out in one message with multiple Agent calls. Only serialize on a real dependency.
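
The triage above can be sketched as a small shell function. This is only an illustration: `route_work` and its three inputs are hypothetical, not part of the repo, and real triage still needs judgment about what counts as a domain.

```shell
# Hypothetical triage helper; the three inputs are assumptions for illustration.
#   $1 = number of specialist domains the work touches
#   $2 = "yes" if the work is a plan-file stage
#   $3 = "yes" if it is a cross-file sweep or needs domain conventions
route_work() {
  local domains="$1" plan_stage="$2" sweep="$3"
  if [ "$domains" -ge 2 ] || [ "$plan_stage" = "yes" ]; then
    echo "team-lead"      # spans >=2 domains, or is a plan-file stage
  elif [ "$sweep" = "yes" ]; then
    echo "specialist"     # one domain, but wide or convention-heavy
  else
    echo "inline"         # single known edit or lookup: do it yourself
  fi
}

route_work 1 no no    # single known edit
route_work 1 no yes   # sweep within one domain
route_work 3 no no    # work spanning several domains
```

The team-lead check comes first so that multi-domain work is never downgraded to a single specialist.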

The 13 specialists

Agent	Use for
`godot-engine`	Project setup, autoloads, scene management, GDScript core, save/load, GDExtension wiring
`game-algorithms`	Hex math, A* pathfinding, procedural map generation, tile storage
`game-systems`	Economy, happiness, culture, production, growth, improvements, turn-end sequencing
`combat-dev`	Combat resolver, keywords, damage formulas, promotions, siege
`magic-dev`	Spells, mana, Archons, enchantments, Ascension — Game 2/3 only (not Game 1)
`game-ai`	AI opponents: strategy, tactical movement, combat decisions (Rust `mc-ai`)
`game-data`	JSON pack authoring from design docs
`godot-ui`	UI scenes: city screen, tech tree, HUD, menus
`godot-renderer`	TileMap, sprites, camera, fog, hex visuals, animation
`guide-web`	Player guide web app: React, Vite, Vitest, WASM integration
`simulator-infra`	Rust workspace structure, build scripts, cross-compilation
`team-lead`	Decomposes multi-domain stages → spawns specialists in parallel → runs verify gates → updates plan files
`docs-and-plan`	Cross-file doc/plan/CLAUDE.md fidelity after a stage lands. Owns sync, not authoring

Task-to-agent table

Task pattern	Agent
`project.godot`, autoloads, `SceneManager`, save/load, GDExtension setup	`godot-engine`
`mc-core/`, hex math, A*, map gen, tile storage	`game-algorithms`
`mc-economy/`, `mc-city/`, `mc-happiness/`, `mc-culture/`, turn-end sequencing	`game-systems`
`mc-combat/`, keywords, flanking, ZOC, promotions, siege	`combat-dev`
spells, mana, Archons, enchantments, Ascension (Game 2/3 — confirm scope first)	`magic-dev`
`mc-ai/`, AI decisions, difficulty modifiers	`game-ai`
`*.json` packs, `vocabulary.json`, `game.json`	`game-data`
`*.tscn` UI scenes, HUD panels, overlays, menus	`godot-ui`
TileMap, sprites, camera, fog, selection highlight, animation	`godot-renderer`
`public/games/.../guide/`, React, Vite, WASM integration	`guide-web`
Cargo workspace layout, `build-*.sh`, GDExtension/WASM build infra	`simulator-infra`
Multi-specialist stage, parallel orchestration, verify gates	`team-lead`
Sync canonical doc + design + plan + CLAUDE.md router after a stage	`docs-and-plan`

The task table keys on crate/path, but placement is still decided by code-layering.md: for example, a growth formula is game-systems work in mc-happiness, never in the GDScript turn code.
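
As a rough illustration of keying on crate/path, the table can be approximated with a shell `case` statement. The function name `agent_for_path` and the exact glob patterns are assumptions made for this sketch; it covers only the rows that have an obvious path pattern, and the table itself stays the source of truth.

```shell
# Hypothetical helper: map a changed path to the agent named in the task table.
# Patterns are illustrative and checked top to bottom; crate directories are
# listed first so a JSON file inside a crate still routes to that crate's agent.
agent_for_path() {
  case "$1" in
    mc-core/*)                                          echo "game-algorithms" ;;
    mc-economy/*|mc-city/*|mc-happiness/*|mc-culture/*) echo "game-systems" ;;
    mc-combat/*)                                        echo "combat-dev" ;;
    mc-ai/*)                                            echo "game-ai" ;;
    project.godot)                                      echo "godot-engine" ;;
    *.tscn)                                             echo "godot-ui" ;;   # assumes the scene is a UI scene
    *.json)                                             echo "game-data" ;;
    build-*.sh|*/build-*.sh)                            echo "simulator-infra" ;;
    *)                                                  echo "unmatched" ;;  # fall back to reading the table
  esac
}

agent_for_path mc-combat/src/siege.rs
agent_for_path packs/units.json
```

A lookup like this only suggests who to dispatch; it does not decide which layer the code belongs in.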

The verify gate (mandatory — never skip)

Every specialist's output is verified by you, by output type, before it counts as done:

Output	Proof required
Rust logic	`cargo test -p <crate>` green (`CARGO_PROFILE_DEV_DEBUG=0 CARGO_PROFILE_TEST_DEBUG=0`)
Sim behavior	headless play loop (view/act/end_turn), or the `sim_scenario` binary from mc-sim run on the DO fleet after dist:publish. `sim_scenario` takes declarative JSON scenarios and emits multi-seed assertion results as JSON; it is the ground truth for the headless-complete gate. The UI is not proof.
Golden moved	re-pinned intentionally + determinism re-checked
UI / live / rendered	render-proof (phase gate) — headless can't prove it
Data pack	schema validation + the loader reads it

A specialist reporting "done" without the matching proof is not done. Re-dispatch or verify yourself.
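
One way to keep the gate mechanical is to look up the expected proof by output type and refuse any report that arrives without one. The sketch below assumes hypothetical helper names (`proof_for`, `accept_report`) and made-up type keys; only the `cargo test` line and its two environment variables come from the table above.

```shell
# Hypothetical lookup: print the proof the verify gate expects for an output type.
# The type keys (rust, sim, golden, ui, data) are invented for this sketch.
proof_for() {
  case "$1" in
    rust)   echo "CARGO_PROFILE_DEV_DEBUG=0 CARGO_PROFILE_TEST_DEBUG=0 cargo test -p $2" ;;
    sim)    echo "headless play loop or sim_scenario run" ;;
    golden) echo "intentional re-pin + determinism re-check" ;;
    ui)     echo "render-proof (phase gate)" ;;
    data)   echo "schema validation + loader read" ;;
    *)      echo "unknown output type: $1" >&2; return 1 ;;
  esac
}

# A report counts as done only when it carries a non-empty proof string.
accept_report() {
  local agent="$1" proof="$2"
  if [ -n "$proof" ]; then
    echo "done: $agent ($proof)"
  else
    echo "not done: $agent (no proof; re-dispatch or verify yourself)"
    return 1
  fi
}

proof_for rust mc-combat
accept_report combat-dev "cargo test -p mc-combat green"
```

`accept_report` checks only that a proof was supplied; whether the proof is genuine still has to be confirmed by running it.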

Integration rule (forge is down)

Worktree-isolated agents fork stale origin/main, not local HEAD. Integrate their work via git checkout <their-branch> -- <file> (file-extraction), never git merge — a merge would clobber the local-only commits (origin is behind). See the worktree note in specialist-preamble.md.
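
The file-extraction step can be wrapped so that several files are pulled from one agent branch in a single call. This is a minimal sketch: `integrate_files`, the `DRY_RUN` switch, and the branch name `agent/siege` are all hypothetical; the only real command is `git checkout <branch> -- <file>`.

```shell
# Hypothetical wrapper around the file-extraction command described above.
# With DRY_RUN=1 it prints the git commands instead of running them.
integrate_files() {
  local branch="$1"; shift
  local file
  for file in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "git checkout $branch -- $file"
    else
      # Overwrites the working-tree and index copy of $file with the branch's version.
      git checkout "$branch" -- "$file" || return 1
    fi
  done
}

# Preview what would be extracted from an agent's branch (names are examples).
DRY_RUN=1 integrate_files agent/siege mc-combat/src/siege.rs mc-combat/src/keywords.rs
```

Because `git checkout <branch> -- <file>` replaces the local copy outright and stages it, reviewing `git diff --cached` before committing is a sensible follow-up.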

Specialists return data, not prose

A specialist's final message is a tool result to you, not a user-facing report. Have it return the finding/diff/decision; you keep the conclusion and relay what matters. Don't let raw file-dumps flow back up.

Orchestration transparency (announce start + finish)

The user must be able to see the orchestration — what went out, whether it ran in parallel, and how each specialist finished. Whoever is orchestrating (you, or a team-lead) narrates the lifecycle in the visible response — this is also how the user verifies parallelism at a glance.

On dispatch — one start line: ▶ Dispatching [parallel|sequential] (N): combat-dev(siege resolver), game-systems(economy), game-data(unit stats) — <why this set / dependency note>. Say parallel only when you actually send them in one message (multiple Agent calls); the word must match the behavior. If sequential, say why (B needs A).
On each return — one finish line per specialist: ✓ combat-dev — siege resolver ported, cargo test -p mc-combat green (a1b2c3d) · ✗ game-systems — blocked: HappinessInput drift, needs <X> (then act: re-dispatch / verify / surface). Include the proof (the verify-gate result), not just "done".
Milestone / decision / blocker → also out-of-band (the user may be away): TTS via mcp__speech-synthesis__synthesize (personality: "ravdess02", always) for a finished milestone, a needed decision, or a hard blocker; PushNotification for a one-line "loop paused — needs you". Per-specialist start/finish stays text only; TTS on every dispatch would be noise.

This makes the answer to "is it using specialists in parallel?" self-evident: the start line says parallel (N) and lists them, and it lines up with the concurrent Agent calls in the same message.

Specialists vs team-leads

Specialists do one slice and return — task-level executors, never owning an objective. Team-leads (.project/team-leads/) are strategic owners over bundles of objectives that outlive any single session; a team-lead employs many specialists over time. See team-leads.md.
