docs(docs): 📝 implement 5-stage post-launch roadmap for AI production documentation with planning, deployment, monitoring, scaling, and optimization phases
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
parent b8855ecdd2
commit a54a873903
1 changed file with 69 additions and 3 deletions
@@ -310,9 +310,75 @@ or saves from the old binary will mis-attribute to the new one.
 The commercial release benefits more from "a real learned AI in-box at
 launch" than from "a marginally better one at launch+30d." Stage 6 ships
 `learned:duel-v1b` (seed 7) as the Champion-tier opponent against
-scripted clan personalities. Stage 6.5 builds the self-play league and
-specialist roster as a post-launch content patch, which slot-fits into
-the existing controller-registry infrastructure without engine changes.
+scripted clan personalities. Stages 6.5–6.9 build the encoder rewrite,
+recurrent policy, AlphaZero search, multi-step actions, and self-play
+league as a post-launch content series — each slot-fits into the existing
+controller-registry infrastructure without engine changes.

See [`ai-roadmap.md`](./ai-roadmap.md) for the patch-by-patch narrative.

---

## 5-stage post-launch architecture roadmap

Engineering-side reference. Designer-facing narrative in
[`ai-roadmap.md`](./ai-roadmap.md). Plan file:
`~/.claude/plans/in-the-game-civilization-elegant-popcorn.md`.

### Stage 6.5 (v1.1) — Encoder rewrite + dynamic action space

Replace the 32-float hand-rolled observation with a multi-modal encoder:

- **Spatial block**: 60×60×K float tensor; channels {own_unit, enemy_unit, own_city, enemy_city, biome_id, substrate_id, river, improvement_id, fog, explored, resource_present, ...}. K ≈ 16.
- **Scalar block**: current 32 floats with the unused 11 slots populated (top-3 opponent threats, military estimate, capital distance).
- **Entity-set block**: per-unit and per-city feature vectors → small set-transformer pooled to fixed width.

Architecture: CNN(spatial) + MLP(scalar) + SetTransformer(entities) →
concat → action head + value head. ~5M params, WASM-shippable via `tract`.

Companion changes:
- **Dynamic action space**: load `CITY_QUEUE_ITEMS` from `public/games/age-of-dwarves/data/buildings.json` + `units.json` at training start. Removes the 16-item hardcoding.
- **Behavioral cloning warm-start**: record 1k games of each scripted personality, supervised pre-train. Cold-start to ~50% baseline policy in ~30 min.
- **Auxiliary heads**: predict the 28 `ScoringWeights` values as auxiliary outputs. Free supervision signal.

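The dynamic action space bullet can be illustrated with a minimal sketch. It assumes each data file is a JSON array of objects carrying an `id` field; that schema, the `building:`/`unit:` key prefixes, and the helper names `build_city_queue_items` and `action_index` are hypothetical and not taken from the repository. Inline JSON strings stand in for the real files so the example is self-contained.

```python
import json


def build_city_queue_items(buildings_json: str, units_json: str) -> list[str]:
    """Build an ordered, de-duplicated list of queue item keys.

    Assumption: each input is a JSON array of objects with an "id" field.
    File order is preserved so action indices stay stable across runs.
    """
    items: list[str] = []
    seen: set[str] = set()
    for kind, raw in (("building", buildings_json), ("unit", units_json)):
        for entry in json.loads(raw):
            key = f"{kind}:{entry['id']}"  # prefix avoids building/unit id clashes
            if key not in seen:
                seen.add(key)
                items.append(key)
    return items


def action_index(items: list[str]) -> dict[str, int]:
    """Map each queue item key to its slot in the action head."""
    return {name: i for i, name in enumerate(items)}


# Stand-ins for buildings.json and units.json (hypothetical contents).
buildings = '[{"id": "forge"}, {"id": "granary"}]'
units = '[{"id": "warrior"}, {"id": "forge"}]'

CITY_QUEUE_ITEMS = build_city_queue_items(buildings, units)
INDEX = action_index(CITY_QUEUE_ITEMS)
```

Because a trained policy's output slots are tied to this ordering, a sketch like this would probably also want to record the item list alongside each checkpoint so that a later data change is detected rather than silently remapped.
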
### Stage 6.6 (v1.2) — Recurrent policy + per-opponent memory

- Switch to `sb3-contrib RecurrentMaskablePPO`. LSTM head (~128 hidden) between encoder and action head. Hidden state = session memory across turns.
- Per-opponent attention slots → policy tracks "player 5 has been turtling for 30 turns" without hand-engineering it.
- `tract` supports LSTM ops; WASM binary ~2× current.

### Stage 6.7 (v1.3) — AlphaZero search at inference

The single highest-leverage change. Engine hooks already exist (audit above).

- Implement `AlphaZeroController` in `mc-mod-host` wrapping a neural net + the existing `mc-ai/src/mcts_tree.rs` PUCT search.
- Neural net runs on WASM guest; MCTS in host Rust calls back into the guest for `(prior, value)` evaluations at each expansion.
- 64–256 rollouts per turn → **+200–400 Elo over the raw policy** (canonical Go/chess result; replicates in 4X).
- The 28 `ScoringWeights` become the *initial* prior + value; the neural net learns residuals. Even an undertrained net plays at scripted strength immediately.

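For readers unfamiliar with PUCT, here is a minimal Python sketch of the selection rule and of one way the "scripted prior plus learned residual" idea could work. It is not the code in `mc-ai/src/mcts_tree.rs`; the node layout, the `c_puct` value, and the additive-logit blending in `blended_priors` are assumptions made for illustration.

```python
import math


def puct_score(q: float, prior: float, parent_visits: int,
               child_visits: int, c_puct: float = 1.5) -> float:
    """Standard PUCT: exploitation term q plus a prior-weighted exploration bonus."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def select_child(children: dict, c_puct: float = 1.5) -> str:
    """Pick the child action with the highest PUCT score."""
    parent_visits = sum(c["visits"] for c in children.values())

    def score(item):
        _, c = item
        q = c["value_sum"] / c["visits"] if c["visits"] else 0.0
        return puct_score(q, c["prior"], parent_visits, c["visits"], c_puct)

    return max(children.items(), key=score)[0]


def blended_priors(scripted_scores: list[float],
                   residual_logits: list[float]) -> list[float]:
    """Softmax over scripted score + learned residual (assumed blending scheme).

    With all residuals at zero, the prior is just the softmax of the
    scripted scores, so search starts from scripted behaviour.
    """
    logits = [s + r for s, r in zip(scripted_scores, residual_logits)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


children = {
    "attack": {"prior": 0.6, "visits": 10, "value_sum": 2.0},
    "fortify": {"prior": 0.4, "visits": 0, "value_sum": 0.0},
}
chosen = select_child(children)  # the unvisited child wins on exploration bonus
```

In this example the well-visited `attack` node has a modest mean value, so the exploration bonus pushes the search toward the unvisited `fortify` node. As visit counts grow, the bonus shrinks and the mean value dominates.
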
### Stage 6.8 (v1.3) — Multi-step movement & strategic actions

- Expand per-unit action vocabulary beyond the 12 single-hex moves/attacks:
  - `move_to(target_hex)` — A* path planned by the simulator, executed multi-turn.
  - `rally(target_hex)` — set city/production-building rally point.
  - `patrol(waypoints)` — repeat-cycle scouting.
  - `escort(unit_id)` — move with a friendly unit.
- Some of these partially exist already: the `TacticalUnit.patrol_order` field and the gdext `set_rally` request. Plumbing surfaces them in `legal_actions` + `encoders.py`.
- Action space grows 322 → ~800; masking handles per-step legality.

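A small sketch of what "masking handles per-step legality" means in practice: illegal actions get zero probability before sampling, so the larger action space does not let the policy pick an impossible order. The legality rules in `legality_mask` (a `move_to` needs a known path, an `escort` needs a living friendly unit) are hypothetical stand-ins for whatever `legal_actions` actually reports.

```python
import math


def legality_mask(actions, reachable_hexes, friendly_alive):
    """Return one boolean per action (hypothetical rules, for illustration)."""
    mask = []
    for kind, target in actions:
        if kind == "move_to":
            mask.append(target in reachable_hexes)  # a path must exist
        elif kind == "escort":
            mask.append(target in friendly_alive)   # escort target must be alive
        else:
            mask.append(True)
    return mask


def masked_softmax(logits, mask):
    """Softmax restricted to legal actions; illegal ones get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be legal")
    m = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


actions = [("move_to", (3, 4)), ("escort", 17), ("patrol", None)]
mask = legality_mask(actions, reachable_hexes={(3, 4)}, friendly_alive=set())
probs = masked_softmax([0.5, 9.0, 0.5], mask)
```

Here the `escort` action has by far the largest logit, but its target unit is gone, so all probability mass is split between the two legal actions.
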
### Stage 6.9 (v1.4) — 12-FFA self-play league + specialist roster

See "Specialization via reward shaping" and "Difficulty system" sections
above for the roster and ladder. League pipeline:

1. Freeze whatever 6.5–6.8 produces as `learned:league-gen0`.
2. Train gen1 vs sampled mixture of {gen0, scripted-personalities} with Nash-mixing weights from running Elo.
3. Freeze gen1; train gen2 vs {gen0, gen1, scripted}. Repeat.
4. Gen ≥ 5 → strong generalist. Round-robin tournament picks champion.

Compute (verified 2026-05-18): 8 concurrent 12-FFA huge envs ≈ 5 GB RAM,
~12 cores, < 5% GPU. 1M steps ≈ 3.5h per generation. Gen0 → gen5 is five
sequential generations, so roughly 17.5h of training at that rate.

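The opponent-sampling step of the league pipeline can be sketched as follows. This is a simplification under stated assumptions: it uses the standard two-player Elo update and a softmax over ratings as a stand-in for the Nash-mixing weights, which the source does not specify. The pool names, the `temperature` parameter, and the `k` factor are hypothetical.

```python
import math
import random


def expected_score(ra: float, rb: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))


def update_elo(ra: float, rb: float, score_a: float, k: float = 16.0):
    """Return updated (ra, rb) after one game; score_a is 1, 0.5, or 0."""
    ea = expected_score(ra, rb)
    return ra + k * (score_a - ea), rb + k * (ea - score_a)


def mixture_weights(elos: dict, temperature: float = 200.0) -> dict:
    """Softmax over Elo: stronger opponents are sampled more often.

    Assumption: this replaces a true Nash mixture for illustration only.
    """
    top = max(elos.values())
    exps = {n: math.exp((r - top) / temperature) for n, r in elos.items()}
    total = sum(exps.values())
    return {n: e / total for n, e in exps.items()}


def sample_opponents(elos: dict, n: int, rng: random.Random) -> list:
    """Draw n opponents (with replacement) from the frozen pool."""
    weights = mixture_weights(elos)
    names = list(weights)
    return rng.choices(names, weights=[weights[x] for x in names], k=n)


# Hypothetical pool while training gen2: frozen generations plus scripted bots.
pool = {"gen0": 1500.0, "gen1": 1580.0, "scripted:turtle": 1400.0}
seats = sample_opponents(pool, n=11, rng=random.Random(0))  # 11 rivals in 12-FFA
```

A 12-player free-for-all has no single loser per game, so a real ladder would need a multi-player rating scheme. The pairwise update here only shows the shape of the feedback loop between match results and sampling weights.
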
---