fix(@projects/@magic-civilization): 🐛 resolve autoplay architecture mismatch for learned controller integration

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
Natalie 2026-06-08 12:08:45 -07:00
parent 7734b7532a
commit aec8f348e9

@@ -10,6 +10,24 @@ updated_at: 2026-06-08
---
## Summary
Make the autoplay gate surface able to run a `learned:*` controller in a slot and emit canonical autoplay-schema `turn_stats.jsonl`, so the AI-quality gates (p1-29g) can score trained-vs-scripted.
## STATUS — 2026-06-08 — BLOCKED (architecture mismatch; STOP-WAIT, owner decision required)
Investigated end-to-end by simulator-infra (no code landed — Blocker Protocol). **The objective as written cannot be honestly implemented**: the autoplay world and the player-API/bench world are **disjoint state representations**, and the learned policy requires the player-API world's representation.
- **Autoplay world** (`scenes/tests/auto_play.gd` → `turn_manager.gd:191` → `ai_turn_bridge.gd:213 _apply_tactical_actions`): the authoritative full game state lives in **GDScript entities** (`GameState.players[].units/.cities`, the `WorldMap` grid). Every AI turn is unconditionally heuristic (`GdAiController.decide_actions` → `mc_ai::tactical::run_tactical`). **There is no controller-registry dispatch here at all.**
- **Player-API world** (`scenes/headless/player_api_main.gd` + `mc_player_api::dispatch`): holds ONE authoritative Rust `mc_state::game_state::GameState`; `drive_ai_slot` (`dispatch.rs:979`) routes `learned:*` → `drive_learned_slot` (`dispatch.rs:1056`) → the faithful `compute_vision → project_view → encode → ONNX → decode → apply_action → re-project` loop. This is where p1-29f's smoke ran.
- `drive_learned_slot` needs a faithful `&mut GameState` to re-observe after each action. The autoplay world has none. The only Rust state it can mint is `ai_turn_bridge_state.gd::build_mc_tree_state` — a **lossy strategic-directive stub** (`"grid": null`, hardcoded `food_yield/prod_yield`, no `units_catalog`/`ai_unit_catalog`/`ai_building_catalog` — those are `#[serde(skip)]` on `GameState`). Feeding it to the policy ⇒ empty `PlayerView` + empty legal-action mask ⇒ silent `EndTurn` every turn. That is the forbidden lossy approximation (Commandment 5).
- Neither the long-lived `GdGameState` mirror (only mirrors npc_buildings/combat_balance/city_items in the live world) nor the per-turn `.save` (GDScript-native envelope, no grid, not a Rust-`GameState` JSON) rescues this.
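
A toy sketch of why a faithful state matters for the loop described above. Every type and function here (`GameState`, `PlayerView`, `project_view`, `legal_actions`, `policy`, `drive_slot`) is a hypothetical stand-in, not the real `mc_state` or `mc_player_api` code, and `policy` simply picks the first legal action in place of encode → ONNX → decode. It only illustrates the shape of the observe/act/re-observe loop and how a state with no units collapses to an immediate `EndTurn`.

```rust
// Hypothetical stand-ins; none of these are the real mc_state / mc_player_api types.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    MoveUnit { unit: u32 },
    EndTurn,
}

/// What the policy is allowed to see for one player.
struct PlayerView {
    visible_units: Vec<u32>,
}

/// Toy game state: `units` is empty when minted from a lossy stub.
struct GameState {
    units: Vec<u32>,
    moved: Vec<u32>,
}

/// Project the state into a view: only units that have not yet moved.
fn project_view(state: &GameState) -> PlayerView {
    PlayerView {
        visible_units: state
            .units
            .iter()
            .copied()
            .filter(|u| !state.moved.contains(u))
            .collect(),
    }
}

/// Legal-action mask derived from the view; empty view means empty mask.
fn legal_actions(view: &PlayerView) -> Vec<Action> {
    view.visible_units
        .iter()
        .map(|&unit| Action::MoveUnit { unit })
        .collect()
}

/// Placeholder for encode -> ONNX -> decode: first legal action, else EndTurn.
fn policy(legal: &[Action]) -> Action {
    legal.first().cloned().unwrap_or(Action::EndTurn)
}

fn apply_action(state: &mut GameState, action: &Action) {
    if let Action::MoveUnit { unit } = action {
        state.moved.push(*unit);
    }
}

/// Observe, decide, apply, then re-observe until the policy ends the turn.
fn drive_slot(state: &mut GameState) -> Vec<Action> {
    let mut taken = Vec::new();
    loop {
        let view = project_view(state);
        let legal = legal_actions(&view);
        let action = policy(&legal);
        apply_action(state, &action);
        let done = action == Action::EndTurn;
        taken.push(action);
        if done {
            return taken;
        }
    }
}

fn main() {
    // Faithful state: the loop acts on each unit before ending the turn.
    let mut faithful = GameState { units: vec![1, 2], moved: vec![] };
    println!("{:?}", drive_slot(&mut faithful));

    // Lossy stub: nothing to observe, so the turn ends immediately.
    let mut stub = GameState { units: vec![], moved: vec![] };
    println!("{:?}", drive_slot(&mut stub));
}
```

The stub case does not fail loudly; it just produces a one-action turn, which is why the degradation would be silent in gate output.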
**Compounding finding (load-bearing for p1-29g):** p1-29g's scripted baseline — and every prior Game-1 AI gate (p1-29c/d/a/b, p0-24) — was measured on the **full-engine autoplay surface, scripted-vs-scripted**. The learned policy is a **bench-world artifact** (trained against the player-API env, encoder's fixed 16-item roster + bench legal-action projection). Bench-world trained numbers are **not apples-to-apples** with full-engine autoplay scripted baselines — so even the only viable engineering path (Design B below) may not actually unblock p1-29g.
**Resolution paths (owner / warcouncil decision):**
1. **Design B — bench-side emitter:** keep faithful learned play in the player-API/bench world (where p1-29f already runs apples-to-apples vs scripted on the same bench), and add an autoplay-schema `turn_stats.jsonl` emitter reading the Rust `GameState` each turn so `sole-city-gate.py`/`p1-survival-score.py` score it. Reuses `drive_learned_slot`, Rail-1-clean, lives under the player-API harness driven from `autoplay-batch.sh` (bullet-1's "auto_play.gd **or its batch harness**" permits this). **Caveat: may not match the existing autoplay baseline (see above).**
2. **Full-engine Rust `GameState` serializer:** build a faithful GDScript→Rust per-turn serializer (grid + real yields + catalogs + vision) so the policy runs on the autoplay surface. This is a large effort, and the artifact would need to be retrained and validated against full-engine observations.
3. **Re-route to the RL/architecture track** + first answer the p1-29g baseline-provenance question (rebuild the scripted baseline in the bench world so trained and scripted are both bench-measured).
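
For a sense of scale, a minimal sketch of what a Design B per-turn emitter could look like, using only the Rust standard library. The `TurnStats` struct and its field names (`turn`, `player`, `controller`, `cities`, `units`) are assumptions for illustration; the real `autoplay-result-schema.json` fields are not reproduced here and would have to replace them. The helper names `to_jsonl` and `emit` are likewise hypothetical.

```rust
use std::io::{self, Write};

/// Hypothetical per-turn summary; field names are illustrative,
/// not taken from the real autoplay schema.
struct TurnStats {
    turn: u32,
    player: u8,
    controller: String,
    cities: u32,
    units: u32,
}

/// Minimal JSON string escaping (backslash and double quote only).
/// Assumes controller ids contain no control characters.
fn escape_json(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Render one record as a single JSON object with no trailing newline.
fn to_jsonl(stats: &TurnStats) -> String {
    format!(
        "{{\"turn\":{},\"player\":{},\"controller\":\"{}\",\"cities\":{},\"units\":{}}}",
        stats.turn,
        stats.player,
        escape_json(&stats.controller),
        stats.cities,
        stats.units
    )
}

/// Append one record plus newline to any writer (file, buffer, stdout).
fn emit<W: Write>(out: &mut W, stats: &TurnStats) -> io::Result<()> {
    writeln!(out, "{}", to_jsonl(stats))
}

fn main() -> io::Result<()> {
    let stats = TurnStats {
        turn: 3,
        player: 1,
        controller: "learned:duel-v4-encfix-s7".to_string(),
        cities: 2,
        units: 5,
    };
    // In a harness this would be a file handle for turn_stats.jsonl.
    let mut buf = Vec::new();
    emit(&mut buf, &stats)?;
    print!("{}", String::from_utf8_lossy(&buf));
    Ok(())
}
```

Writing through a generic `Write` keeps the emitter testable against an in-memory buffer. It does nothing to address the baseline-provenance caveat: matching the schema shape does not make bench-world numbers comparable to full-engine ones.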
## Why this exists
p1-29f shipped the `learned:duel-v4-encfix-s7` controller bridge, but scoped it to the **player-API dispatch world only** (`mc_player_api::apply_action`). Per p1-29f's own bullet-3/5 verification caveats: the learned controller "runs only in the `mc_player_api` dispatch world… `auto_play.gd`'s `GdAiController` path can't [host one], since the learned controller's one-shot `decide_turn` is identity-only by design," and its output is "player-API-native per-turn JSONL, not `auto_play.gd`'s `autoplay-result-schema.json` shape."
@@ -50,7 +68,5 @@ Filed by warcouncil (owns p1-29g and the AI-quality cluster) and assigned to **s
- `src/game/engine/scenes/tests/auto_play.gd` — the autoplay driver lacking a controller hook.
- `tools/autoplay-batch.sh` — exposes only seed/personality envs, no controller selection.
- `src/simulator/api-gdext/src/ai.rs:40-52`: `run_ai_turn` heuristic driver (the GdAiController path).
## Acceptance
- ❌ Define acceptance criteria.
- `src/game/engine/src/modules/ai/ai_turn_bridge_state.gd`: `build_mc_tree_state`, the lossy stub that cannot host the learned loop.
- `src/simulator/crates/mc-player-api/src/dispatch.rs`: `drive_learned_slot` (the faithful loop) + `projection.rs` catalog/grid deps.