magicciv/src
Natalie 78945e9df1 feat(sim): make the headless fullgame runner exercise tech/trade/culture for real
The sim_scenario fullgame driver stepped the turn loop but never boot-loaded
the content packs the live harness loads, so process_science ran research-less
(tier-1 fallback) and process_trade_phase saw no resource categories — the
strategic systems were inert. The four strategic assertions (median_tier_peak,
trades_formed, border_growth, clan_winrate) were therefore skipped, leaving
trade_forms / time_to_tier / culture_borders_expand / clan_fairness_band
vacuously green (passing on `terminates` alone).

This wires the systems for real and measures them:

- drive_fullgame boot-loads the tech web (concatenated public/resources/techs/
  *.json) and the resource→category map (public/resources/resources.json), the
  same payloads GdPlayerApi feeds set_tech_web_json / set_resource_categories_json.
  Now: median tier reaches 10, trades form, culture borders expand for real, and
  outcomes vary by seed (previously combat/founding were terrain-blind).
- Extract real metrics: tier_peak_p{i} + median_tier_peak (max tier among a
  player's researched techs), trades_formed (traded luxuries+strategics),
  owned_tiles_p{i} (culture-claimed territory), and the per-seed winner.
- Un-skip MedianTierPeak / TradesFormed / BorderGrowth — they evaluate against
  the run. ClanWinrateMax is wired as a batch-level assertion (win fraction of
  the most-winning clan across the seed set) with the measured value surfaced in
  the JSON output.
- Strengthen the game1_headless_systems_150t umbrella with median_tier_peak>=4
  and trades_formed>=1, and re-calibrate final_turn 120->90: a winner now emerges
  ~98-113t once the systems actually drive the game, instead of running flat to
  the cap (calibration-rule: lock the threshold to the real all-systems run).
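
As a rough illustration of the tier metrics listed above, the following std-only sketch computes a per-player tier peak and the median across players. The function names, the `tier_of` lookup map, the 0 default for a player with no research, and the lower-middle choice for an even player count are all assumptions made for this example; the commit does not show the runner's actual code.

```rust
use std::collections::{HashMap, HashSet};

/// Highest tier among a player's researched techs.
/// Returns 0 when nothing is researched (an assumption of this sketch).
fn tier_peak(researched: &HashSet<String>, tier_of: &HashMap<String, u32>) -> u32 {
    researched
        .iter()
        // Tech ids missing from the lookup are ignored rather than treated as errors.
        .filter_map(|id| tier_of.get(id).copied())
        .max()
        .unwrap_or(0)
}

/// Median of the per-player peaks.
/// For an even count this picks the lower middle value (an assumption;
/// the real metric may average or round differently).
fn median_tier_peak(mut peaks: Vec<u32>) -> Option<u32> {
    if peaks.is_empty() {
        return None;
    }
    peaks.sort_unstable();
    Some(peaks[(peaks.len() - 1) / 2])
}
```

Under these assumptions, an assertion such as median_tier_peak>=4 would compare the returned value against the threshold once per run.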

Determinism fix: PlayerTechState.researched (HashSet) now serializes sorted, so
GameState serialization — and the determinism_same_seed end_state_hash check —
is stable run-to-run regardless of hash iteration order. The set has no
meaningful order; the in-memory type and researched_techs() accessor are
unchanged.
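
A minimal sketch of the idea behind the determinism fix, using only the standard library. The struct here is a stand-in with a single field, and `canonical_researched` and `end_state_hash` are hypothetical helpers; the commit does not show the real serializer or hash, so treat this as an illustration of "sort before you serialize", not the project's implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in for the real type: the in-memory set stays a HashSet.
struct PlayerTechState {
    researched: HashSet<String>,
}

impl PlayerTechState {
    /// Canonical view used only for serialization: the same ids, sorted,
    /// so the output no longer depends on hash iteration order.
    fn canonical_researched(&self) -> Vec<&str> {
        let mut ids: Vec<&str> = self.researched.iter().map(String::as_str).collect();
        ids.sort_unstable();
        ids
    }
}

/// Hypothetical end-state hash computed over the canonical (sorted) form.
fn end_state_hash(state: &PlayerTechState) -> u64 {
    let mut hasher = DefaultHasher::new();
    state.canonical_researched().hash(&mut hasher);
    hasher.finish()
}
```

Two states holding the same techs inserted in different orders produce the same sorted list and therefore the same hash, which is the property a same-seed determinism check relies on.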

Full suite: 19/20 green. clan_fairness_band is the single honest FAIL — over 50
seeds / 6 clans only 3 ever win (winrates 0.14 / 0.46 / 0.40; clans 1,2,3 never
win), max 0.46 > the 0.4 band. That is a real fairness gap from the bench's fixed
asymmetric start positions + personality balance — surfaced, not tuned away
(owner decision).
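
The batch-level check described above can be sketched as follows. `clan_winrate_max` and `within_fairness_band` are hypothetical names, `None` models a seed that ends without a winner (an assumption; such seeds still count toward the denominator here), and the inclusive `<=` comparison against the band is also an assumption.

```rust
use std::collections::HashMap;

/// Win fraction of the most-winning clan across a batch of per-seed winners.
/// Returns 0.0 for an empty batch.
fn clan_winrate_max(winners: &[Option<u32>]) -> f64 {
    if winners.is_empty() {
        return 0.0;
    }
    // Count wins per clan id, skipping seeds without a winner.
    let mut wins: HashMap<u32, usize> = HashMap::new();
    for clan in winners.iter().flatten() {
        *wins.entry(*clan).or_insert(0) += 1;
    }
    let best = wins.values().copied().max().unwrap_or(0);
    best as f64 / winners.len() as f64
}

/// True when no clan wins more often than the allowed band (e.g. 0.4).
fn within_fairness_band(winners: &[Option<u32>], band: f64) -> bool {
    clan_winrate_max(winners) <= band
}
```

For instance, a 50-seed batch with win counts of 7, 23, and 20 (the counts implied by the reported rates 0.14 / 0.46 / 0.40) gives a maximum of 23/50 = 0.46, which fails a 0.4 band.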

Verified: cargo test -p mc-tech (28 passed); full sim_scenario suite run locally
on plum (release), determinism + canonical + the three strategic scenarios green
on real metrics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-28 23:20:13 -04:00
game fix(tests): mark wild-creature-ai private _method tests pending after Rail-1 Rust port 2026-06-28 12:29:02 -04:00
packages
simulator feat(sim): make the headless fullgame runner exercise tech/trade/culture for real 2026-06-28 23:20:13 -04:00