magicciv/.project/history/20260416_gpu_recon.md
Natalie aaa359e2c5 feat(@projects): add project objectives roadmap
Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-04-17 00:14:17 -07:00


GPU RECON Phase B — WGPU Compute Portability for mc-turn

1. Data Read/Written During TurnProcessor::step

POD (shader-friendly)

| Type | Location | Fields |
|---|---|---|
| MapUnit | game_state.rs:73 | col, row, hp, max_hp, attack, defense, is_fortified; 7× i32/bool, ~28 B |
| CityState (bench subset) | mc-city::CityState | population, food_stored, food_yield, prod_yield, production_stored; all i32 |
| CityEcology | game_state.rs:63 | adjacent_lair_pressure: f32, last_harassment_turn: u32; 8 B |
| UnitStats | resolver.rs:11 | 7× i32; 28 B |
| CombatBonuses | bonuses.rs | ~6× f32, 1× i32; 28 B |
| Lair snapshot | processor.rs:457 | Vec<(i32, i32, i32)>; pure POD |
| LairIndex.buckets | spatial_index.rs:57 | Vec<Vec<u32>>; nested allocation, flattens to a flat u32[] |
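As a rough illustration of what "shader-friendly" means for the rows above, the sketch below mirrors MapUnit and CityEcology as fixed-layout structs. The names GpuMapUnit, GpuCityEcology, and unit_to_bytes are hypothetical and do not come from the codebase. One assumption worth flagging: WGSL `bool` is not host-shareable, so is_fortified is widened to a u32 flag here.

```rust
use std::mem::size_of;

/// Hypothetical GPU mirror of `MapUnit`; field names follow the table above.
/// `is_fortified` is stored as u32 (0 or 1) instead of bool.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct GpuMapUnit {
    pub col: i32,
    pub row: i32,
    pub hp: i32,
    pub max_hp: i32,
    pub attack: i32,
    pub defense: i32,
    pub is_fortified: u32,
}

/// Hypothetical GPU mirror of `CityEcology`.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct GpuCityEcology {
    pub adjacent_lair_pressure: f32,
    pub last_harassment_turn: u32,
}

// Compile-time checks that the layouts match the sizes quoted in the table.
const _: () = assert!(size_of::<GpuMapUnit>() == 28);
const _: () = assert!(size_of::<GpuCityEcology>() == 8);

/// Serialize one unit as seven little-endian 32-bit words, without `unsafe`.
pub fn unit_to_bytes(u: &GpuMapUnit) -> [u8; 28] {
    let words = [
        u.col,
        u.row,
        u.hp,
        u.max_hp,
        u.attack,
        u.defense,
        u.is_fortified as i32,
    ];
    let mut out = [0u8; 28];
    for (i, w) in words.iter().enumerate() {
        out[i * 4..i * 4 + 4].copy_from_slice(&w.to_le_bytes());
    }
    out
}
```

The compile-time size assertions are one way to keep the Rust side and a WGSL struct declaration from drifting apart silently.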

Shader-hostile (graph/pointer/String)

| Type | Location | Problem |
|---|---|---|
| TechState.progress | game_state.rs:89 | HashMap<String, u32>; heap, non-deterministic layout |
| PlayerState.strategic_axes | game_state.rs:36 | HashMap<String, u8> |
| PlayerState.city_buildings | game_state.rs:43 | Vec<Vec<String>> |
| MapUnit.unit_id | game_state.rs:82 | String |
| TileState.biome_id et al. | grid/mod.rs:86 | ~8 String fields per tile |
| TileState.river_flow | grid/mod.rs:98 | HashMap<String, f32> |
| CombatParams.attacker_keywords | resolver.rs:185 | Vec<Keyword> (enum vec) |
| TurnProcessor.building_upkeep_table | processor.rs:149 | HashMap<String, i32> |

POD fraction: The core bench loop (economy, city production, unit movement, fauna encounter) touches ~95% POD data. The HashMap<String, *> fields are queried at most once per turn per player for axis lookups, so they can be pre-flattened to arrays before GPU dispatch.
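The pre-flattening step could look roughly like the sketch below. The AXIS_NAMES table and the flatten_axes function are hypothetical: only "wealth" and "culture" are suggested by the kernel inputs in this document, and the remaining slot names are placeholders for whatever the game data actually defines.

```rust
use std::collections::HashMap;

/// Hypothetical fixed axis order. Slot i of the GPU array holds the value
/// of AXIS_NAMES[i]; names other than "wealth"/"culture" are placeholders.
pub const AXIS_NAMES: [&str; 8] = [
    "wealth", "culture", "axis_2", "axis_3", "axis_4", "axis_5", "axis_6", "axis_7",
];

/// Flatten a string-keyed axis map into a fixed array.
/// Iterating over AXIS_NAMES (not over the map) keeps the output independent
/// of HashMap iteration order. Missing axes default to 0; unknown keys are
/// ignored.
pub fn flatten_axes(axes: &HashMap<String, u8>) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (slot, name) in AXIS_NAMES.iter().enumerate() {
        if let Some(value) = axes.get(*name) {
            out[slot] = *value;
        }
    }
    out
}
```

Because the slot order is fixed on the host, a kernel can index the array with a constant instead of hashing a string.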

2. WGSL Kernel Candidates

| Kernel | Input | Output | LOC est. | Notes |
|---|---|---|---|---|
| economy_tick | player_gold[], wealth_axis[], city_count[], upkeep[] | new_gold[] | ~60 | Trivial scalar arithmetic per player. Parallelism: N_players (tiny). Worthwhile only as a warm-up. |
| city_production | city_food[], city_pop[], city_prod[] | updated arrays | ~100 | One city per invocation. Threshold formula is deterministic. |
| culture_science | culture_axis, city_count | culture_total, science_yield | ~40 | Scalar per player; near-zero GPU benefit, included for completeness. |
| fauna_encounter | unit_pos[], lair_buckets[], lair_tiers[], rng_state[] | kill_flags[] | ~200 | Best candidate: O(units × avg_lairs_per_tile), embarrassingly parallel per unit. RNG must be SplitMix64 (already used in Rust). |
| combat_resolve | attacker_stats[], defender_stats[], bonuses[] | dmg_to_def, dmg_to_atk | ~150 | Civ5 exponential formula (e^(diff/25)). No branches except keyword flags. Parallelism: N_combats per turn. |
| unit_movement | unit_pos[], enemy_unit_pos[], enemy_city_pos[], lair_pos[] | new_pos[] | ~180 | step_toward is a Manhattan step (trivial). The bottleneck is nearest-neighbor search; a flat sorted array with binary search works on GPU. |

Total WGSL LOC estimate: ~730

3. Structural Blockers

  1. String keys in hot path: the strategic_axes map (HashMap<String, u8>) is looked up every phase (processor.rs:262). Pre-encode it to [u8; 8], indexed by axis enum, before upload.
  2. Nested Vec allocations: LairIndex.buckets, a Vec<Vec<u32>> (spatial_index.rs:57), must be flattened to a (flat_u32_buf, offset_u32_buf) CSR layout for WGPU buffer upload.
  3. RNG stream order: encounter resolution is byte-identical only when lairs are visited in ascending snapshot order (spatial_index.rs:22-28). WGPU workgroups must preserve per-player RNG lanes (one SplitMix64 state per player-lane, not per-unit).
  4. Keyword Vec: CombatParams.attacker_keywords (Vec<Keyword>) must become a u32 bitmask. With ~15 enum variants, it fits in one u32.
  5. GridState.TileState has 50+ fields (grid/mod.rs:79-188); uploading the full grid per MCTS rollout is ~20 MB for a 96×96 map. Only the lair sub-fields (lair_tier, lair_population, col, row) are needed, so project to a slim GpuLair struct before upload.
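The CSR flattening named in blocker 2 can be sketched as follows. CsrBuckets and flatten_buckets are hypothetical names; the two vectors correspond to the (flat_u32_buf, offset_u32_buf) pair mentioned above, with the offsets buffer carrying one extra trailing entry so that every bucket has an explicit end.

```rust
/// Hypothetical CSR form of a nested bucket list.
/// Bucket i occupies values[offsets[i] .. offsets[i + 1]].
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CsrBuckets {
    pub offsets: Vec<u32>,
    pub values: Vec<u32>,
}

/// Flatten nested buckets into two contiguous buffers suitable for upload.
pub fn flatten_buckets(buckets: &[Vec<u32>]) -> CsrBuckets {
    let total = buckets.iter().map(|b| b.len()).sum::<usize>();
    let mut offsets = Vec::with_capacity(buckets.len() + 1);
    let mut values = Vec::with_capacity(total);
    offsets.push(0);
    for bucket in buckets {
        // Element order inside each bucket is preserved, which matters
        // if the visit order must stay identical to the CPU path.
        values.extend_from_slice(bucket);
        offsets.push(values.len() as u32);
    }
    CsrBuckets { offsets, values }
}

impl CsrBuckets {
    /// Slice view of bucket `i`, equivalent to the original inner Vec.
    pub fn bucket(&self, i: usize) -> &[u32] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}
```

An empty bucket simply repeats the previous offset, so no sentinel values are needed on the shader side.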

4. Phased Implementation Plan

Phase B1 — Flat-data layer (prereq, ~1 week)

  • Add GpuPlayerState { gold: i32, axes: [u8;8], city_count: u32, ... } alongside existing structs
  • Flatten LairIndex to CSR buffers
  • Encode keywords as bitmask; encode strategic_axes as fixed enum array
  • No WGSL yet; just establish the serialization contract
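A minimal sketch of the keyword bitmask encoding from this phase is shown below. The Keyword variants listed are invented for illustration; the real enum lives in the resolver and has roughly 15 variants according to this document. The functions keywords_to_mask and has_keyword are likewise hypothetical.

```rust
/// Hypothetical stand-in for the real `Keyword` enum. Each variant's
/// discriminant is its bit position in the mask.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
pub enum Keyword {
    FirstStrike = 0,
    Ranged = 1,
    Flying = 2,
    Amphibious = 3,
}

/// Fold a keyword list into a single u32, one bit per variant.
/// Duplicates and ordering in the input do not affect the result.
pub fn keywords_to_mask(keywords: &[Keyword]) -> u32 {
    keywords
        .iter()
        .fold(0u32, |mask, k| mask | (1u32 << (*k as u32)))
}

/// Test a single keyword; the same expression ports directly to WGSL.
pub fn has_keyword(mask: u32, keyword: Keyword) -> bool {
    (mask & (1u32 << (keyword as u32))) != 0
}
```

Pinning the discriminants explicitly is one way to make the bit positions part of the serialization contract rather than an accident of declaration order.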

Phase B2 — fauna_encounter kernel (~1 week)

  • Port process_fauna_encounters_inner inner loop to WGSL
  • One workgroup invocation per (player, unit); reads from CSR lair index
  • Validate byte-identical kill flags vs Rust reference on known seeds
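One detail likely to matter for the byte-identical check: core WGSL does not define a 64-bit integer type, so a SplitMix64 port would presumably carry each state as two u32 words unless a 64-bit extension is available on the target. The sketch below shows the standard SplitMix64 step on native u64 next to an emulation that uses only 32-bit arithmetic. U64Pair and the helper functions are hypothetical names, and the existing Rust RNG in the codebase may be structured differently.

```rust
/// A 64-bit value held as two 32-bit words (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U64Pair {
    pub lo: u32,
    pub hi: u32,
}

impl U64Pair {
    pub const fn from_u64(v: u64) -> Self {
        U64Pair { lo: v as u32, hi: (v >> 32) as u32 }
    }
    pub const fn to_u64(self) -> u64 {
        ((self.hi as u64) << 32) | (self.lo as u64)
    }
}

const GAMMA: U64Pair = U64Pair::from_u64(0x9E37_79B9_7F4A_7C15);
const MIX1: U64Pair = U64Pair::from_u64(0xBF58_476D_1CE4_E5B9);
const MIX2: U64Pair = U64Pair::from_u64(0x94D0_49BB_1331_11EB);

/// Reference SplitMix64 step on native u64.
pub fn splitmix64_next(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Full 32x32 -> 64 multiply using only u32 arithmetic (16-bit limbs).
fn mul32_wide(a: u32, b: u32) -> (u32, u32) {
    let (a0, a1) = (a & 0xFFFF, a >> 16);
    let (b0, b1) = (b & 0xFFFF, b >> 16);
    let (p00, p01, p10, p11) = (a0 * b0, a0 * b1, a1 * b0, a1 * b1);
    let mid = (p00 >> 16) + (p01 & 0xFFFF) + (p10 & 0xFFFF);
    let lo = (p00 & 0xFFFF) | ((mid & 0xFFFF) << 16);
    let hi = p11 + (p01 >> 16) + (p10 >> 16) + (mid >> 16);
    (lo, hi)
}

/// Wrapping 64-bit add with manual carry.
fn add(a: U64Pair, b: U64Pair) -> U64Pair {
    let lo = a.lo.wrapping_add(b.lo);
    let carry = (lo < a.lo) as u32;
    U64Pair { lo, hi: a.hi.wrapping_add(b.hi).wrapping_add(carry) }
}

/// Low 64 bits of a 64x64 multiply.
fn mul(a: U64Pair, b: U64Pair) -> U64Pair {
    let (lo, carry) = mul32_wide(a.lo, b.lo);
    let hi = carry
        .wrapping_add(a.lo.wrapping_mul(b.hi))
        .wrapping_add(a.hi.wrapping_mul(b.lo));
    U64Pair { lo, hi }
}

/// Logical right shift by n, valid for 0 < n < 32.
fn shr(a: U64Pair, n: u32) -> U64Pair {
    U64Pair { lo: (a.lo >> n) | (a.hi << (32 - n)), hi: a.hi >> n }
}

fn xor(a: U64Pair, b: U64Pair) -> U64Pair {
    U64Pair { lo: a.lo ^ b.lo, hi: a.hi ^ b.hi }
}

/// SplitMix64 step using only 32-bit operations.
pub fn splitmix64_next_pair(state: &mut U64Pair) -> U64Pair {
    *state = add(*state, GAMMA);
    let mut z = *state;
    z = mul(xor(z, shr(z, 30)), MIX1);
    z = mul(xor(z, shr(z, 27)), MIX2);
    xor(z, shr(z, 31))
}
```

Comparing the two implementations over many steps on the CPU first would separate arithmetic bugs in the emulation from ordering bugs in the kernel dispatch.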

Phase B3 — combat_resolve kernel (~3 days)

  • Port CombatResolver::resolve Civ5 formula to WGSL
  • Single dispatch over combats array; no branching beyond keyword bitmask checks
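The shape of the exponential formula can be illustrated with the sketch below. Only the e^(diff/25) term comes from this document; BASE_DAMAGE, exp_damage, and resolve are hypothetical, and the real CombatResolver::resolve also applies bonuses and keyword effects that are omitted here.

```rust
/// Hypothetical damage dealt at equal strength; the real constant is
/// whatever the resolver defines.
pub const BASE_DAMAGE: f32 = 30.0;

/// Damage dealt by a side with `own` strength against `other` strength,
/// scaled by e^(diff / 25).
pub fn exp_damage(own: f32, other: f32) -> f32 {
    BASE_DAMAGE * ((own - other) / 25.0).exp()
}

/// Returns (dmg_to_def, dmg_to_atk) for one combat, branch-free.
pub fn resolve(attacker_strength: f32, defender_strength: f32) -> (f32, f32) {
    let dmg_to_def = exp_damage(attacker_strength, defender_strength);
    let dmg_to_atk = exp_damage(defender_strength, attacker_strength);
    (dmg_to_def, dmg_to_atk)
}
```

Floating-point exp on a GPU is not guaranteed to match the CPU result bit for bit, so validation of this kernel may need a tolerance or an integer rounding step applied on both sides before comparison.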

Phase B4 — unit_movement + economy (~1 week)

  • step_toward nearest-enemy search → GPU nearest-neighbor over flat arrays
  • Economy and city-production trivially vectorize but have low parallelism gain
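For validation purposes, a CPU reference for the movement step might look like the sketch below. It is an assumption-laden stand-in: the real step_toward may choose its axis differently, and the nearest-target search here is a brute-force linear scan rather than the sorted-array search proposed earlier. Tie-breaking rules (column first on equal deltas, first candidate on equal distance) are chosen arbitrarily and labeled in the comments.

```rust
/// Manhattan distance between two (col, row) positions.
pub fn manhattan(a: (i32, i32), b: (i32, i32)) -> i32 {
    (a.0 - b.0).abs() + (a.1 - b.1).abs()
}

/// Move one tile toward `target` along a single axis.
/// Assumption: the axis with the larger absolute delta is used, and the
/// column axis wins ties. Returns `pos` unchanged when already at target.
pub fn step_toward(pos: (i32, i32), target: (i32, i32)) -> (i32, i32) {
    let (dc, dr) = (target.0 - pos.0, target.1 - pos.1);
    if dc == 0 && dr == 0 {
        return pos;
    }
    if dc.abs() >= dr.abs() {
        (pos.0 + dc.signum(), pos.1)
    } else {
        (pos.0, pos.1 + dr.signum())
    }
}

/// Brute-force nearest candidate by Manhattan distance.
/// On equal distance the earliest candidate in the slice is returned,
/// which keeps the result deterministic for a fixed input order.
pub fn nearest(pos: (i32, i32), candidates: &[(i32, i32)]) -> Option<(i32, i32)> {
    candidates
        .iter()
        .copied()
        .min_by_key(|c| manhattan(pos, *c))
}
```

A linear scan like this is O(units × targets), so it is mainly useful as an oracle against which a faster GPU search can be checked.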

Phase B5 — MCTS rollout dispatch

  • Wrap Phase B1-B4 kernels into a single advance_n_futures(states: &[GpuGameState]) entry point
  • Dispatch N rollout states in one WGPU command buffer; read back winner arrays

Total Phase B wall-clock estimate: 4-5 weeks (assuming one engineer; Rust and WGPU experience required).

5. Verdict

The bench loop is ~95% POD-compatible. The primary porting work is data marshaling (String→int, Vec→CSR, keyword→bitmask), not algorithmic. The fauna_encounter and combat_resolve kernels are the highest-value targets: O(units × avg_lairs_per_tile) and O(combats) respectively, both embarrassingly parallel. The main risk is RNG determinism across workgroup execution order, so Phase B2 must validate byte-identical output before any further kernel work proceeds.