GPU RECON Phase B — WGPU Compute Portability for mc-turn
1. Data Read/Written During TurnProcessor::step
POD (shader-friendly)
| Type | Location | Fields |
|---|---|---|
| `MapUnit` | `game_state.rs:73` | `col`, `row`, `hp`, `max_hp`, `attack`, `defense`, `is_fortified`; 7× i32/bool, ~28 B |
| `CityState` (bench subset) | `mc-city::CityState` | `population`, `food_stored`, `food_yield`, `prod_yield`, `production_stored`; all i32 |
| `CityEcology` | `game_state.rs:63` | `adjacent_lair_pressure: f32`, `last_harassment_turn: u32`; 8 B |
| `UnitStats` | `resolver.rs:11` | 7× i32; 28 B |
| `CombatBonuses` | `bonuses.rs` | ~6× f32, 1× i32; 28 B |
| Lair snapshot | `processor.rs:457` | `Vec<(i32, i32, i32)>`; pure POD |
| `LairIndex.buckets` | `spatial_index.rs:57` | `Vec<Vec<u32>>`; nested alloc, flattens to flat `u32[]` |
Shader-hostile (graph/pointer/String)
| Type | Location | Problem |
|---|---|---|
| `TechState.progress` | `game_state.rs:89` | `HashMap<String, u32>`; heap, non-deterministic layout |
| `PlayerState.strategic_axes` | `game_state.rs:36` | `HashMap<String, u8>` |
| `PlayerState.city_buildings` | `game_state.rs:43` | `Vec<Vec<String>>` |
| `MapUnit.unit_id` | `game_state.rs:82` | `String` |
| `TileState.biome_id` et al. | `grid/mod.rs:86` | ~8 `String` fields per tile |
| `TileState.river_flow` | `grid/mod.rs:98` | `HashMap<String, f32>` |
| `CombatParams.attacker_keywords` | `resolver.rs:185` | `Vec<Keyword>` (enum vec) |
| `TurnProcessor.building_upkeep_table` | `processor.rs:149` | `HashMap<String, i32>` |
POD fraction: The core bench loop (economy, city prod, unit move, fauna encounter) touches ~95% POD data. The `HashMap<String, *>` fields are queried at most once per turn per player for axis lookups, so they can be pre-flattened to arrays before GPU dispatch.
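A minimal sketch of that pre-flattening step, assuming eight axis slots. Only `wealth` and `culture` are suggested by the kernel inputs in this report; the remaining names in `AXIS_NAMES` and the function `flatten_axes` are hypothetical placeholders, not the actual mc-turn key set.

```rust
use std::collections::HashMap;

/// Hypothetical axis key set; the slot order defines the GPU-side index.
const AXIS_NAMES: [&str; 8] = [
    "wealth", "culture", "science", "military",
    "faith", "expansion", "diplomacy", "ecology",
];

/// Flatten a per-player axis map into a fixed array indexed by axis slot.
/// Missing axes default to 0; keys outside AXIS_NAMES are ignored.
pub fn flatten_axes(axes: &HashMap<String, u8>) -> [u8; 8] {
    let mut out = [0u8; 8];
    // Iterate the fixed name list rather than the map, so the result
    // does not depend on HashMap iteration order.
    for (slot, name) in AXIS_NAMES.iter().enumerate() {
        if let Some(&value) = axes.get(*name) {
            out[slot] = value;
        }
    }
    out
}
```

Driving the loop from the fixed name list keeps the output layout stable across runs, which is the property the GPU buffer contract would need.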
2. WGSL Kernel Candidates
| Kernel | Input | Output | LOC est. | Notes |
|---|---|---|---|---|
| `economy_tick` | `player_gold[]`, `wealth_axis[]`, `city_count[]`, `upkeep[]` | `new_gold[]` | ~60 | Trivial scalar arithmetic per player. Parallelism: N_players (tiny). Worth only as warm-up. |
| `city_production` | `city_food[]`, `city_pop[]`, `city_prod[]` | updated arrays | ~100 | One city per workgroup invocation. Threshold formula is deterministic. |
| `culture_science` | `culture_axis`, `city_count` | `culture_total`, `science_yield` | ~40 | Scalar per player: near-zero GPU benefit, include for completeness. |
| `fauna_encounter` | `unit_pos[]`, `lair_buckets[]`, `lair_tiers[]`, `rng_state[]` | `kill_flags[]` | ~200 | Best candidate: O(units × avg_lairs_per_tile), embarrassingly parallel per unit. RNG must be SplitMix64 (already used in Rust). |
| `combat_resolve` | `attacker_stats[]`, `defender_stats[]`, `bonuses[]` | `dmg_to_def`, `dmg_to_atk` | ~150 | Civ5 exponential formula (`e^(diff/25)`). No branches except keyword flags. Parallelism: N_combats per turn. |
| `unit_movement` | `unit_pos[]`, `enemy_unit_pos[]`, `enemy_city_pos[]`, `lair_pos[]` | `new_pos[]` | ~180 | `step_toward` is a Manhattan step: trivial. Bottleneck is nearest-neighbor search; a flat sorted array with binary search works on GPU. |
Total WGSL LOC estimate: ~730
3. Structural Blockers
- String keys in hot path: `strategic_axes: HashMap<String, u8>` is looked up every phase (`processor.rs:262`). Pre-encode to `[u8; 8]` (axis enum index) before upload.
- Nested Vec allocations: `LairIndex.buckets: Vec<Vec<u32>>` (`spatial_index.rs:57`) must be flattened to a `(flat_u32_buf, offset_u32_buf)` CSR layout for WGPU buffer upload.
- RNG stream order: encounter resolution is byte-identical only when lairs are visited in ascending snapshot order (`spatial_index.rs:22-28`). WGPU workgroups must preserve per-player RNG lanes (one SplitMix64 state per player-lane, not per-unit).
- Keyword Vec: `CombatParams.attacker_keywords: Vec<Keyword>` must become a `u32` bitmask. Already ~15 enum variants, fits in one `u32`.
- `GridState.TileState` has 50+ fields (`grid/mod.rs:79-188`); uploading the full grid per MCTS rollout is ~20 MB for a 96×96 map. Only the lair sub-fields (`lair_tier`, `lair_population`, `col`, `row`) are needed, so project to a slim `GpuLair` struct before upload.
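The slim-struct projection in the last blocker could look roughly like the sketch below. It assumes all four lair fields are `i32` (consistent with the `Vec<(i32, i32, i32)>` lair snapshot) and that `lair_tier > 0` marks a tile that holds a lair; both are assumptions. `TileLairView` is a hypothetical stand-in for the relevant subset of `TileState`, not the real type.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for the lair-related subset of TileState.
pub struct TileLairView {
    pub col: i32,
    pub row: i32,
    pub lair_tier: i32,
    pub lair_population: i32,
}

/// Slim, fixed-layout record intended for buffer upload: 4 × i32 = 16 bytes.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct GpuLair {
    pub col: i32,
    pub row: i32,
    pub tier: i32,
    pub population: i32,
}

/// Keep only tiles that hold a lair (assumed here to mean lair_tier > 0),
/// preserving the input order.
pub fn project_lairs(tiles: &[TileLairView]) -> Vec<GpuLair> {
    tiles
        .iter()
        .filter(|t| t.lair_tier > 0)
        .map(|t| GpuLair {
            col: t.col,
            row: t.row,
            tier: t.lair_tier,
            population: t.lair_population,
        })
        .collect()
}

/// Upper bound on upload size if every tile of the map held a lair.
pub fn worst_case_bytes(width: usize, height: usize) -> usize {
    width * height * size_of::<GpuLair>()
}
```

Under these assumptions, even a 96×96 map with a lair on every tile would upload 9,216 × 16 B = 147,456 B, compared with the ~20 MB full-grid figure above.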
4. Phased Implementation Plan
Phase B1 — Flat-data layer (prereq, ~1 week)
- Add `GpuPlayerState { gold: i32, axes: [u8;8], city_count: u32, ... }` alongside existing structs
- Flatten `LairIndex` to CSR buffers
- Encode keywords as bitmask; encode `strategic_axes` as fixed enum array
- No WGSL yet; just establish the serialization contract
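The two encodings in this phase can be sketched as follows. `CsrBuckets`, `flatten_buckets`, `keyword_mask`, and `has_keyword` are hypothetical names, and the three `Keyword` variants are placeholders; the real enum is reported to have around 15 variants.

```rust
/// CSR form of a Vec<Vec<u32>>: bucket i is values[offsets[i]..offsets[i + 1]].
pub struct CsrBuckets {
    pub offsets: Vec<u32>,
    pub values: Vec<u32>,
}

/// Flatten nested buckets into one value buffer plus an offset buffer.
pub fn flatten_buckets(buckets: &[Vec<u32>]) -> CsrBuckets {
    let mut offsets = Vec::with_capacity(buckets.len() + 1);
    let mut values = Vec::with_capacity(buckets.iter().map(Vec::len).sum());
    offsets.push(0);
    for bucket in buckets {
        values.extend_from_slice(bucket);
        offsets.push(values.len() as u32);
    }
    CsrBuckets { offsets, values }
}

impl CsrBuckets {
    /// Slice view of bucket `i`; mirrors what a shader would index by hand.
    pub fn bucket(&self, i: usize) -> &[u32] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}

/// Placeholder keyword enum; each discriminant is the bit position.
#[derive(Clone, Copy)]
#[repr(u32)]
pub enum Keyword {
    Flying = 0,
    Ranged = 1,
    Amphibious = 2,
}

/// Fold a keyword list into a single u32 bitmask.
pub fn keyword_mask(keywords: &[Keyword]) -> u32 {
    keywords.iter().fold(0u32, |mask, &k| mask | (1u32 << (k as u32)))
}

/// Test one keyword bit; the WGSL side would do the same AND.
pub fn has_keyword(mask: u32, k: Keyword) -> bool {
    mask & (1u32 << (k as u32)) != 0
}
```

The offset buffer has one more entry than there are buckets, so empty buckets need no special case: their start and end offsets are equal.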
Phase B2 — fauna_encounter kernel (~1 week)
- Port `process_fauna_encounters_inner` inner loop to WGSL
- One workgroup invocation per (player, unit); reads from CSR lair index
- Validate byte-identical kill flags vs Rust reference on known seeds
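The validation step can be prototyped on the CPU first. The sketch below uses the standard SplitMix64 step function and one RNG lane per player; the lane seeding (`base_seed ^ player`), the 25% kill threshold, and the function `kill_flags` are illustrative assumptions, not the mc-turn logic. It shows the property the kernel has to preserve: the flags depend on each player's draw order, not on the order in which lanes are scheduled.

```rust
/// Standard SplitMix64 generator.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Resolve one draw per unit, using one RNG lane per player.
/// `player_order` stands in for arbitrary lane scheduling; within a lane,
/// units are always visited in ascending index order.
pub fn kill_flags(base_seed: u64, unit_owner: &[u32], player_order: &[u32]) -> Vec<bool> {
    let mut flags = vec![false; unit_owner.len()];
    for &player in player_order {
        // Hypothetical lane seeding: mix the player id into the base seed.
        let mut lane = SplitMix64::new(base_seed ^ u64::from(player));
        for (unit, &owner) in unit_owner.iter().enumerate() {
            if owner == player {
                // Hypothetical 25% kill chance, for illustration only.
                flags[unit] = lane.next_u64() % 100 < 25;
            }
        }
    }
    flags
}
```

Calling `kill_flags` with the same seed and owners but a permuted `player_order` should return identical flags, which is the comparison the byte-identical check would make against the GPU output. As far as the current core WGSL spec goes there is no 64-bit integer type, so a port would presumably carry each lane's state as two `u32` halves.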
Phase B3 — combat_resolve kernel (~3 days)
- Port `CombatResolver::resolve` Civ5 formula to WGSL
- Single dispatch over combats array; no branching beyond keyword bitmask checks
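For reference while porting, a CPU-side sketch of the exponential form mentioned above. Only the `e^(diff/25)` shape comes from this report; the `BASE_DAMAGE` constant, the symmetric treatment of attacker and defender, and the name `resolve_damage` are assumptions, and the real `CombatResolver::resolve` also applies bonuses and keyword flags that are omitted here.

```rust
/// Hypothetical base damage at equal strength; not taken from the source.
const BASE_DAMAGE: f32 = 30.0;
/// Divisor from the e^(diff/25) formula.
const STRENGTH_SCALE: f32 = 25.0;

/// Returns (dmg_to_def, dmg_to_atk) for one combat.
/// Branch-free: each side's damage scales exponentially with the
/// strength difference in its favor.
pub fn resolve_damage(attacker_strength: f32, defender_strength: f32) -> (f32, f32) {
    let diff = attacker_strength - defender_strength;
    let dmg_to_def = BASE_DAMAGE * (diff / STRENGTH_SCALE).exp();
    let dmg_to_atk = BASE_DAMAGE * (-diff / STRENGTH_SCALE).exp();
    (dmg_to_def, dmg_to_atk)
}
```

With equal strengths both sides take `BASE_DAMAGE`; a 25-point advantage multiplies the damage dealt by e and divides the damage received by e.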
Phase B4 — unit_movement + economy (~1 week)
- `step_toward` nearest-enemy search → GPU nearest-neighbor over flat arrays
- Economy and city-production trivially vectorize but have low parallelism gain
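One way the sorted-array search could work, sketched on the CPU: targets are sorted by column, a binary search finds the starting point, and the scan expands left and right until the column gap alone rules out a closer target. The column-first stepping rule in `step_toward` and the first-found tie-break in `nearest` are assumptions; a real port would have to match whatever the Rust reference does.

```rust
type Pos = (i32, i32); // (col, row)

fn manhattan(a: Pos, b: Pos) -> i32 {
    (a.0 - b.0).abs() + (a.1 - b.1).abs()
}

/// Nearest target by Manhattan distance. `sorted` must be sorted by column.
pub fn nearest(sorted: &[Pos], from: Pos) -> Option<Pos> {
    // Binary search for the first target with col >= from.col.
    let start = sorted.partition_point(|p| p.0 < from.0);
    let mut best: Option<(Pos, i32)> = None;
    // Scan right: column gaps only grow, so stop once the gap alone
    // is at least the best distance found so far.
    for p in &sorted[start..] {
        if let Some((_, d)) = best {
            if p.0 - from.0 >= d {
                break;
            }
        }
        let d = manhattan(*p, from);
        if best.map_or(true, |(_, bd)| d < bd) {
            best = Some((*p, d));
        }
    }
    // Scan left with the same pruning rule.
    for p in sorted[..start].iter().rev() {
        if let Some((_, d)) = best {
            if from.0 - p.0 >= d {
                break;
            }
        }
        let d = manhattan(*p, from);
        if best.map_or(true, |(_, bd)| d < bd) {
            best = Some((*p, d));
        }
    }
    best.map(|(p, _)| p)
}

/// One-tile step toward `to`, closing the column gap first (assumed rule).
pub fn step_toward(from: Pos, to: Pos) -> Pos {
    let (dc, dr) = (to.0 - from.0, to.1 - from.1);
    if dc != 0 {
        (from.0 + dc.signum(), from.1)
    } else if dr != 0 {
        (from.0, from.1 + dr.signum())
    } else {
        from
    }
}
```

The pruning is valid because Manhattan distance is never smaller than the column gap, so the loop touches only targets in a column band around the unit rather than the whole array.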
Phase B5 — MCTS rollout dispatch
- Wrap Phase B1-B4 kernels into a single `advance_n_futures(states: &[GpuGameState])` entry point
- Dispatch `N` rollout states in one WGPU command buffer; read back winner arrays
Total Phase B wall-clock estimate: 4–5 weeks (assuming one engineer, Rust+WGPU experience required).
5. Verdict
The bench loop is ~95% POD-compatible. The primary porting work is data-marshaling (String→int, Vec→CSR, keyword→bitmask), not algorithmic. The fauna_encounter and combat_resolve kernels are the highest-value targets: O(units × lairs) and O(combats) respectively, both embarrassingly parallel. The main risk is RNG determinism across workgroup execution order — Phase B2 must validate byte-identical output before any further kernel work proceeds.