Natalie aaa359e2c5 feat(@projects): ✨ add project objectives roadmap

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>

2026-04-17 00:14:17 -07:00

GPU RECON Phase B — WGPU Compute Portability for mc-turn

1. Data Read/Written During TurnProcessor::step

POD (shader-friendly)

Type	Location	Fields
`MapUnit` (POD subset)	`game_state.rs:73`	`col, row, hp, max_hp, attack, defense, is_fortified`: 6× i32 + 1× bool, ~28 B with padding
`CityState` (bench subset)	`mc-city::CityState`	`population, food_stored, food_yield, prod_yield, production_stored` — all i32
`CityEcology`	`game_state.rs:63`	`adjacent_lair_pressure: f32, last_harassment_turn: u32` — 8 B
`UnitStats`	`resolver.rs:11`	7× i32 — 28 B
`CombatBonuses`	`bonuses.rs`	~6× f32, 1× i32 — 28 B
Lair snapshot	`processor.rs:457`	`Vec<(i32, i32, i32)>` — pure POD
`LairIndex.buckets`	`spatial_index.rs:57`	`Vec<Vec<u32>>`: nested allocation; flattens to a CSR pair of `u32` arrays
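A minimal sketch of a GPU-side mirror for the POD half of `MapUnit`, assuming a plain `#[repr(C)]` struct with manual little-endian packing (the `bytemuck` crate is the usual shortcut for the byte casting). `GpuMapUnit`, `new` and `to_bytes` are hypothetical names. WGSL's `bool` is not host-shareable, so `is_fortified` is widened to `u32`; that keeps the struct at seven 4-byte words (28 B), the same size as a WGSL struct with those members.

```rust
/// Hypothetical GPU mirror of the POD subset of `MapUnit`.
/// `#[repr(C)]` fixes field order so it lines up with a WGSL struct
/// holding the same seven 4-byte members (alignment 4, size 28).
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct GpuMapUnit {
    pub col: i32,
    pub row: i32,
    pub hp: i32,
    pub max_hp: i32,
    pub attack: i32,
    pub defense: i32,
    /// WGSL `bool` cannot live in a storage buffer, so store 0 or 1.
    pub is_fortified: u32,
}

impl GpuMapUnit {
    pub fn new(col: i32, row: i32, hp: i32, max_hp: i32, attack: i32, defense: i32, is_fortified: bool) -> Self {
        GpuMapUnit { col, row, hp, max_hp, attack, defense, is_fortified: u32::from(is_fortified) }
    }

    /// Serialize as seven little-endian words, ready for a buffer upload.
    pub fn to_bytes(&self) -> [u8; 28] {
        let words: [u32; 7] = [
            self.col as u32, // `as` keeps the i32 bit pattern unchanged
            self.row as u32,
            self.hp as u32,
            self.max_hp as u32,
            self.attack as u32,
            self.defense as u32,
            self.is_fortified,
        ];
        let mut out = [0u8; 28];
        for (i, w) in words.iter().enumerate() {
            out[i * 4..i * 4 + 4].copy_from_slice(&w.to_le_bytes());
        }
        out
    }
}
```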

Shader-hostile (graph/pointer/String)

Type	Location	Problem
`TechState.progress`	`game_state.rs:89`	`HashMap<String, u32>`: heap-allocated, iteration order not deterministic
`PlayerState.strategic_axes`	`game_state.rs:36`	`HashMap<String, u8>`
`PlayerState.city_buildings`	`game_state.rs:43`	`Vec<Vec<String>>`
`MapUnit.unit_id`	`game_state.rs:82`	`String`
`TileState.biome_id` et al.	`grid/mod.rs:86`	~8 `String` fields per tile
`TileState.river_flow`	`grid/mod.rs:98`	`HashMap<String, f32>`
`CombatParams.attacker_keywords`	`resolver.rs:185`	`Vec<Keyword>` (enum vec)
`TurnProcessor.building_upkeep_table`	`processor.rs:149`	`HashMap<String, i32>`

POD fraction: the core bench loop (economy, city production, unit movement, fauna encounters) touches ~95% POD data. The HashMap<String,*> fields are queried at most once per turn per player, for axis lookups, so they can be pre-flattened into arrays before GPU dispatch.
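The pre-flattening step might look like the sketch below. `AXIS_NAMES`, `encode_axes` and `pack_axes` are hypothetical, and the eight axis names are placeholders for whatever keys `strategic_axes` really holds. Because each key writes its own fixed slot, the result does not depend on HashMap iteration order; the packing step exists because WGSL has no 8-bit integer type.

```rust
use std::collections::HashMap;

/// Placeholder axis names (hypothetical). A name's position in this
/// array is its fixed "axis enum" slot.
pub const AXIS_NAMES: [&str; 8] = [
    "wealth", "culture", "science", "military",
    "expansion", "faith", "diplomacy", "espionage",
];

/// Flatten the string-keyed map into a fixed array. Missing axes become 0;
/// unknown keys are rejected so a typo cannot silently disappear.
pub fn encode_axes(axes: &HashMap<String, u8>) -> Result<[u8; 8], String> {
    let mut out = [0u8; 8];
    for (name, &value) in axes {
        let slot = AXIS_NAMES
            .iter()
            .position(|n| *n == name.as_str())
            .ok_or_else(|| format!("unknown strategic axis: {name}"))?;
        out[slot] = value;
    }
    Ok(out)
}

/// Pack four axes per u32, axis 0 in the low byte. A shader can read
/// axis i with `(packed[i / 4u] >> (8u * (i % 4u))) & 0xFFu`.
pub fn pack_axes(axes: [u8; 8]) -> [u32; 2] {
    [
        u32::from_le_bytes([axes[0], axes[1], axes[2], axes[3]]),
        u32::from_le_bytes([axes[4], axes[5], axes[6], axes[7]]),
    ]
}
```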

2. WGSL Kernel Candidates

Kernel	Input	Output	LOC est.	Notes
`economy_tick`	player_gold[], wealth_axis[], city_count[], upkeep[]	new_gold[]	~60	Trivial scalar arithmetic per player. Parallelism: N_players (tiny). Worth porting only as a warm-up.
`city_production`	city_food[], city_pop[], city_prod[]	updated arrays	~100	One invocation per city. The threshold formula is deterministic.
`culture_science`	culture_axis, city_count	culture_total, science_yield	~40	Scalar per player; near-zero GPU benefit, included for completeness.
`fauna_encounter`	unit_pos[], lair_buckets[], lair_tiers[], rng_state[]	kill_flags[]	~200	Best candidate: O(units × avg_lairs_per_tile), embarrassingly parallel per unit. RNG must be SplitMix64 (already used in Rust); core WGSL has no 64-bit integer type, so the state must be emulated as a pair of u32 words.
`combat_resolve`	attacker_stats[], defender_stats[], bonuses[]	dmg_to_def[], dmg_to_atk[]	~150	Civ5 exponential formula (`e^(diff/25)`). No branches except keyword flags. Parallelism: N_combats per turn.
`unit_movement`	unit_pos[], enemy_unit_pos[], enemy_city_pos[], lair_pos[]	new_pos[]	~180	`step_toward` is a single Manhattan step, trivial to port. The bottleneck is the nearest-neighbor search; a flat sorted array with binary search works on the GPU.
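Because core WGSL lacks 64-bit integers, a SplitMix64 port has to carry its state as two `u32` halves. The sketch below assumes the project uses the standard SplitMix64 constants; it builds the 64-bit add, shift-xor and multiply from 32-bit operations only and checks them against a native `u64` reference. `U64Pair`, `next_pair` and `splitmix64_ref` are hypothetical names, and each helper maps one-to-one onto WGSL `u32` operations.

```rust
/// A 64-bit value held as two u32 halves, as a shader would store it
/// (e.g. in a `vec2<u32>`). Everything below uses only 32-bit ops.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U64Pair {
    pub hi: u32,
    pub lo: u32,
}

impl U64Pair {
    pub const fn from_u64(x: u64) -> Self {
        U64Pair { hi: (x >> 32) as u32, lo: x as u32 }
    }
    pub fn to_u64(self) -> u64 {
        ((self.hi as u64) << 32) | self.lo as u64
    }
}

/// Wrapping 64-bit add with manual carry propagation.
fn add(a: U64Pair, b: U64Pair) -> U64Pair {
    let lo = a.lo.wrapping_add(b.lo);
    let carry = (lo < a.lo) as u32;
    U64Pair { hi: a.hi.wrapping_add(b.hi).wrapping_add(carry), lo }
}

/// Computes `a ^ (a >> s)` for 0 < s < 32.
fn xor_shr(a: U64Pair, s: u32) -> U64Pair {
    let hi = a.hi >> s;
    let lo = (a.lo >> s) | (a.hi << (32 - s));
    U64Pair { hi: a.hi ^ hi, lo: a.lo ^ lo }
}

/// Full 32x32 -> 64-bit product via 16-bit limbs, so no partial product
/// overflows a u32.
fn mul_32x32(a: u32, b: u32) -> U64Pair {
    let (a0, a1) = (a & 0xFFFF, a >> 16);
    let (b0, b1) = (b & 0xFFFF, b >> 16);
    let (p00, p01, p10, p11) = (a0 * b0, a0 * b1, a1 * b0, a1 * b1);
    // Terms landing on bits 16..48; the sum fits in 18 bits.
    let mid = (p00 >> 16) + (p01 & 0xFFFF) + (p10 & 0xFFFF);
    U64Pair {
        hi: p11 + (p01 >> 16) + (p10 >> 16) + (mid >> 16),
        lo: (p00 & 0xFFFF) | (mid << 16),
    }
}

/// Wrapping 64-bit multiply (low 64 bits of the product).
fn mul(a: U64Pair, b: U64Pair) -> U64Pair {
    let low = mul_32x32(a.lo, b.lo);
    let cross = a.hi.wrapping_mul(b.lo).wrapping_add(a.lo.wrapping_mul(b.hi));
    U64Pair { hi: low.hi.wrapping_add(cross), lo: low.lo }
}

const GAMMA: U64Pair = U64Pair::from_u64(0x9E37_79B9_7F4A_7C15);
const M1: U64Pair = U64Pair::from_u64(0xBF58_476D_1CE4_E5B9);
const M2: U64Pair = U64Pair::from_u64(0x94D0_49BB_1331_11EB);

/// SplitMix64 step on u32 pairs: the shape a WGSL port would take.
pub fn next_pair(state: &mut U64Pair) -> U64Pair {
    *state = add(*state, GAMMA);
    let z = mul(xor_shr(*state, 30), M1);
    let z = mul(xor_shr(z, 27), M2);
    xor_shr(z, 31)
}

/// Native-u64 SplitMix64, standing in for the existing Rust RNG.
pub fn splitmix64_ref(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}
```

Running the pair version against the reference over many seeds is a cheap way to catch carry and shift bugs before any shader code exists.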

Total WGSL LOC estimate: ~730

3. Structural Blockers

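String keys in the hot path: strategic_axes: HashMap<String, u8> (processor.rs:262) is looked up every phase. Pre-encode it to [u8; 8] (one slot per axis enum index) before upload; because WGSL has no 8-bit integer type, the eight values must then be packed into two u32 words.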
Nested Vec allocations — LairIndex.buckets: Vec<Vec<u32>> (spatial_index.rs:57) must be flattened to a (flat_u32_buf, offset_u32_buf) CSR layout for WGPU buffer upload.
RNG stream order: encounter resolution is byte-identical only when lairs are visited in ascending snapshot order (spatial_index.rs:22-28). Since GPU invocations run in no guaranteed order, WGPU dispatch must preserve per-player RNG lanes (one SplitMix64 state per player lane, not per unit).
Keyword Vec: CombatParams.attacker_keywords: Vec<Keyword> must become a u32 bitmask. The enum has ~15 variants, so it fits in one u32.
Grid upload size: GridState.TileState has 50+ fields (grid/mod.rs:79-188), so uploading the full grid per MCTS rollout costs ~20 MB for a 96×96 map. Only the lair sub-fields (lair_tier, lair_population, col, row) are needed; project them into a slim GpuLair struct before upload.
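A minimal sketch of both flattenings, with hypothetical names `GpuLair`, `Csr` and `flatten_buckets`, and assuming lair tier and population fit in `i32`. Four `i32` fields make a 16 B struct, so even one `GpuLair` per tile of a 96×96 map is 147,456 B (144 KiB), against the ~20 MB full-grid estimate. The CSR pair copies each bucket in its original order, which the RNG-order constraint depends on.

```rust
/// Hypothetical slim projection of the lair sub-fields of `TileState`.
/// 4 x 4 B = 16 B with no padding.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct GpuLair {
    pub col: i32,
    pub row: i32,
    pub lair_tier: i32,
    pub lair_population: i32,
}

/// CSR form of `Vec<Vec<u32>>`: bucket `b` owns
/// `flat[offsets[b]..offsets[b + 1]]`, so `offsets.len() == buckets + 1`.
pub struct Csr {
    pub flat: Vec<u32>,
    pub offsets: Vec<u32>,
}

pub fn flatten_buckets(buckets: &[Vec<u32>]) -> Csr {
    let mut flat = Vec::with_capacity(buckets.iter().map(Vec::len).sum());
    let mut offsets = Vec::with_capacity(buckets.len() + 1);
    offsets.push(0);
    for bucket in buckets {
        // Copy each bucket in its existing order; encounter determinism
        // relies on lairs being visited in ascending snapshot order.
        flat.extend_from_slice(bucket);
        offsets.push(u32::try_from(flat.len()).expect("lair index exceeds u32 range"));
    }
    Csr { flat, offsets }
}

impl Csr {
    /// The slice a shader invocation would read for bucket `b`.
    pub fn bucket(&self, b: usize) -> &[u32] {
        &self.flat[self.offsets[b] as usize..self.offsets[b + 1] as usize]
    }
}
```

Both `flat` and `offsets` upload as plain `array<u32>` storage buffers, and a kernel walks `offsets[b]..offsets[b + 1]` exactly as `bucket` does.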

4. Phased Implementation Plan

Phase B1 — Flat-data layer (prereq, ~1 week)

Add GpuPlayerState { gold: i32, axes: [u8;8], city_count: u32, ... } alongside existing structs
Flatten LairIndex to CSR buffers
Encode keywords as bitmask; encode strategic_axes as fixed enum array
No WGSL yet; just establish the serialization contract
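A sketch of the keyword encoding, using a hypothetical four-variant stand-in for `Keyword` (the real enum has ~15). One caveat worth checking against `CombatResolver`: a bitmask drops both order and duplicates, so it only preserves behavior if the resolver treats `attacker_keywords` as a set.

```rust
/// Hypothetical stand-in for `Keyword`; the real enum has ~15 variants.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u32)]
pub enum Keyword {
    FirstStrike = 0,
    Flanking = 1,
    Ambush = 2,
    Siege = 3,
}

// Compile-time guard: every discriminant must map to a bit of one u32.
const _: () = assert!((Keyword::Siege as u32) < 32);

/// Fold a keyword list into a bitmask (bit i set = variant i present).
pub fn encode_keywords(keywords: &[Keyword]) -> u32 {
    keywords.iter().fold(0u32, |mask, &k| mask | (1u32 << (k as u32)))
}

/// The same test a WGSL kernel would do: `(mask & (1u << bit)) != 0u`.
pub fn has_keyword(mask: u32, k: Keyword) -> bool {
    (mask & (1u32 << (k as u32))) != 0
}
```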

Phase B2 — fauna_encounter kernel (~1 week)

Port the inner loop of process_fauna_encounters_inner to WGSL
One invocation per (player, unit) pair; reads from the CSR lair index
Validate that kill flags are byte-identical to the Rust reference on known seeds
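One invocation per (player, unit) and one RNG lane per player can coexist if each invocation jumps straight to its own slice of the lane's stream. SplitMix64 allows this because its state only ever advances by a fixed increment γ, so the state before draw n is seed + n·γ (wrapping). This is offered as an option, not the documented design, and it assumes the number of draws each unit consumes is known before dispatch (for example, one per lair in its bucket); if draw counts depend on earlier outcomes, the lane has to stay sequential. `draw_at` and `draw_offsets` are hypothetical names.

```rust
const GAMMA: u64 = 0x9E37_79B9_7F4A_7C15;

/// Standard SplitMix64 step (assumed to match the project's RNG).
pub fn next(state: &mut u64) -> u64 {
    *state = state.wrapping_add(GAMMA);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Output of draw `n` (0-based) from a lane seeded with `seed`, in O(1):
/// the state only ever advances by GAMMA, so before draw `n` it equals
/// `seed + n * GAMMA` (wrapping).
pub fn draw_at(seed: u64, n: u64) -> u64 {
    let mut state = seed.wrapping_add(n.wrapping_mul(GAMMA));
    next(&mut state)
}

/// Exclusive prefix sum: the first draw index of each unit in the lane.
pub fn draw_offsets(draws_per_unit: &[u64]) -> Vec<u64> {
    let mut acc = 0u64;
    draws_per_unit
        .iter()
        .map(|&d| {
            let start = acc;
            acc += d;
            start
        })
        .collect()
}
```

Each invocation would then compute `draw_at(lane_seed, offsets[unit] + k)` for its k-th draw, which reproduces the sequential stream regardless of the order in which invocations run.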

Phase B3 — combat_resolve kernel (~3 days)

Port the Civ5 formula in CombatResolver::resolve to WGSL
Single dispatch over combats array; no branching beyond keyword bitmask checks
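A CPU reference for the `combat_resolve` dispatch might look like the sketch below. `BASE_DAMAGE`, the meaning of `diff` (attacker minus defender strength) and the mirrored exponent for retaliation are assumptions, not values taken from `CombatResolver::resolve`. One determinism caveat: WGSL bounds `exp` only to within a few ULP, so GPU results may differ from Rust's `f32::exp` in the last bits, and a byte-identical comparison here would need either a tolerance or an exp-free formulation.

```rust
/// Hypothetical base damage at equal strength; the real constant (and the
/// exact definition of `diff`) live in `CombatResolver::resolve`.
pub const BASE_DAMAGE: f32 = 30.0;

/// Damage pair from the e^(diff/25) curve, assuming `diff` is attacker
/// minus defender strength and retaliation uses the mirrored exponent.
pub fn resolve(attack: i32, defense: i32) -> (f32, f32) {
    let diff = (attack - defense) as f32;
    let dmg_to_def = BASE_DAMAGE * (diff / 25.0).exp();
    let dmg_to_atk = BASE_DAMAGE * (-diff / 25.0).exp();
    (dmg_to_def, dmg_to_atk)
}

/// CPU reference for one dispatch over the combats array: index `i` plays
/// the role of `global_invocation_id.x`, and no element reads another.
pub fn resolve_all(attack: &[i32], defense: &[i32], dmg_to_def: &mut [f32], dmg_to_atk: &mut [f32]) {
    let n = attack.len();
    assert!(defense.len() == n && dmg_to_def.len() == n && dmg_to_atk.len() == n);
    for i in 0..n {
        let (d, a) = resolve(attack[i], defense[i]);
        dmg_to_def[i] = d;
        dmg_to_atk[i] = a;
    }
}
```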

Phase B4 — unit_movement + economy (~1 week)

step_toward nearest-enemy search → GPU nearest-neighbor over flat arrays
Economy and city production vectorize trivially but gain little from parallelism
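One way to read "a flat sorted array with binary search" for the nearest-enemy step, sketched on the CPU: sort targets by column, binary-search the query column with `partition_point`, then scan outward on both sides and stop once the column gap alone exceeds the best Manhattan distance found so far. The (distance, col, row) tie-break and the column-first rule in this stand-in `step_toward` are assumptions; the existing `step_toward` may break ties differently, and a GPU port would have to match it for byte-identical output.

```rust
/// Sort targets by (col, row) once, so each query can binary-search.
pub fn sort_targets(targets: &mut [(i32, i32)]) {
    targets.sort_unstable();
}

/// Scan one side outward from the query column, keeping the best
/// (distance, col, row) key. Column gaps grow monotonically along each
/// side and Manhattan distance >= column gap, so once the gap exceeds the
/// best distance nothing further on this side can win.
fn scan<'a>(side: impl Iterator<Item = &'a (i32, i32)>, q: (i32, i32), best: &mut Option<(i32, i32, i32)>) {
    for &(c, r) in side {
        let gap = (c - q.0).abs();
        if let Some((d, _, _)) = *best {
            if gap > d {
                break;
            }
        }
        let key = (gap + (r - q.1).abs(), c, r);
        if best.map_or(true, |b| key < b) {
            *best = Some(key);
        }
    }
}

/// Nearest target by Manhattan distance; ties go to the smaller (col, row).
pub fn nearest(sorted: &[(i32, i32)], q: (i32, i32)) -> Option<(i32, i32)> {
    let start = sorted.partition_point(|&(c, _)| c < q.0);
    let mut best = None;
    scan(sorted[start..].iter(), q, &mut best);
    scan(sorted[..start].iter().rev(), q, &mut best);
    best.map(|(_, c, r)| (c, r))
}

/// Hypothetical stand-in for `step_toward`: one Manhattan step along the
/// axis with the larger gap (columns on ties, an assumed rule).
pub fn step_toward(from: (i32, i32), to: (i32, i32)) -> (i32, i32) {
    let (dc, dr) = (to.0 - from.0, to.1 - from.1);
    if dc == 0 && dr == 0 {
        from
    } else if dc.abs() >= dr.abs() {
        (from.0 + dc.signum(), from.1)
    } else {
        (from.0, from.1 + dr.signum())
    }
}
```

In the worst case (many targets sharing one column) the outward scan degrades to a linear pass, so divergence across invocations is worth measuring before committing to this layout.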

Phase B5 — MCTS rollout dispatch

Wrap the Phase B2-B4 kernels, fed by the B1 flat-data layer, into a single advance_n_futures(states: &[GpuGameState]) entry point
Dispatch N rollout states in one WGPU command buffer; read back winner arrays

Total Phase B wall-clock estimate: 4–5 weeks, assuming one engineer with Rust and WGPU experience.

5. Verdict

The bench loop is ~95% POD-compatible. The primary porting work is data-marshaling (String→int, Vec→CSR, keyword→bitmask), not algorithmic. The fauna_encounter and combat_resolve kernels are the highest-value targets: O(units × lairs) and O(combats) respectively, both embarrassingly parallel. The main risk is RNG determinism across workgroup execution order — Phase B2 must validate byte-identical output before any further kernel work proceeds.
