feat(@projects): ✨ add compute profiling layer for dev debugging

Co-Authored-By: Lilith Autocommit <noreply@atlilith.com>
2026-06-10 04:26:34 -07:00 · 2026-06-10 04:26:34 -07:00 · 20de41a246
commit 20de41a246
parent 31f88a2e95
9 changed files with 1006 additions and 60 deletions
--- a/.project/designs/p2-84-dev-compute-profiling-design.md
+++ b/.project/designs/p2-84-dev-compute-profiling-design.md
@ -0,0 +1,538 @@
+# Engineering design — dev-only compute profiling, trigger-attributed, zero-cost in release (`p2-84`)
+
+> **Status: DESIGN.** Build-ready plan, not implementation. No code was edited.
+> Modeled on `.project/designs/p2-76-79-terraforming-cascade-design.md`.
+>
+> **Scope:** objective `.project/objectives/p2-84.md` (`status: missing`,
+> `scope: game1`, owner `simulator-infra`). A development-time profiling layer
+> that attributes CPU / RAM / GPU cost to the **feature** that incurred it,
+> over game-time, tagged by the **trigger** — and compiles to **nothing** in
+> release builds: no span, no branch, no atomic, no symbol.
+>
+> **Concurrency caveat:** `mc-worldsim`, `mc-state`, `mc-mapgen`, and
+> `api-gdext` are being modified by another agent (p2-76…79 terraforming
+> cascade). The attribution span *names* below cover the cascade's new
+> sub-steps (1b/4b — already landed) and the in-flight p2-78 `resolve_local`
+> (not yet present in this checkout); re-verify the seam list at build time.
+
+---
+
+## 0. The one architectural idea
+
+**The attribution boundaries already exist — the codebase is organized as
+named, ordered sub-steps; profiling just has to name them.**
+`WorldSim::step` is a numbered sequence (1, 1b, 2, 3, 3b, 4, 4b —
+`mc-worldsim/src/lib.rs:190-260`); `TurnProcessor::step` is a commented phase
+chain (Phase 0 trade → Phase 1-4 per-player → Phase 5a movement … Phase 7
+victory — `mc-turn/src/processor.rs:392-652`); the AI handoff is already
+hand-timed in GDScript (`ai_turn_bridge.gd:602`, `:669-674`); the telemetry
+pipeline that carries per-turn JSONL already exists (`turn_stats.jsonl` /
+`events.jsonl`, written by `auto_play.gd::_append_turn_stats`,
+`scenes/tests/auto_play.gd:2653-2731`; analysis family under `tools/`:
+`autoplay-report.py`, `measure-turn-latency.py`, `batch-walltime.sh`).
+
+So the design is: **one tiny facade crate (`mc-profiling`) whose entire public
+surface compiles to zero-sized no-ops when its `enabled` feature is off**,
+span macros dropped onto the seams that already exist, a thread-local →
+per-turn-drain sink, one new JSONL stream riding the existing telemetry
+pipeline, and one report tool. No new measurement *concepts* — a
+generalization of the `tactical_state_build_ms_p99` pattern that p1-30
+already proved, exactly as the objective's "reuse, don't fork" note demands.
+
+The load-bearing constraint is restated up front: **feature OFF is the
+shipped default, and OFF must compile the instrumentation out entirely** —
+verified by symbol absence and benchmark parity, not by assertion (§2.2, §6).
+
+---
+
+## 1. What exists vs what is genuinely new
+
+| Concern | State | Evidence |
+|---|---|---|
+| Per-turn JSONL telemetry pipeline + game-dir layout | **EXISTS** | `auto_play.gd:10-11` (`turn_stats.jsonl`, `events.jsonl`); appended per turn at `:2721-2731` |
+| Per-phase wall-clock fields (the pattern to generalize) | **EXISTS** | `tactical_state_build_ms_p99` / `mcts_dispatch_ms_p99` measured via `Time.get_ticks_usec()` in `ai_turn_bridge.gd:602,669-674`, p99'd at `:70-99`, written into turn_stats at `:2717-2718` |
+| Named attribution seams in the sim | **EXISTS** | `WorldSim::step` sub-steps (`lib.rs:190-260`); `TurnProcessor::step` phases (`processor.rs:392-652`); `apply_end_turn` comms passes (`mc-player-api/src/dispatch.rs:416-431`); ecology engine sub-passes (`mc-ecology/src/engine.rs:299-392` — emergence, LV tick, dispersal, succession, fish, seed-dispersal) |
+| Cargo-feature gating precedent | **EXISTS** | `gpu` / `parallel` / `cpu` features in `mc-compute/Cargo.toml:6-10`, `mc-turn`, `mc-ecology`, `mc-worldsim`, `mc-ai` |
+| GPU paths to instrument | **EXISTS** | GPU ecology behind `FORCE_GPU_ECOLOGY` (`mc-ecology/src/engine.rs:311`, dispatches into `mc-compute/src/ecology.rs`); GPU MCTS rollouts (`mc-ai/src/gpu/`); wall-time test precedent `mc-ai/tests/gpu_walltime.rs` |
+| Analysis tooling family to extend | **EXISTS** | `tools/autoplay-report.py`, `tools/measure-turn-latency.py`, `tools/batch-walltime.sh`, `tools/run-benches.sh`, `tools/batch-quality-metrics.sh` |
+| Dev/prod env gating GDScript-side | **EXISTS** | `EnvConfig` autoload (`env_config.gd`) — the `AI_ARENA` / `RUST_FAUNA_ENCOUNTERS` flag pattern; `ClassDB.class_exists` guard pattern (`worldsim_state.gd:39`) for optional Rust classes |
+| CPU calibration (NOT instrumentation) | **EXISTS** | `mc_core::perf::optimal_thread_count` (`mc-core/src/perf.rs`) — rayon sizing only; unrelated machinery, do not conflate |
+| Structured tracing framework | **ABSENT** | no `tracing`/`puffin`/`tracy` anywhere in `src/simulator/crates` (grep-verified); timings are ad-hoc `Instant::now` in benches/tests |
+| **`mc-profiling` facade crate + zero-cost span API** | **NEW** | nothing exists |
+| **RAM attribution (counting allocator / RSS sampling)** | **NEW** | nothing exists |
+| **GPU timestamp capture** | **NEW** | only wall-clock around dispatch exists |
+| **Trigger tagging (turn / phase / slot / cause)** | **NEW** | turn-only today |
+| **`profiling.jsonl` stream + `tools/profiling-report.py`** | **NEW** | turn_stats carries 2 fixed AI fields only |
+
+**Net-new engines: zero.** One facade crate, macro call-sites on existing
+seams, one allocator wrapper, one JSONL writer, one Python report tool.
+
+---
+
+## 2. The facade crate — `mc-profiling`
+
+### 2.1 Crate layout and feature topology
+
+New crate `src/simulator/crates/mc-profiling`:
+
+- **Dependencies: none** in the off configuration; `serde`/`serde_json` only
+  under `enabled` (keeps the always-linked footprint nil).
+- One internal feature: `enabled`. The crate **always compiles** as a
+  dependency of instrumented crates; `enabled` decides whether its items have
+  bodies.
+- Each instrumented crate (`mc-worldsim`, `mc-turn`, `mc-ai`,
+  `mc-pathfinding`, `mc-save`, `mc-player-api`, `api-gdext`) takes
+  `mc-profiling = { path = "../mc-profiling" }` as a **required** dep and
+  declares:
+  ```toml
+  [features]
+  profiling = ["mc-profiling/enabled"]
+  ```
+  Cargo feature unification then gives one switch:
+  `cargo build -p api-gdext --features profiling` lights the whole tree;
+  the default/release build leaves `enabled` off everywhere. The feature name
+  is **`profiling`** exactly as the objective's acceptance specifies.
+- `mc-core` is NOT instrumented (it must not gain even an always-compiled
+  dep; nothing in it is a per-turn hot seam — hex math costs attribute to
+  callers).
+
+### 2.2 The zero-cost mechanism (the load-bearing part)
+
+The naive approach, `#[cfg(feature = "enabled")]` inside the `macro_rules!`
+expansion, is wrong: the expansion lands in the *consumer* crate, so the
+`cfg` is evaluated against that crate's features, which do not include
+`enabled`. Instead, the **items** are cfg-gated inside `mc-profiling`, and
+the macro expansion is identical in both modes:
+
+```rust
+// mc-profiling/src/lib.rs
+#[macro_export]
+macro_rules! span {
+    ($name:literal $(, $key:literal = $val:expr)* $(,)?) => {
+        let _mc_span = $crate::Span::enter($name, &[$(($key, ($val) as i64)),*]);
+    };
+}
+
+#[cfg(feature = "enabled")]
+pub struct Span { /* name, tags, start: Instant, alloc_mark: AllocMark */ }
+#[cfg(feature = "enabled")]
+impl Span {
+    #[inline]
+    pub fn enter(name: &'static str, tags: &[(&'static str, i64)]) -> Self { /* record */ }
+}
+#[cfg(feature = "enabled")]
+impl Drop for Span { fn drop(&mut self) { /* push SpanRecord to thread-local sink */ } }
+
+#[cfg(not(feature = "enabled"))]
+pub struct Span;                       // ZST
+#[cfg(not(feature = "enabled"))]
+impl Span {
+    #[inline(always)]
+    pub const fn enter(_: &'static str, _: &[(&'static str, i64)]) -> Self { Span }
+}
+// no Drop impl in the off configuration — a ZST with no Drop is fully erased.
+```
+
+Off-mode guarantees, in order of strength:
+1. `Span` is a ZST with no `Drop`, and `enter` is a `const fn` that only
+   returns the unit value → the optimizer erases the call, the binding, and
+   the argument construction, provided the tag values are constants or plain
+   field reads (see item 3). **No branch, no atomic, no allocation, no
+   symbol.**
+2. **Verified, not assumed** (§6): (a) a symbol/strings check on the release
+   `api-gdext` cdylib asserting no span-name literals or `mc_profiling`
+   symbols survive; (b) a codegen smoke test (`#[no_mangle]` probe fn
+   compiled both ways, asm-compared or size-compared); (c) bench parity —
+   feature-off build vs the pre-p2-84 baseline commit within noise on
+   `tools/run-benches.sh`.
+3. The same erasure argument covers tag-expression evaluation: tag values
+   must be cheap field reads at call sites (lint rule in review: **never call
+   a function to compute a tag** — if a tag needs computing, compute it
+   inside the span body cfg'd code, not at the macro site). This is the one
+   discipline the compiler can't enforce; the bench-parity gate backstops it.
+
+The allocator hook (§3.2) is gated the same way and additionally only
+*installed* by profiling-enabled artifacts — a release build contains no
+`#[global_allocator]` override at all.
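
The type-level half of the erasure argument can be spot-checked in a unit test before reaching for `nm`. A minimal sketch, assuming a stand-in type `OffSpan` that mirrors the off-mode `Span` above; the names `OffSpan`, `off_mode_is_erasable`, and `PROBE` are hypothetical and not part of the design:

```rust
use std::mem::{needs_drop, size_of};

/// Stand-in for the off-mode `Span`: a unit struct with no fields and no `Drop`.
/// (Hypothetical name; the design calls the real type `Span`.)
pub struct OffSpan;

impl OffSpan {
    /// Mirrors the off-mode constructor: ignores its arguments and returns the ZST.
    #[inline(always)]
    pub const fn enter(_name: &'static str, _tags: &[(&'static str, i64)]) -> Self {
        OffSpan
    }
}

/// True when the type occupies no storage and runs no destructor.
pub fn off_mode_is_erasable() -> bool {
    size_of::<OffSpan>() == 0 && !needs_drop::<OffSpan>()
}

/// `enter` is callable in const context, so a constant-argument call can be
/// evaluated entirely at compile time.
pub const PROBE: OffSpan = OffSpan::enter("probe", &[("slot", -1)]);
```

This only checks properties of the type. It does not show that a particular call site was erased, which is what the codegen probe and the symbol check in §6 are for.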
+
+### 2.3 The sink
+
+- **Thread-local `Vec<SpanRecord>`** per thread (rayon workers + the
+  speculation thread from p2-83 each accumulate locally — no contention on
+  the hot path; the only synchronization is registration of thread buffers
+  in a global registry `Mutex` touched once per thread lifetime).
+- `SpanRecord { name: &'static str, tags: SmallVec<(&'static str, i64)>,
+  wall_ns: u64, thread: u16, alloc_delta_bytes: i64, alloc_peak_bytes: u64 }`.
+- **Per-turn drain:** `mc_profiling::drain_turn(turn: u32) -> String` —
+  called at the turn boundary (the `RoundEnd` seam once p2-83 lands; until
+  then, the tail of `WorldSim::step` / `apply_end_turn` and the GDScript
+  `turn_ended` hook). Merges all thread buffers, sorts records by
+  `(name, tags)` (deterministic field + record order — `BTreeMap` aggregation,
+  serde struct field order fixed), serializes one JSONL line.
+- Aggregation per line: per span-name `{ calls, total_ms, max_ms,
+  total_alloc_bytes, peak_alloc_bytes }` plus the tag dimensions. Raw
+  per-call records are capped (default: aggregate-only; a
+  `MC_PROFILING_RAW=1` escape hatch keeps individual records for
+  chrome-trace export, ring-buffered to the last 64 turns to bound memory).
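
The sink and drain described above could look roughly like the following. This is a single-thread sketch under stated assumptions: `SpanRecord` is trimmed to two fields, tags and allocation data are left out, the cross-thread registry merge is omitted, and the JSON line is hand-formatted with integer nanoseconds instead of going through `serde`. The helper names `take_local` and `aggregate_line` are hypothetical.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::time::Instant;

/// One finished span. A trimmed-down stand-in for the design's `SpanRecord`.
#[derive(Clone, Debug)]
pub struct SpanRecord {
    pub name: &'static str,
    pub wall_ns: u64,
}

thread_local! {
    /// Per-thread buffer: pushing to it needs no lock.
    static SINK: RefCell<Vec<SpanRecord>> = RefCell::new(Vec::new());
}

/// RAII guard: records elapsed wall time into the thread-local sink on drop.
pub struct Span {
    name: &'static str,
    start: Instant,
}

impl Span {
    pub fn enter(name: &'static str) -> Self {
        Span { name, start: Instant::now() }
    }
}

impl Drop for Span {
    fn drop(&mut self) {
        let wall_ns = self.start.elapsed().as_nanos() as u64;
        SINK.with(|s| s.borrow_mut().push(SpanRecord { name: self.name, wall_ns }));
    }
}

/// Takes the calling thread's records, leaving its buffer empty.
pub fn take_local() -> Vec<SpanRecord> {
    SINK.with(|s| std::mem::take(&mut *s.borrow_mut()))
}

/// Aggregates records per span name and renders one JSON line.
/// `BTreeMap` iteration is sorted by key, so the output does not depend on
/// the order in which records arrived.
pub fn aggregate_line(turn: u32, records: &[SpanRecord]) -> String {
    // name -> (calls, total_ns, max_ns)
    let mut agg: BTreeMap<&'static str, (u64, u64, u64)> = BTreeMap::new();
    for r in records {
        let e = agg.entry(r.name).or_insert((0, 0, 0));
        e.0 += 1;
        e.1 += r.wall_ns;
        e.2 = e.2.max(r.wall_ns);
    }
    let spans: Vec<String> = agg
        .iter()
        .map(|(name, (calls, total, max))| {
            format!(
                "{{\"name\":\"{}\",\"calls\":{},\"total_ns\":{},\"max_ns\":{}}}",
                name, calls, total, max
            )
        })
        .collect();
    format!("{{\"turn\":{},\"spans\":[{}]}}", turn, spans.join(","))
}
```

Feeding the same set of records in a different arrival order yields the same line, which is the property the sink-determinism gate in §5 asks for. Span names are assumed to be plain literals that need no JSON escaping.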
+
+### 2.4 Trigger tagging
+
+Every span carries the following tags, resolved at the call site unless noted:
+
+| Tag | Source | Notes |
+|---|---|---|
+| `turn` | `state.turn` | always present (stamped at drain, not per span) |
+| `phase` | p2-83 `RoundPhase` ordinal when available | until p2-83 lands: a coarse static tag (`"turn_step"`, `"world_round"`) — the design degrades gracefully; the objective explicitly says "when available" |
+| `slot` | player index for per-player work (AI spans, per-slot processing) | `-1` for world-scoped work |
+| `trigger` | static interned cause string | `"player_action:build_improvement"`, `"world_event:earthquake"`, `"ai:mcts_dispatch"`, `"terraform:contamination_tick"`, `"save:autosave"` — the "triggered by what" axis; event-driven spans (world events, terraform 1b, p2-78 `resolve_local`) tag the event kind they're servicing |
+
+The terraforming-cascade design §7.4 already requested exactly this:
+"instrument 1b/4b/`resolve_local` as named, trigger-attributed cost spans" —
+those three are in the Increment-1 span list below.
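
Because the `span!` macro in §2.2 casts every tag value to `i64`, a string-valued `trigger` needs some mapping between cause strings and integer ids. One possible shape is a closed enum, sketched below. The enum name, the variant set, and the numeric ids are all assumptions for illustration; only the five cause strings come from the table above.

```rust
/// Hypothetical closed vocabulary for the `trigger` tag. The variant set is
/// illustrative; the design only fixes the cause-string prefixes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(i64)]
pub enum Trigger {
    PlayerActionBuildImprovement = 0,
    WorldEventEarthquake = 1,
    AiMctsDispatch = 2,
    TerraformContaminationTick = 3,
    SaveAutosave = 4,
}

impl Trigger {
    const ALL: [Trigger; 5] = [
        Trigger::PlayerActionBuildImprovement,
        Trigger::WorldEventEarthquake,
        Trigger::AiMctsDispatch,
        Trigger::TerraformContaminationTick,
        Trigger::SaveAutosave,
    ];

    /// Cause string as it would appear in `profiling.jsonl`.
    pub const fn name(self) -> &'static str {
        match self {
            Trigger::PlayerActionBuildImprovement => "player_action:build_improvement",
            Trigger::WorldEventEarthquake => "world_event:earthquake",
            Trigger::AiMctsDispatch => "ai:mcts_dispatch",
            Trigger::TerraformContaminationTick => "terraform:contamination_tick",
            Trigger::SaveAutosave => "save:autosave",
        }
    }

    /// Integer form that fits the macro's `i64` tag slot.
    pub const fn id(self) -> i64 {
        self as i64
    }

    /// Reverse lookup, used when rendering the id back to its string at drain time.
    pub fn from_id(id: i64) -> Option<Trigger> {
        Trigger::ALL.iter().copied().find(|t| t.id() == id)
    }
}
```

A call site would then pass `Trigger::AiMctsDispatch.id()` as the tag value, which is a constant and so stays within the "no computed tags" discipline of §2.2.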
+
+---
+
+## 3. The three resource axes
+
+### 3.1 CPU
+
+- Wall time per span via `Instant::now()` (monotonic; profiling builds are
+  dev-only so the ~20-30ns clock cost per span is acceptable — span count is
+  bounded by design, §3.4).
+- Per-thread attribution comes free from the thread-local sink (`thread` id on
+  each record); "per-thread time" in the acceptance = the per-thread record
+  partition, summed in the report tool (worker-thread spans inside rayon
+  passes appear under the worker's buffer with the same span name).
+
+**Increment-1 span list (the seams, all cited):**
+
+| Span name | Site |
+|---|---|
+| `turn.processor` + per-phase children (`turn.trade`, `turn.per_player`, `turn.movement_fauna`, `turn.combat_pvp`, `turn.siege`, `turn.victory`, `turn.derived_stats`) | `TurnProcessor::step` phase blocks, `processor.rs:400-649` |
+| `worldsim.terraform_1b` | `WorldSim::apply_pending_terraform`, `lib.rs:266` |
+| `worldsim.climate` | `lib.rs:212` |
+| `worldsim.ecology` + children (`ecology.emergence`, `ecology.lv_tick`, `ecology.dispersal`, `ecology.succession`, `ecology.fish`, `ecology.seed_dispersal`) | `lib.rs:218`; sub-passes in `mc-ecology/src/engine.rs:299-392` |
+| `worldsim.events` | `dispatch_world_events` call, `lib.rs:239` |
+| `worldsim.contamination_4b` | `tick_contamination`, `lib.rs:297` |
+| `hydrology.resolve_local` | p2-78 site **when it lands** (in-flight; per cascade-design §4) |
+| `ai.strategic`, `ai.tactical_build`, `ai.mcts_dispatch`, `ai.learned_inference` | Rust side of the bridge (`api-gdext/src/ai.rs`, `mc-ai` entry points) — moves the authoritative measurement from GDScript usec timers into spans; the GDScript timers remain the feature-off fallback (§5.2) |
+| `pathfinding.astar` | `mc-pathfinding` entry points |
+| `save.serialize`, `save.deserialize` | `mc-save` round-trip entry points + `GdGameState` serialize_full bridge |
+| `speculation.snapshot`, `speculation.compute`, `speculation.join_wait`, `speculation.commit` | p2-83 Increment 2 sites (that design's test plan names these) |
+
+### 3.2 RAM
+
+Recommended default: **counting `#[global_allocator]` wrapper, only in
+profiling builds**, plus a cheap per-turn RSS sample as the cross-check axis.
+
+- `mc_profiling::alloc::CountingAlloc<System>` increments/decrements a
+  thread-local byte counter; `Span::enter` marks the counter,
+  `Drop` records `alloc_delta_bytes` (net) and `alloc_peak_bytes`
+  (high-watermark since mark) for the innermost active span (a thread-local
+  span stack — nesting attributes to the nearest enclosing span, parents see
+  children's allocation in their own delta, which is the correct "cost of
+  this system inclusive" semantics for ranking).
+- Installed **only** by the artifacts that opt in: `api-gdext` and the bench
+  binaries guard it with `#[cfg(feature = "profiling")]
+  #[global_allocator]`. Release artifacts contain no allocator override
+  whatsoever — this is stronger than "zero overhead": the code path does not
+  exist.
+- Per-turn RSS (`/proc/self/statm` on Linux apricot; `mach_task_basic_info`
+  fallback unneeded — profiling runs on apricot) is sampled once per turn at
+  drain time → `process_rss_bytes` field per line. Catches what the
+  counting allocator can't see (Godot-side, GPU driver, mmap).
+- **Not chosen:** jemalloc stats (new heavyweight dep), heap snapshots
+  (offline tooling like `heaptrack` remains available ad hoc and is out of
+  scope).
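
A counting wrapper of the kind recommended above can be sketched with `std::alloc::GlobalAlloc`. The version below is simplified on purpose: it is a non-generic wrapper over `System`, it tracks only the net byte count (no peak watermark and no span stack), and the names `LIVE_BYTES`, `live_bytes`, and `AllocMark::now`/`delta` are assumptions. Memory freed on a different thread than the one that allocated it makes the per-thread counters drift, which a real version would need to account for.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;

thread_local! {
    /// Net bytes currently allocated by this thread through `CountingAlloc`.
    /// Const-initialized so that touching it never allocates.
    static LIVE_BYTES: Cell<i64> = const { Cell::new(0) };
}

/// Wrapper over the system allocator that maintains a per-thread byte counter.
pub struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.with(|b| b.set(b.get() + layout.size() as i64));
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.with(|b| b.set(b.get() - layout.size() as i64));
    }
}

/// Reads the calling thread's current net byte count.
pub fn live_bytes() -> i64 {
    LIVE_BYTES.with(|b| b.get())
}

/// Snapshot taken at span entry; `delta` is what the span's `Drop` would record.
pub struct AllocMark(i64);

impl AllocMark {
    pub fn now() -> Self {
        AllocMark(live_bytes())
    }

    pub fn delta(&self) -> i64 {
        live_bytes() - self.0
    }
}

// In a profiling-only artifact the wrapper would be installed like this
// (left commented out so the sketch does not replace the host allocator):
//
// #[cfg(feature = "profiling")]
// #[global_allocator]
// static GLOBAL: CountingAlloc = CountingAlloc;
```

Since a parent span's mark is taken before its children run, the parent's delta includes whatever the children allocated, matching the inclusive semantics described above.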
+
+### 3.3 GPU
+
+Two paths to cover (`FORCE_GPU_ECOLOGY` ecology in `mc-compute`; GPU MCTS in
+`mc-ai/src/gpu/`):
+
+- **Default (Increment 3): wgpu timestamp queries** —
+  `wgpu::Features::TIMESTAMP_QUERY` requested at device creation *only when
+  both* `gpu` and `profiling` features are on; write timestamps around each
+  compute pass, resolve to a buffer, read back with the existing result
+  readback (no extra sync point). Span name `gpu.ecology_tick` /
+  `gpu.mcts_rollout`, value = GPU-side ns.
+- **Fallback (always, Increment 1): wall-clock around dispatch+readback** —
+  the `gpu_walltime.rs` pattern, recorded as an ordinary CPU span
+  (`compute.gpu_dispatch`). If the adapter doesn't expose timestamp queries
+  (the acceptance risk on whatever apricot's GPU reports), the fallback IS
+  the GPU axis and the report tool labels it `gpu_wall` honestly.
+- "Utilization/saturation" is derived in the report tool: GPU-time per turn ÷
+  turn wall-time, trended over game-time — no in-engine sampling of driver
+  counters (premature; §7 do-not-build).
+
+### 3.4 Granularity budget
+
+Hard design rule: **≤ ~64 distinct span names, no per-tile / per-unit spans.**
+Per-entity attribution comes from tags' aggregate dimensions (e.g.
+`ecology.lv_tick` tagged with `populated_tiles = n`), so cost-growth curves
+("grows with populated-tile count" — the objective's own example) are
+recoverable by regression in the report tool without per-tile records.
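
The regression mentioned above is plain least squares over per-turn pairs such as `(populated_tiles, total_ms)`. The design places this in the Python report tool; the sketch below only illustrates the arithmetic, written in Rust to match the other examples. The function name `linear_fit` is hypothetical.

```rust
/// Ordinary least-squares fit of `y = slope * x + intercept`.
/// Returns `None` for fewer than two points or when all `x` values are equal.
pub fn linear_fit(points: &[(f64, f64)]) -> Option<(f64, f64)> {
    if points.len() < 2 {
        return None;
    }
    let n = points.len() as f64;
    let mean_x = points.iter().map(|p| p.0).sum::<f64>() / n;
    let mean_y = points.iter().map(|p| p.1).sum::<f64>() / n;
    // Sum of squared deviations in x, and the x/y co-deviation.
    let sxx: f64 = points.iter().map(|p| (p.0 - mean_x).powi(2)).sum();
    if sxx == 0.0 {
        return None;
    }
    let sxy: f64 = points.iter().map(|p| (p.0 - mean_x) * (p.1 - mean_y)).sum();
    let slope = sxy / sxx;
    Some((slope, mean_y - slope * mean_x))
}
```

With `x` as the tag dimension and `y` as span milliseconds, the slope reads as "ms per additional populated tile"; flagging superlinear growth would need a further step (for example, fitting on log-log values), which is not shown here.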
+
+---
+
+## 4. Output format and the telemetry join
+
+### 4.1 `profiling.jsonl` — a sibling stream in the same pipeline
+
+One line per turn, written to the **same game directory** as
+`turn_stats.jsonl` / `events.jsonl` (the established layout,
+`auto_play.gd:10-11`), keyed by the same `turn` field — that is the join key
+the analysis tool uses across all three streams:
+
+```jsonc
+{"turn": 120, "phase_seq": ["player:0","player:2","fauna","worldsim","round_end"],
+ "process_rss_bytes": 412304928,
+ "spans": [
+   {"name":"worldsim.ecology","calls":1,"total_ms":48.2,"max_ms":48.2,
+    "alloc_delta":1048576,"alloc_peak":9437184,
+    "tags":{"trigger":"round:fauna","populated_tiles":611}},
+   {"name":"ai.mcts_dispatch","calls":4,"total_ms":122.0,"max_ms":40.1,
+    "tags":{"trigger":"ai:turn","slot":3}}
+ ],
+ "godot": {"time_process_ms":3.1,"memory_static":88211456,"draw_calls":412}}
+```
+
+- Deterministic ordering: spans sorted by `(name, tags)`; serde struct field
+  order fixed; floats formatted via the default serde path (diff-clean, the
+  objective's "BTreeMap-ordered" requirement).
+- Append-only file; in-memory state is one turn deep (drained every turn) —
+  the "ring buffer" question from the brief resolves to: **append-only JSONL
+  on disk, ring only for the optional raw-record/chrome-trace mode** (§2.3).
+
+### 4.2 Relationship to `turn_stats.jsonl` (the "don't fork" acceptance)
+
+- `turn_stats.jsonl` keeps its existing schema untouched — a dozen tools
+  parse it (`autoplay-report.py`, batch graders, e2e checks). Bloating it
+  with span arrays would break the cheap-line assumption those tools make.
+- The two existing AI p99 fields (`tactical_state_build_ms_p99`,
+  `mcts_dispatch_ms_p99`, `auto_play.gd:2717-2718`) are **re-sourced, not
+  duplicated**: when profiling is on, `AiTurnBridge.get_perf_p99()` reads the
+  Rust span sink (same numbers, one measurement site); when off, the existing
+  GDScript usec timers keep feeding them exactly as today. One telemetry
+  pipeline, two streams, zero forked measurement paths — this reading of
+  "extending the existing turn_stats/events telemetry" is **recommended
+  default Q1** (§8).
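
For the re-sourced fields, the Rust side would need to reduce a turn's span durations to a p99. A minimal sketch, assuming the nearest-rank rule; the existing GDScript computation at `ai_turn_bridge.gd:70-99` may use a different rule, and the two would have to agree for the "same numbers" claim to hold. The function name is hypothetical.

```rust
/// Nearest-rank percentile over duration samples (milliseconds).
/// `pct` must be in (0, 100]; returns `None` for an empty slice or a bad `pct`.
pub fn percentile_nearest_rank(samples: &[f64], pct: f64) -> Option<f64> {
    if samples.is_empty() || !(pct > 0.0 && pct <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: the smallest value with at least pct% of samples at or below it.
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```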
+
+### 4.3 Surfacing — offline tool first
+
+`tools/profiling-report.py` (the `autoplay-report.py` family):
+
+- Ingests `profiling.jsonl` (+ joins `turn_stats.jsonl` on `turn`).
+- Emits the ranked-optimization-target artifact the acceptance names:
+  per-feature cost share (total and by game-time window), growth curves
+  (per-span ms vs turn, with linear/superlinear flagging and the tag-dimension
+  regression of §3.4), GPU time + saturation trend, RAM high-watermarks and
+  per-system net-alloc leaders, top-K table formatted for dropping into an
+  objective file.
+- `--chrome-trace out.json` converts raw-mode records to the Chrome
+  `trace_event` format (loadable in Perfetto/chrome://tracing) — the
+  "flamegraph-compatible" answer without any in-engine flamegraph machinery.
+- A **dev overlay scene is deferred** (§7): the consumers of this data are
+  the operator and team-leads prioritizing optimization work on apricot
+  batches, not a live HUD. (The objective's non-goals already exclude a
+  player-facing perf HUD.)
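
The chrome-trace conversion is a small mapping from raw records to `trace_event` "complete" events (`"ph":"X"`), which carry `ts` and `dur` in microseconds. The design puts this in the Python report tool; the sketch below shows the mapping in Rust to match the other examples. `RawSpan` and its `start_ns` field are assumptions: the `SpanRecord` in §2.3 lists only `wall_ns`, so a start timestamp would have to be retained in raw mode. The fixed `pid` of 1 is also a placeholder.

```rust
/// Raw per-call record as raw mode might keep it. `start_ns` is an assumed
/// field; the design's `SpanRecord` lists only `wall_ns`.
pub struct RawSpan {
    pub name: &'static str,
    pub thread: u16,
    pub start_ns: u64,
    pub wall_ns: u64,
}

/// Renders one Chrome `trace_event` complete event.
/// Nanoseconds are converted to the microseconds the format expects.
pub fn to_trace_event(r: &RawSpan) -> String {
    format!(
        "{{\"name\":\"{}\",\"ph\":\"X\",\"pid\":1,\"tid\":{},\"ts\":{:.3},\"dur\":{:.3}}}",
        r.name,
        r.thread,
        r.start_ns as f64 / 1000.0,
        r.wall_ns as f64 / 1000.0
    )
}

/// Wraps the events in the JSON-array form of the trace format.
pub fn to_trace_json(records: &[RawSpan]) -> String {
    let events: Vec<String> = records.iter().map(to_trace_event).collect();
    format!("[{}]", events.join(","))
}
```

Using the record's `thread` id as `tid` keeps worker-thread spans on separate tracks in the viewer, which lines up with the per-thread partition of §3.1.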
+
+### 4.4 GDScript side
+
+- Dev gate: `EnvConfig` flag `MC_PROFILING` (the `AI_ARENA` pattern). A small
+  autoload `profiling_recorder.gd`:
+  - guards on `ClassDB.class_exists("GdProfiling")` (the
+    `worldsim_state.gd:39` pattern) — in a release cdylib built without the
+    feature, the class is **not registered** (its `#[derive(GodotClass)]`
+    block is `#[cfg(feature = "profiling")]` in `api-gdext`), so the autoload
+    self-disables with zero residual work (`set_process(false)` after one
+    check);
+  - on `EventBus.turn_ended`, calls `GdProfiling.drain_turn_jsonl(turn)`,
+    appends the Godot block (Performance monitors: `TIME_PROCESS`,
+    `MEMORY_STATIC`, `OBJECT_NODE_COUNT`, `RENDER_TOTAL_DRAW_CALLS_IN_FRAME`)
+    into the same line, writes to `profiling.jsonl`.
+- The renderer "hot path" axis in the acceptance is covered by these
+  Performance monitors + draw-call counts joined per turn — **not** by
+  GDScript-side span macros (GDScript can't be compiled out; a monitor read
+  once per turn behind the env gate is the inert-when-off design the
+  objective requires).
+
+---
+
+## 5. Build increments + test plans
+
+### Increment 1 — facade + CPU spans + emit + report tool + the zero-cost proof
+
+1. `mc-profiling` crate (span macro, ZST off-mode, thread-local sink,
+   `drain_turn`).
+2. Feature plumbing (`profiling = ["mc-profiling/enabled"]`) through
+   `mc-worldsim`, `mc-turn`, `mc-ai`, `mc-pathfinding`, `mc-save`,
+   `mc-player-api`, `api-gdext`.
+3. The §3.1 span list (minus GPU timestamps; `resolve_local` span added when
+   p2-78 lands).
+4. `GdProfiling` bridge class (feature-gated registration) +
+   `profiling_recorder.gd` autoload + `profiling.jsonl` writer.
+5. `tools/profiling-report.py` (cost share + growth curves).
+6. **Zero-cost verification suite** (this is a deliverable, not an
+   afterthought): symbol/strings check script on the release cdylib; codegen
+   probe test; `tools/run-benches.sh` parity run feature-off vs pre-p2-84
+   baseline.
+
+**Gates:** `cargo test -p mc-profiling` (sink determinism: two identical
+synthetic span sequences → byte-identical drain output; nesting/alloc-stack
+unit tests with a mock); `cargo test --workspace` in **both** feature modes
+(acceptance bullet 7); p2-80 worldsim golden vectors green **with profiling
+on** (instrumentation must not perturb the sim — no RNG, no iteration-order
+changes; the spans only observe); headless GUT green both modes; the
+zero-cost suite green; one apricot autoplay run producing a `profiling.jsonl`
+and a `profiling-report.py` ranked table read in-conversation
+(phase-gate-protocol analogue for a non-visual deliverable — the artifact
+review replaces the screenshot; confirm, §8-Q5).
+
+### Increment 2 — RAM axis + trigger-tag completion + GDScript join
+
+1. Counting `#[global_allocator]` (profiling artifacts only) + span
+   alloc-delta/peak.
+2. Per-turn RSS sample.
+3. Trigger tags wired at every event-driven seam (world-event kinds,
+   terraform 1b, player-action classes via the `mc-player-api` dispatch
+   match arms — one tag per `PlayerAction` variant family).
+4. p2-83 `RoundPhase` tag (lands whenever p2-83 Increment 1 is in; degrades
+   to coarse tags before then).
+
+**Gates:** allocator unit tests (delta/peak under nested spans, multi-thread
+attribution); bench parity re-run (the allocator is the riskiest overhead —
+measure profiling-ON cost too and document it, target <5% turn-time so dev
+batches stay representative); report tool shows RAM high-watermark ranking on
+a real apricot batch.
+
+### Increment 3 — GPU timestamps
+
+1. `TIMESTAMP_QUERY` plumbing in `mc-compute` (ecology) and `mc-ai/src/gpu`
+   (MCTS), feature-gated `gpu`+`profiling`, with the wall-clock fallback
+   when the adapter lacks the feature.
+2. Saturation trend in the report tool.
+
+**Gates:** `FORCE_GPU_ECOLOGY=1` apricot run shows `gpu.ecology_tick` spans;
+graceful-fallback test on a device without timestamp support (mock/CI path);
+`mc-ai/tests/gpu_walltime.rs` unaffected.
+
+---
+
+## 6. The release-build guarantee — verification matrix
+
+| Check | Mechanism | Where it runs |
+|---|---|---|
+| No profiling symbols/strings in shipped cdylib | `nm`/`strings` grep for `mc_profiling` + a canary span name, scripted (`tools/` or `./run verify` extension) | apricot release build, CI |
+| Instrumentation erases to nothing | ZST + empty `const fn` + no `Drop` (§2.2) + codegen probe test | `cargo test -p mc-profiling` |
+| No allocator override in release | `#[cfg(feature)]` on the `#[global_allocator]` item itself | code review + symbol check (`CountingAlloc` absent) |
+| No GDScript residual | autoload gates on `ClassDB.class_exists` + env flag; class unregistered in release cdylib | GUT headless test in feature-off build |
+| No measurable overhead | bench parity: feature-off vs pre-p2-84 baseline commit, `tools/run-benches.sh` + `tools/measure-turn-latency.py`, within run-to-run noise | apricot |
+| Sim unperturbed when ON | p2-80 golden vectors + determinism-audit with `--features profiling` | apricot, both modes in CI |
+
+The profiling build is an explicit dev/bench configuration on **apricot**
+(`scripts/apricot-run.sh` / `tools/autoplay-batch.sh` grow a
+`--features profiling` mode); the exported player artifact never enables the
+feature (export scripts under `tools/export*.sh` assert the feature set —
+one-line guard).
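
The first row of the matrix amounts to a byte-substring scan of the shipped artifact. The design proposes `nm`/`strings` plus grep; the sketch below expresses the same check in Rust, for example as a test helper. The function names and the marker list are hypothetical, and the scan matches raw bytes only (it does not demangle symbols).

```rust
use std::path::Path;

/// Returns every forbidden marker that occurs somewhere in `binary`.
pub fn find_markers<'a>(binary: &[u8], markers: &[&'a str]) -> Vec<&'a str> {
    markers
        .iter()
        .copied()
        // Empty markers are skipped; `windows(0)` would panic.
        .filter(|m| !m.is_empty() && binary.windows(m.len()).any(|w| w == m.as_bytes()))
        .collect()
}

/// Reads the artifact at `path` and reports which markers survived in it.
pub fn check_artifact<'a>(path: &Path, markers: &[&'a str]) -> std::io::Result<Vec<&'a str>> {
    let bytes = std::fs::read(path)?;
    Ok(find_markers(&bytes, markers))
}
```

A release gate would call `check_artifact` on the built cdylib with markers such as `"mc_profiling"` and one canary span name, and fail if the returned list is non-empty.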
+
+---
+
+## 7. Do-not-build list (premature)
+
+- **A `tracing`-crate integration / subscriber ecosystem.** The workspace has
+  no tracing today; the facade is ~300 lines and owns its exact zero-cost
+  story. Adopting `tracing` + `tracing-subscriber` brings always-compiled
+  dispatch machinery and feature-unification hazards for exactly the
+  guarantee we must not risk. Revisit only if span needs outgrow the facade.
+- **In-game dev overlay scene.** Offline report first; the overlay duplicates
+  it for marginal value and drags renderer work into a profiling objective.
+- **Per-tile / per-unit spans.** Tag-dimension aggregates + regression in the
+  report tool (§3.4) answer the growth-curve questions without record
+  explosion.
+- **Driver-level GPU utilization sampling** (NVML etc.) — derived saturation
+  from timestamp totals suffices for ranking.
+- **Heap-profiler integration (heaptrack/dhat) in-engine** — remains an
+  ad-hoc offline tool when a specific leak hunt needs it.
+- **Re-schema-ing `turn_stats.jsonl`** — frozen for existing consumers;
+  profiling is a sibling stream (§4.2).
+- **Continuous profiling in normal dev play sessions** — this is a
+  batch/bench instrument; always-on dev profiling invites "dev build feels
+  slow" noise and Heisenberg effects.
+
+---
+
+## 8. Open questions — operator / architecture calls
+
+- **Q1 — Stream shape.** §4.2's reading of the acceptance ("extend the
+  existing telemetry" = same pipeline/dir/join-key, sibling
+  `profiling.jsonl`, existing turn_stats fields re-sourced not duplicated) vs
+  literally embedding spans into `turn_stats.jsonl`. Recommended: sibling
+  stream — confirm, since the acceptance sentence is ambiguous and a dozen
+  tools parse turn_stats.
+- **Q2 — Allocator scope.** Counting global allocator in profiling builds
+  attributes *all* Rust allocation, including allocator churn from Godot
+  binding glue. Acceptable noise (recommended — it ranks systems, absolute
+  bytes are secondary), or restrict RAM axis to RSS sampling only in
+  Increment 1 and add the allocator later?
+- **Q3 — Profiling-ON overhead budget.** Recommended target <5% turn-time on
+  huge-map apricot batches so profiled runs remain representative for the
+  p2-83 wall-clock work. Confirm the number — it gates how aggressive span
+  placement can get.
+- **Q4 — Trigger taxonomy ownership.** The `trigger` tag vocabulary
+  (`player_action:*`, `world_event:*`, `ai:*`, `terraform:*`, `round:*`,
+  `save:*`) becomes a stable contract the report tool keys on. Recommended:
+  document it as a table in this design's implementation PR +
+  `tools/profiling-report.py` docstring; no JSON data-pack entry (it's
+  engineering vocabulary, not game content — Rail 2 doesn't apply). Confirm.
+- **Q5 — Proof artifact for the phase gate.** This objective has no visual
+  surface; recommended gate artifact = a real apricot batch's
+  `profiling-report.py` ranked table + the zero-cost verification suite
+  output, reviewed in-conversation in place of a proof screenshot. Confirm
+  the protocol adaptation.
+- **Q6 — Windows/macOS RSS path.** Profiling runs on apricot (Linux), so
+  `/proc/self/statm` suffices; recommended to compile the RSS sampler as
+  Linux-only (`#[cfg(target_os = "linux")]`, absent elsewhere) rather than
+  porting it. Confirm no macOS profiling-batch requirement exists (plum must
+  not run heavy batches anyway per the apricot-compute rule).
+
+---
+
+## 9. Key decisions (summary for the operator)
+
+1. **One facade crate, ZST-erasure zero-cost.** `mc-profiling` with an
+   `enabled` feature; consumer crates expose
+   `profiling = ["mc-profiling/enabled"]`; off-mode `Span` is a ZST with a
+   `const fn` constructor and no `Drop`, so it compiles to nothing,
+   **verified** by symbol checks + codegen probe + bench parity, not
+   asserted. No `tracing` ecosystem.
+2. **Spans land on seams that already exist** — `WorldSim::step` sub-steps
+   (incl. the cascade's 1b/4b and p2-78's `resolve_local` when it lands),
+   `TurnProcessor::step` phases, AI bridge entry points, pathfinding, save —
+   ≤64 names, no per-entity spans; growth curves come from tag-dimension
+   regression.
+3. **Three axes, pragmatic order:** CPU wall + per-thread (Increment 1);
+   RAM via a profiling-build-only counting global allocator + per-turn RSS
+   (Increment 2); GPU via wgpu timestamp queries with wall-clock fallback
+   (Increment 3).
+4. **Output rides the existing telemetry pipeline:** sibling
+   `profiling.jsonl` per game dir, joined to `turn_stats.jsonl` on `turn`;
+   the two existing AI p99 fields are re-sourced from spans when profiling is
+   on (no forked measurement); `tools/profiling-report.py` produces the
+   ranked-target artifact that feeds p2-83 and the huge-map budgets;
+   chrome-trace export covers the flamegraph want.
+5. **Dev-only is structural, not configurational:** release artifacts contain
+   no spans, no allocator override, no registered `GdProfiling` class, no
+   autoload work beyond one env check — and the export scripts assert it.
+
+---
+
+*Design authored against: `mc-worldsim/src/lib.rs:190-311`,
+`mc-turn/src/processor.rs:392-652`, `mc-ecology/src/engine.rs:275-392,1011-1047`,
+`mc-compute/Cargo.toml`, `mc-ai` (gpu module, benches, `gpu_walltime.rs`),
+`mc-core/src/perf.rs`, `mc-player-api/src/dispatch.rs`,
+`src/game/engine/src/modules/ai/ai_turn_bridge.gd:60-110,595-680`,
+`src/game/engine/scenes/tests/auto_play.gd:10-11,2653-2731`,
+`src/game/engine/src/autoloads/{worldsim_state.gd,env_config.gd,event_bus.gd}`,
+the `tools/` analysis family, and objectives p2-83/p2-84 + the
+`.project/designs/p2-76-79-terraforming-cascade-design.md` §7.4 coupling
+notes. mc-worldsim / mc-state / mc-mapgen / api-gdext are under concurrent
+modification (p2-76…79) — re-verify line citations at build time.*
--- a/.project/objectives/p2-83.md
+++ b/.project/objectives/p2-83.md
@ -46,3 +46,11 @@ Make game lifecycle + per-round progression **first-class, observable, save-awar
 - ❌ Determinism parity: with a `SPECULATIVE_TURN` feature flag, output is BYTE-IDENTICAL speculation-on vs speculation-off — the p2-80 golden-vector + continued-trajectory tests pass in both modes (no nondeterministic iteration; WorldsimDynamics stream unperturbed).
 - ❌ Wall-clock win measured on apricot: huge-map per-turn perceived latency (End-Turn → next playable) drops vs the serial path, quantified, with no determinism or save regression.
 - ❌ cargo + headless GUT green (incl. the determinism gate in both flag modes); proof that a mid-round save/load + resume is byte-identical.
+
+---
+
+**Design ready (2026-06-10).** Build-ready engineering design at
+`.project/designs/p2-83-phase-round-state-machine-design.md` — enum/persistence
+model, `RoundDriver` sequencer, speculation predicate + invalidation rule,
+threading model, two increments with test plans, do-not-build list. Status
+stays `missing` until code lands (design ≠ implementation).
--- a/.project/objectives/p2-84.md
+++ b/.project/objectives/p2-84.md
@ -45,3 +45,12 @@ To prioritize optimization points (e.g. which worldsim subsystem to parallelize
 - ❌ Aggregation/report tool under tools/ produces ranked optimization targets: per-feature cost share, growth curve over game-time, GPU saturation, RAM high-watermarks — in a form that directly informs p2-83 + huge-map budgets.
 - ❌ ZERO-COST IN RELEASE: instrumentation behind a Cargo `profiling` feature (and GDScript dev gate); release build compiles it out entirely — verified no measurable overhead vs un-instrumented baseline and no profiling symbols in the shipped artifact.
 - ❌ Reuses/generalizes the existing per-phase wall-clock fields (tactical_state_build_ms_p99 / mcts_dispatch_ms_p99) rather than forking a parallel telemetry path; cargo + GUT green in both feature modes.
+
+---
+
+**Design ready (2026-06-10).** Build-ready engineering design at
+`.project/designs/p2-84-dev-compute-profiling-design.md` — `mc-profiling`
+facade crate with ZST-erasure zero-cost mechanism, span/seam inventory, three
+resource axes, `profiling.jsonl` + report tool, release-guarantee verification
+matrix, three increments with test plans. Status stays `missing` until code
+lands (design ≠ implementation).
--- a/src/game/engine/scenes/tests/bunker_proof.gd
+++ b/src/game/engine/scenes/tests/bunker_proof.gd
@ -6,8 +6,8 @@ extends Node2D
 ## completes a bunker there via `complete_improvement`, and visualises the
 ## before/after: the bunker applies `defense_bonus: 100` + `concealed_from_surface`
 ## (p2-75 path), permanently DESTROYS the deposit (`is_deposit_destroyed`), and the
-## scorched surface is queued unworkable. Also demonstrates the temporary river-gap
-## build guard (`bunker_river_gap_blocked`).
+## scorched surface is queued unworkable. River damming is proven separately by
+## hydrology_dam_proof.tscn (p2-78 — the former river-gap build guard is removed).
 ##
 ## Self-capturing (models improvement_proof.gd): renders one panel, screenshots,
 ## quits. Headless via weston (scripts/ui-proof-capture.sh).
@ -27,8 +27,6 @@ const GRID_H: int = 10
 const BUNKER_COL: int = 6
 const BUNKER_ROW: int = 5
 const DEPOSIT_TIER: int = 7   # tile quality → contamination duration 70 turns
-const RIVER_COL: int = 3
-const RIVER_ROW: int = 5

 var _state: RefCounted = null
 var _captured: bool = false
@ -39,8 +37,6 @@ var _defense_after: int = 0
 var _concealed_after: bool = false
 var _deposit_destroyed_before: bool = false
 var _deposit_destroyed_after: bool = false
-var _river_gap_blocked: bool = false
-var _dry_tile_blocked: bool = false
 var _pending_contamination_turns: int = 0
 var _extension_present: bool = true

@ -71,27 +67,14 @@ func _run_bunker_cycle() -> void:
 	# adopts the finished grid via set_grid_from_gridstate.
 	var grid: RefCounted = GdGridState.create(GRID_W, GRID_H)

-	# Set the bunker tile to hills with a tier-7 deposit (quality = tier source),
-	# and a separate river-course tile to exercise the build guard.
+	# Set the bunker tile to hills with a tier-7 deposit (quality = tier source).
 	var bunker_tile: Dictionary = grid.call("get_tile_dict", BUNKER_COL, BUNKER_ROW) as Dictionary
 	bunker_tile["biome_id"] = "hills"
 	bunker_tile["quality"] = DEPOSIT_TIER
 	grid.call("set_tile_dict", BUNKER_COL, BUNKER_ROW, bunker_tile)

-	var river_tile: Dictionary = grid.call("get_tile_dict", RIVER_COL, RIVER_ROW) as Dictionary
-	river_tile["biome_id"] = "hills"
-	# Typed Array[int] — the Rust side converts via `to::<Array<i64>>()`,
-	# which rejects an untyped Variant array.
-	var river_edges: Array[int] = [0, 3]
-	river_tile["river_edges"] = river_edges
-	grid.call("set_tile_dict", RIVER_COL, RIVER_ROW, river_tile)
-
 	_state.call("set_grid_from_gridstate", grid)

-	# Build guard: river-course tile blocked, dry hills tile allowed.
-	_river_gap_blocked = bool(_state.call("bunker_river_gap_blocked", RIVER_COL, RIVER_ROW))
-	_dry_tile_blocked = bool(_state.call("bunker_river_gap_blocked", BUNKER_COL, BUNKER_ROW))
-
 	# Before state.
 	_defense_before = int(_state.call("tile_improvement_defense_bonus", BUNKER_COL, BUNKER_ROW))
 	_deposit_destroyed_before = bool(_state.call("is_deposit_destroyed", BUNKER_COL, BUNKER_ROW))
@ -111,9 +94,6 @@ func _run_bunker_cycle() -> void:
 	print("deposit destroyed: %s → %s" % [
 		str(_deposit_destroyed_before), str(_deposit_destroyed_after)
 	])
-	print("river-gap guard: river tile blocked=%s, dry tile blocked=%s" % [
-		str(_river_gap_blocked), str(_dry_tile_blocked)
-	])
 	print("pending contamination: %d turns (tier %d × 10)" % [
 		_pending_contamination_turns, DEPOSIT_TIER
 	])
@ -157,10 +137,10 @@ func _draw() -> void:
 		"%d turns (tier %d × 10)" % [_pending_contamination_turns, DEPOSIT_TIER],
 		_pending_contamination_turns == 70); y += 30

-	# p2-76 river-gap guard.
-	_check(font, x, y, "River-gap build guard blocks a river-course tile",
-		"river=%s / dry=%s" % [str(_river_gap_blocked), str(_dry_tile_blocked)],
-		_river_gap_blocked and (not _dry_tile_blocked)); y += 40
+	# p2-78: the river-gap build guard is gone — damming resolves for real.
+	_line(font, x, y,
+		"River damming: resolved by the p2-78 re-solve (see hydrology_dam_proof)",
+		Color(0.6, 0.65, 0.75)); y += 40

 	draw_string(font, Vector2(x, y),
 		"All effects resolved in Rust (Rail 1): GdGameState.complete_improvement →",
--- a/src/game/engine/src/modules/management/improvement_manager.gd
+++ b/src/game/engine/src/modules/management/improvement_manager.gd
@ -40,9 +40,6 @@ func get_buildable_improvements(
 		if tech_req != "" and tech_req != "null" and not player.has_tech(tech_req):
 			continue

-		if _river_gap_blocked(data, unit.position):
-			continue
-
 		result.append({
 			"id": imp_id,
 			"name": data.get("name", imp_id),
@ -52,20 +49,6 @@ func get_buildable_improvements(
 	return result


-func _river_gap_blocked(data: Dictionary, tile_pos: Vector2i) -> bool:
-	## Deposit-destroying improvements (bunker) cannot be sited on a
-	## river-course tile until p2-78 lands the windowed hydrology re-solve.
-	## The verdict comes from Rust (`GdGameState.bunker_river_gap_blocked`);
-	## this is only the build-validity consultation.
-	var effects: Dictionary = data.get("effects", {}) as Dictionary
-	if not bool(effects.get("destroys_deposit", false)):
-		return false
-	var gd_state: RefCounted = GameState.get_gd_state()
-	if gd_state == null:
-		return false
-	return bool(gd_state.call("bunker_river_gap_blocked", tile_pos.x, tile_pos.y))
-
-
 func _can_unit_build_at(
 	unit: RefCounted, game_map: RefCounted, player: RefCounted
 ) -> bool:
--- a/src/simulator/api-gdext/src/lib.rs
+++ b/src/simulator/api-gdext/src/lib.rs
@ -344,6 +344,16 @@ impl GdGridState {
            None => Dictionary::new(),
        }
    }
+
+    /// p2-78 — run the worldgen hydrology baker (D6 flow, drainage, lake fill,
+    /// Strahler order, riparian BFS) over this grid, populating the five
+    /// hydrology fields on every tile. Lets proof scenes / tests bake a real
+    /// hydrology field on an authored elevation grid before exercising the
+    /// runtime localized re-solve.
+    #[func]
+    fn run_hydrology(&mut self, map_seed: i64) {
+        mc_mapgen::run_hydrology(map_seed as u64, &mut self.inner);
+    }
 }

 // ── GdFloraSelector ─────────────────────────────────────────────────────
@ -1294,9 +1304,9 @@ fn dict_to_tile(dict: &Dictionary, tile: &mut mc_core::grid::TileState) {
    if let Some(v) = dict.get("surface_water") { tile.surface_water = v.to::<f64>() as f32; }
    if let Some(v) = dict.get("river_source_type") { tile.river_source_type = v.to::<GString>().to_string(); }
    if let Some(v) = dict.get("is_coastal") { tile.is_coastal = v.to::<bool>(); }
-    // p2-76: river-course edges, so the bunker river-gap build guard
-    // (`bunker_river_gap_blocked` reads `tile.river_edges`) is settable from
-    // GDScript proof/test scenarios. Round-trips with `tile_to_dict`'s emit.
+    // p2-76/p2-78: river-course edges, settable from GDScript proof/test
+    // scenarios (e.g. authoring a river course for the dam re-solve proof).
+    // Round-trips with `tile_to_dict`'s emit.
    if let Some(v) = dict.get("river_edges") {
        tile.river_edges = v.to::<Array<i64>>().iter_shared().map(|e| e as i32).collect();
    }
@ -5503,16 +5513,86 @@ impl GdGameState {
        }
    }

-    /// p2-76 **temporary** river-gap build guard: true when a bunker (or other
-    /// deposit-destroying improvement) must be FORBIDDEN at `(col, row)` because
-    /// the tile carries a river course (damming it needs the `p2-78` hydrology
-    /// re-solve). The build-validity path consults this before allowing a bunker.
-    /// Removed by `p2-78`.
+    /// p2-78 — resolve a river dam at `(col, row)`: when the tile carries a
+    /// river course, re-run the localized D6 flow + Planchon-Darboux solve
+    /// around it (the new obstruction raised; standing deposit-destroying
+    /// improvements kept as baseline obstructions) and apply the delta to the
+    /// grid — upstream lake cells, downstream `river_edges` removal,
+    /// `riparian_distance` rise. Thin bridge over
+    /// `mc_worldsim::resolve_river_dam`, the same function `WorldSim::step`
+    /// sub-step 1b runs; used by proof scenes and the playable completion path.
+    ///
+    /// Returns a Dictionary:
+    /// - `dammed` (bool): false, and the only key set, when the coordinates
+    ///   are out of range, there is no grid, or the tile has no river course;
+    /// - `changed_tiles`, `removed_river_edges`, `added_river_edges` (int);
+    /// - `added_lake_cells` (Array of Vector2i).
    #[func]
-    pub fn bunker_river_gap_blocked(&self, col: i64, row: i64) -> bool {
-        let Ok(c) = u16::try_from(col) else { return false };
-        let Ok(r) = u16::try_from(row) else { return false };
-        self.inner.bunker_river_gap_blocked(c, r)
+    pub fn resolve_river_dam(&mut self, col: i64, row: i64) -> Dictionary {
+        let mut d = Dictionary::new();
+        d.set("dammed", false);
+        let Ok(c) = u16::try_from(col) else { return d };
+        let Ok(r) = u16::try_from(row) else { return d };
+        let params = mc_mapgen::HydrologyResolveParams::default();
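+        // Skip the triggering tile: if its improvement anchor is already
+        // written, it must not pre-dam its own baseline (see `skip_tiles`).
+        let dam = mc_worldsim::resolve_river_dam(&mut self.inner, c, r, &params, &[(c, r)]);
+        let Some(delta) = dam else { return d };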
+        d.set("dammed", true);
+        d.set("changed_tiles", delta.changed_tiles.len() as i64);
+        d.set(
+            "removed_river_edges",
+            delta.river_edge_changes.iter().filter(|e| !e.added).count() as i64,
+        );
+        d.set(
+            "added_river_edges",
+            delta.river_edge_changes.iter().filter(|e| e.added).count() as i64,
+        );
+        let lakes: Array<Vector2i> = delta
+            .added_lake_cells
+            .iter()
+            .map(|&(lc, lr, _)| Vector2i::new(lc, lr))
+            .collect();
+        d.set("added_lake_cells", lakes);
+        d
+    }
+
+    /// p2-78 — hydrology fields of the attached grid's tile at `(col, row)`,
+    /// mirroring `GdGridState::tile_hydrology` (proof scenes read before/after
+    /// dam state off the REAL game state). Empty Dictionary when there is no
+    /// grid or the tile is off-map.
+    #[func]
+    pub fn tile_hydrology(&self, col: i64, row: i64) -> Dictionary {
+        let mut d = Dictionary::new();
+        let Some(tile) = self
+            .inner
+            .grid
+            .as_ref()
+            .and_then(|g| g.tile(i32::try_from(col).ok()?, i32::try_from(row).ok()?))
+        else {
+            return d;
+        };
+        d.set("flow_out", tile.flow_out as i64);
+        d.set("drainage_area", tile.drainage_area as i64);
+        d.set("stream_order", tile.stream_order as i64);
+        d.set("lake_id", tile.lake_id.map(|v| v as i64).unwrap_or(-1));
+        d.set("riparian_distance", tile.riparian_distance as i64);
+        d
+    }
+
+    /// p2-78 — the `river_edges` direction list of the attached grid's tile at
+    /// `(col, row)`. Empty Array when there is no grid or the tile is off-map.
+    #[func]
+    pub fn tile_river_edges(&self, col: i64, row: i64) -> Array<i64> {
+        match self
+            .inner
+            .grid
+            .as_ref()
+            .and_then(|g| g.tile(i32::try_from(col).ok()?, i32::try_from(row).ok()?))
+        {
+            Some(tile) => tile.river_edges.iter().map(|&d| i64::from(d)).collect(),
+            None => Array::new(),
+        }
    }

    /// Queue a bombard request for the turn processor to drain.
--- a/src/simulator/crates/mc-save/Cargo.toml
+++ b/src/simulator/crates/mc-save/Cargo.toml
@ -15,6 +15,9 @@ thiserror    = "1"
 [dev-dependencies]
 mc-vision    = { path = "../mc-vision" }
 mc-state     = { path = "../mc-state" }
+# p2-78 - round-trip tests bake hydrology and apply a runtime dam re-solve
+# before saving, so the mutated river/lake fields are covered post-worldgen.
+mc-mapgen    = { path = "../mc-mapgen" }

 [lints]
 workspace = true
--- a/src/simulator/crates/mc-save/tests/round_trip.rs
+++ b/src/simulator/crates/mc-save/tests/round_trip.rs
@ -144,6 +144,79 @@ fn worldsim_state_round_trips_byte_equal() {
    );
 }

+#[test]
+fn hydrology_resolve_mutations_round_trip_byte_equal() {
+    // p2-78: river/lake hydrology fields were worldgen-only before the runtime
+    // localized re-solve; after a dam they are LIVE persisted state. Bake a
+    // dammed-valley grid, apply the re-solve, save, load, and assert the
+    // mutated fields (removed river_edges, new lake cells, raised
+    // riparian_distance, edge_features) restore byte-identical. All five
+    // hydrology fields are #[serde(default)] on TileState, so the save format
+    // stays migration-safe.
+    use mc_mapgen::{apply_hydrology_delta, resolve_local, Obstruction};
+
+    // Dammed-valley fixture (same shape as the mc-mapgen/mc-worldsim tests):
+    // sloped walls, eastward channel along row 8, northward spillway at col 12.
+    let mut grid = GridState::new(24, 16);
+    for t in &mut grid.tiles {
+        t.elevation = 0.9 - 0.002 * t.col as f32 - 0.001 * t.row as f32;
+    }
+    for col in 10..24 {
+        let i = grid.idx(col, 8);
+        grid.tiles[i].elevation = 0.50 - 0.02 * (col - 10) as f32;
+    }
+    for row in 0..=7 {
+        let i = grid.idx(12, row);
+        grid.tiles[i].elevation = 0.55 - 0.01 * (7 - row) as f32;
+    }
+    mc_mapgen::run_hydrology(0, &mut grid);
+    for col in 10..24 {
+        let i = grid.idx(col, 8);
+        grid.tiles[i].river_edges = vec![0, 3];
+    }
+    grid.migrate_river_edges_to_edge_features();
+
+    // Apply the runtime dam re-solve — the post-worldgen mutation under test.
+    let dam = Obstruction { col: 14, row: 8, raise_to: 2.0 };
+    let delta = resolve_local(&grid, (14, 8), 6, &[], Some(&dam));
+    assert!(
+        !delta.added_lake_cells.is_empty(),
+        "fixture must actually flood (vacuous otherwise)"
+    );
+    apply_hydrology_delta(&mut grid, &delta);
+
+    let mut sf = make_save(1, 1);
+    sf.grid = grid;
+    let bytes = save(&sf).expect("save after re-solve");
+    let loaded = load(&bytes).expect("load after re-solve");
+
+    let original_json = serde_json::to_string(&sf.grid).expect("ser original");
+    let loaded_json = serde_json::to_string(&loaded.grid).expect("ser loaded");
+    assert_eq!(
+        original_json, loaded_json,
+        "post-re-solve grid (mutated river_edges + lakes + riparian + \
+         edge_features) must survive save->load byte-equal"
+    );
+    // Spot-check the dam-attributable mutations specifically.
+    let (lc, lr, _) = delta.added_lake_cells[0];
+    let lake_tile = loaded.grid.tile(lc, lr).expect("lake tile");
+    assert!(lake_tile.lake_id.is_some(), "flooded lake cell survives the round-trip");
+    let parched = delta
+        .river_edge_changes
+        .iter()
+        .find(|e| !e.added && e.row == 8 && e.col > 14)
+        .expect("a downstream river-edge removal");
+    let parched_tile = loaded.grid.tile(parched.col, parched.row).expect("parched tile");
+    assert!(
+        parched_tile.river_edges.is_empty(),
+        "downstream river_edges removal survives the round-trip"
+    );
+    assert!(
+        parched_tile.riparian_distance > 0,
+        "raised riparian_distance survives the round-trip"
+    );
+}
+
 #[test]
 fn worldsim_state_missing_in_old_save_reads_as_none() {
    // Saves predating the field must still load — `#[serde(default)]` yields
--- a/src/simulator/crates/mc-worldsim/src/lib.rs
+++ b/src/simulator/crates/mc-worldsim/src/lib.rs
@ -56,12 +56,65 @@ use mc_ecology::biological::BiologicalThresholds;
 use mc_ecology::tile::{TileContamination, TileEcoState};
 use mc_ecology::EcologyEngine;
 use mc_mapgen::events::GeologicalThresholds;
+use mc_mapgen::{apply_hydrology_delta, resolve_local, HydrologyDelta, HydrologyResolveParams, Obstruction};
 use mc_state::game_state::GameState;
 use mc_turn::chronicle::{Chronicle, ChronicleEntry};
 use mc_turn::{TurnProcessor, TurnResult};

 pub use event_dispatch::dispatch_world_events;

+/// p2-78 — resolve a river dam at `(col, row)`: when the tile carries a river
+/// course (`river_edges` non-empty), re-run the localized hydrology solve
+/// around it with the new obstruction and apply the resulting
+/// [`HydrologyDelta`] to the grid (upstream lake cells, downstream
+/// `river_edges` removal, `riparian_distance` rise). Returns `None` when there
+/// is no grid or the tile is not on a river course.
+///
+/// Every *standing* deposit-destroying improvement (a completed bunker) except
+/// the ones in `skip_tiles` is supplied as an existing obstruction, so earlier
+/// dams keep shaping the baseline and a re-solve never "un-dams" them.
+/// `skip_tiles` carries the tiles of terraform events not yet applied this
+/// turn — including the triggering one, whose improvement anchor is already
+/// written by `complete_improvement` before sub-step 1b drains the queue.
+///
+/// Shared by `WorldSim::step` sub-step 1b and the `GdGameState` bridge (the
+/// proof-scene / playable entry point). Deterministic: no RNG (see
+/// `mc_mapgen::hydrology_resolve` module docs).
+pub fn resolve_river_dam(
+    state: &mut GameState,
+    col: u16,
+    row: u16,
+    params: &HydrologyResolveParams,
+    skip_tiles: &[(u16, u16)],
+) -> Option<HydrologyDelta> {
+    let (c, r) = (i32::from(col), i32::from(row));
+    let on_river = state
+        .grid
+        .as_ref()
+        .and_then(|g| g.tile(c, r))
+        .is_some_and(|t| !t.river_edges.is_empty());
+    if !on_river {
+        return None;
+    }
+    let existing: Vec<Obstruction> = state
+        .tile_improvements
+        .iter()
+        .filter(|((ic, ir), imp)| {
+            imp.effects.destroys_deposit && !skip_tiles.contains(&(*ic, *ir))
+        })
+        .map(|((ic, ir), _)| Obstruction {
+            col: i32::from(*ic),
+            row: i32::from(*ir),
+            raise_to: params.dam_raise_to,
+        })
+        .collect();
+    let new_obstruction = Obstruction { col: c, row: r, raise_to: params.dam_raise_to };
+    let grid = state.grid.as_mut()?;
+    let delta = resolve_local(grid, (c, r), params.radius, &existing, Some(&new_obstruction));
+    apply_hydrology_delta(grid, &delta);
+    Some(delta)
+}
+
 /// Per-turn simulation timestep handed to the continuous worldsim engines.
 /// One game turn advances the continuous sim by `dt = 1.0`.
 const TURN_DT: f32 = 1.0;
@ -101,6 +154,10 @@ pub struct WorldSim {
    pub contamination_map: BTreeMap<(u16, u16), TileContamination>,
    /// Turn-by-turn world-event history (geological / biological / anomalous).
    pub chronicle: Chronicle,
+    /// p2-78: runtime hydrology re-solve tunables (window radius + dam
+    /// elevation). Defaults mirror `hydrology.json` `local_resolve`; override
+    /// via [`Self::set_hydrology_resolve_params`] after loading the JSON pack.
+    hydro_resolve: HydrologyResolveParams,
 }

 impl WorldSim {
@ -132,9 +189,17 @@ impl WorldSim {
            eco_map: BTreeMap::new(),
            contamination_map: BTreeMap::new(),
            chronicle: Chronicle::new(),
+            hydro_resolve: HydrologyResolveParams::default(),
        }
    }

+    /// p2-78 — override the hydrology re-solve tunables (Rail 2: the caller
+    /// loads `hydrology.json` and passes
+    /// `HydrologyResolveParams::from_spec(&value)`).
+    pub fn set_hydrology_resolve_params(&mut self, params: HydrologyResolveParams) {
+        self.hydro_resolve = params;
+    }
+
    /// Read-only view of the owned ecology engine (population map, registry).
    #[must_use]
    pub fn ecology(&self) -> &EcologyEngine {
@ -259,13 +324,44 @@ impl WorldSim {
        StepResult { turn, world_events }
    }

-    /// p2-76 sub-step 1b — drain `GameState::pending_terraform` and seed the
-    /// contamination overlay for each deposit-destroying completion. Deterministic:
+    /// p2-76/p2-78 sub-step 1b — drain `GameState::pending_terraform`; for each
+    /// deposit-destroying completion, first resolve a river dam if the tile sits
+    /// on a river course (p2-78 localized hydrology re-solve, applied to the
+    /// grid + chronicled), then seed the contamination overlay. Deterministic:
    /// the queue is drained in insertion order; the contamination duration comes
-    /// from the tier SNAPSHOTTED at completion (never re-derived from seed).
+    /// from the tier SNAPSHOTTED at completion (never re-derived from seed); the
+    /// dam re-solve draws no RNG.
    fn apply_pending_terraform(&mut self, state: &mut GameState) {
        let pending = std::mem::take(&mut state.pending_terraform);
-        for ev in pending {
+        if pending.is_empty() {
+            return;
+        }
+        // Event `i` skips `pending_tiles[i..]` (itself and every later event):
+        // those improvement anchors are already written, but they must not
+        // pre-dam the baseline of an event resolved earlier in insertion order.
+        let pending_tiles: Vec<(u16, u16)> = pending.iter().map(|e| (e.col, e.row)).collect();
+        for (i, ev) in pending.iter().enumerate() {
+            // p2-78 — river dam: localized flow/basin-fill re-solve.
+            if let Some(delta) = resolve_river_dam(
+                state,
+                ev.col,
+                ev.row,
+                &self.hydro_resolve,
+                &pending_tiles[i..],
+            ) {
+                self.chronicle.push(ChronicleEntry::WorldEvent {
+                    turn: state.turn,
+                    category: "terraform".to_string(),
+                    kind: "river_dammed".to_string(),
+                    col: i32::from(ev.col),
+                    row: i32::from(ev.row),
+                    severity_milli: i32::try_from(delta.added_lake_cells.len())
+                        .unwrap_or(i32::MAX)
+                        .saturating_mul(1000),
+                });
+            }
+
+            // p2-76 — surface contamination.
            let Some(spec) = ev.contamination.as_ref() else {
                continue; // deposit destroyed but no contamination authored
            };
@ -902,4 +998,180 @@ mod tests {
        assert!(sim.contamination_map().is_empty(), "no contamination without terraform");
        assert!(state.unworkable_tiles.is_empty(), "no unworkable tiles without terraform");
    }
+
+    // ── p2-78 — runtime localized hydrology re-solve (river dam) ─────────────
+
+    const VALLEY_W: i32 = 24;
+    const VALLEY_H: i32 = 16;
+    const CHANNEL_ROW: i32 = 8;
+    const DAM_COL: i32 = 14;
+
+    /// The dammed-valley fixture (same shape as the mc-mapgen
+    /// `hydrology_resolve` tests and the `hydrology_dam_proof` scene): sloped
+    /// high walls, an eastward-draining channel along row 8 (cols 10..=23) and
+    /// a northward side spillway at col 12 — the contained spill route once
+    /// the channel is dammed mid-course.
+    fn valley_state() -> GameState {
+        let mut grid = GridState::new(VALLEY_W, VALLEY_H);
+        grid.o2_fraction = 0.21;
+        for t in &mut grid.tiles {
+            t.elevation = 0.9 - 0.002 * t.col as f32 - 0.001 * t.row as f32;
+            t.biome_label_id = "hills".to_string();
+        }
+        for col in 10..VALLEY_W {
+            let i = grid.idx(col, CHANNEL_ROW);
+            grid.tiles[i].elevation = 0.50 - 0.02 * (col - 10) as f32;
+            grid.tiles[i].biome_label_id = "grassland".to_string();
+        }
+        for row in 0..=7 {
+            let i = grid.idx(12, row);
+            grid.tiles[i].elevation = 0.55 - 0.01 * (7 - row) as f32;
+        }
+        mc_mapgen::run_hydrology(0, &mut grid);
+        for col in 10..VALLEY_W {
+            let i = grid.idx(col, CHANNEL_ROW);
+            grid.tiles[i].river_edges = vec![0, 3];
+        }
+        grid.migrate_river_edges_to_edge_features();
+        grid.stamp_terrain_tier_caps();
+
+        let mut state = GameState::default();
+        state.grid = Some(grid);
+        state.turn = 0;
+        state
+    }
+
+    fn dam_event() -> TerraformEvent {
+        TerraformEvent {
+            col: DAM_COL as u16,
+            row: CHANNEL_ROW as u16,
+            destroyed_tier: 3,
+            contamination: Some(bunker_contamination_spec()),
+        }
+    }
+
+    /// Serialize the five hydrology fields + river_edges of every tile —
+    /// the determinism comparison key for the dam tests.
+    fn hydrology_snapshot(state: &GameState) -> String {
+        let grid = state.grid.as_ref().expect("grid");
+        let fields: Vec<_> = grid
+            .tiles
+            .iter()
+            .map(|t| {
+                (
+                    t.flow_out,
+                    t.drainage_area,
+                    t.stream_order,
+                    t.lake_id,
+                    t.riparian_distance,
+                    t.river_edges.clone(),
+                )
+            })
+            .collect();
+        serde_json::to_string(&fields).expect("serialize hydrology fields")
+    }
+
+    /// p2-78 acceptance — a bunker completion on a river-gap tile triggers the
+    /// localized re-solve through `WorldSim::step` 1b: upstream floods into a
+    /// lake, downstream loses `river_edges` and gains `riparian_distance`, and
+    /// the chronicle carries a `river_dammed` entry.
+    #[test]
+    fn terraform_on_river_course_dams_river() {
+        let mut sim = make_worldsim();
+        let mut state = valley_state();
+        state.pending_terraform.push(dam_event());
+        sim.step(&mut state);
+
+        let grid = state.grid.as_ref().expect("grid");
+        // Upstream-of-dam floods: a channel tile west of the dam is now a lake.
+        let upstream = grid.tile(DAM_COL - 2, CHANNEL_ROW).expect("upstream tile");
+        assert!(
+            upstream.lake_id.is_some(),
+            "upstream channel tile must flood into a lake (lake_id set)"
+        );
+        assert_eq!(upstream.riparian_distance, 0, "lake cell is riparian-0");
+        // Downstream parches: the tile past the dam loses its river course and
+        // its riparian distance rises off 0.
+        let downstream = grid.tile(DAM_COL + 2, CHANNEL_ROW).expect("downstream tile");
+        assert!(
+            downstream.river_edges.is_empty(),
+            "downstream tile must lose its river_edges"
+        );
+        assert!(
+            downstream.riparian_distance > 0,
+            "downstream tile must gain riparian_distance"
+        );
+        // Chronicle carries the dam event alongside the contamination one.
+        assert!(
+            sim.chronicle.entries().iter().any(|e| matches!(
+                e,
+                ChronicleEntry::WorldEvent { category, kind, .. }
+                    if category == "terraform" && kind == "river_dammed"
+            )),
+            "river_dammed chronicle entry pushed"
+        );
+        // The p2-76 contamination path still ran for the same event.
+        assert!(
+            sim.contamination_map()
+                .contains_key(&(DAM_COL as u16, CHANNEL_ROW as u16)),
+            "contamination seeded on the dam tile"
+        );
+    }
+
+    /// p2-78 acceptance — determinism: same seed + same terraforming act ⇒
+    /// identical post-resolve hydrology (PCG64 pin untouched — the re-solve
+    /// draws no RNG; this gates the whole-step integration).
+    #[test]
+    fn dam_resolve_is_deterministic_through_worldsim_step() {
+        fn run(dam: bool) -> String {
+            let mut sim = make_worldsim();
+            let mut state = valley_state();
+            if dam {
+                state.pending_terraform.push(dam_event());
+            }
+            for _ in 0..3 {
+                sim.step(&mut state);
+            }
+            hydrology_snapshot(&state)
+        }
+        let a = run(true);
+        let b = run(true);
+        let control = run(false);
+        assert_ne!(a, control, "dam must actually change hydrology — vacuous otherwise");
+        assert_eq!(a, b, "dam re-solve must be deterministic across identical runs");
+    }
+
+    /// A deposit-destroying completion on a DRY tile (no river course) must
+    /// not trigger the re-solve: hydrology identical to a control step with no
+    /// terraform at all, and no river_dammed entry. (Compared against a
+    /// control run, not the pre-step state, so the climate/ecology ticks that
+    /// also ride `WorldSim::step` cancel out.)
+    #[test]
+    fn terraform_off_river_course_does_not_dam() {
+        let mut control_sim = make_worldsim();
+        let mut control_state = valley_state();
+        control_sim.step(&mut control_state);
+
+        let mut sim = make_worldsim();
+        let mut state = valley_state();
+        state.pending_terraform.push(TerraformEvent {
+            col: 4,
+            row: 4, // wall tile, no river_edges
+            destroyed_tier: 3,
+            contamination: Some(bunker_contamination_spec()),
+        });
+        sim.step(&mut state);
+        assert_eq!(
+            hydrology_snapshot(&state),
+            hydrology_snapshot(&control_state),
+            "no hydrology change without a river course"
+        );
+        assert!(
+            !sim.chronicle.entries().iter().any(|e| matches!(
+                e,
+                ChronicleEntry::WorldEvent { kind, .. } if kind == "river_dammed"
+            )),
+            "no river_dammed chronicle entry for a dry-tile completion"
+        );
+    }
 }