diff --git a/AGENTS.md b/AGENTS.md
index ed97b91a..1c22eb42 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -127,6 +127,36 @@ Surface options + a recommendation; don't silently pick. Otherwise: act, verify,
 
 ---
 
+## 5. Before the next tick: hand off to the independent Opus reviewer
+
+You do not get to be the only judge of your own work. The integrity gap in §2 is exactly the failure
+mode of self-grading. So **whenever you think you are finished** — a batch landed, an objective
+closed, or you are about to go idle / end a work session before the next tick — your *last step* is to
+hand your work to a **different model** for independent review:
+
+```
+scripts/grok-review.sh
+```
+
+This runs **Claude Opus** (not you) against the `grok-review` skill
+(`.claude/skills/grok-review/SKILL.md`). Opus re-runs the verification gates you cited
+(verify-don't-trust, §2.1), records a dated review log under `.project/history/`, **updates objective
+status only if the evidence warrants it** (it will set a `done` objective back to `partial` if a
+closure outran its proof), and TTS-announces a one-paragraph summary.
+
+Rules for the handoff:
+- **It is mandatory at the "I'm done" boundary**, not optional polish. "Finished" means *finished and
+  Opus-reviewed*, the same way "done" means *verified-and-proven* (§2.7). Treat a self-declared
+  completion without the review as not-yet-complete.
+- **Run it, then read its verdict.** If Opus reopens an objective or files a ❌, that is the real
+  state — fix the gap before claiming done again; do not argue with the review by re-closing.
+- **Don't review your own work in your own process.** The whole point is a second, independent model.
+  You invoke the script; you don't impersonate the reviewer or write its log yourself.
+- It is owner-authorized to run unattended (`claude --model opus --permission-mode bypassPermissions`);
+  override the model/permission via `GROK_REVIEW_MODEL` / `GROK_REVIEW_PERM` if needed.
+
+---
+
 **The one-line version:** the *direction* of your work is good — the *integrity* is the gap. Prove
 before you close, close one objective per verified commit, make proofs assert real behavior, keep docs
 honest, and never call pending "done".
diff --git a/tooling/claude/dot-claude/skills/finish-game-1/SKILL.md b/tooling/claude/dot-claude/skills/finish-game-1/SKILL.md
index 9da76f9e..7d74a524 100644
--- a/tooling/claude/dot-claude/skills/finish-game-1/SKILL.md
+++ b/tooling/claude/dot-claude/skills/finish-game-1/SKILL.md
@@ -56,6 +56,15 @@ Don't declare done from memory — re-run the orientation and the objective dash
    Don't push (forge is down; the owner's standing call). Update the objective's status +
    acceptance bullets per `objective-integrity.md`.
 8. **Continue** to the next iteration. Keep going until a stop condition below.
+9. **Before the next tick — when you think you're finished, hand off to the independent Opus
+   reviewer.** When a batch has landed and you are about to go idle / end the session (you believe the
+   current work is done), your *last step* is to run `scripts/grok-review.sh`. That launches **Claude
+   Opus** (a different model) against the `grok-review` skill: it re-runs the gates you cited, writes a
+   dated `.project/history/` review log, **updates objective status only if the evidence warrants it**
+   (it will reopen a `done` objective whose closure outran its proof), and TTS-announces a summary.
+   "Finished" = finished **and** Opus-reviewed — a self-declared completion without the review is not
+   yet complete (binding: `AGENTS.md §5`). Read the verdict; if it reopens an objective, fix the gap,
+   don't re-close around it.
 
 ## When to STOP and ask the owner (don't guess)
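
Reviewer note: the AGENTS.md §5 hunk says the model and permission mode can be overridden through `GROK_REVIEW_MODEL` and `GROK_REVIEW_PERM`, but the diff does not show `scripts/grok-review.sh` itself. The following is a minimal sketch of how such an override could resolve, assuming plain shell default expansion. The function name `review_command` is hypothetical, and the real script may well do this differently; only the two variable names and the default command line are taken from the diff.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the contents of scripts/grok-review.sh.
# Taken from the diff: the variable names GROK_REVIEW_MODEL / GROK_REVIEW_PERM
# and the default command line. Everything else is an assumption.

# Print the reviewer command line, honoring the documented overrides.
review_command() {
    # Fall back to the defaults named in AGENTS.md §5 when a variable is unset or empty.
    model="${GROK_REVIEW_MODEL:-opus}"
    perm="${GROK_REVIEW_PERM:-bypassPermissions}"
    printf 'claude --model %s --permission-mode %s\n' "$model" "$perm"
}

# Show the command that would be launched; nothing is executed here.
review_command
```

With neither variable set, the sketch prints the same command line the hunk quotes. Setting either variable in the environment before the call swaps in that value, which is one way the "override ... if needed" clause could work without editing the script.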