docs(agents): require Opus self-review handoff before Grok's next tick
Wire scripts/grok-review.sh into Grok's contract as the mandatory last step at the 'I'm done' boundary: when Grok thinks a batch/objective/session is finished, it hands off to an independent model (Claude Opus) that re-runs the cited gates and updates objective status before the next tick. Self-grading is the §2 failure mode; a second model closes it. - AGENTS.md §5: 'Before the next tick — hand off to the independent Opus reviewer' (finished == finished AND Opus-reviewed; read the verdict, don't re-close around it). - finish-game-1 SKILL.md: loop step 9 mirrors the handoff at session end. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
b6e365c95d
commit
78574007e0
2 changed files with 39 additions and 0 deletions
30
AGENTS.md
30
AGENTS.md
|
|
@ -127,6 +127,36 @@ Surface options + a recommendation; don't silently pick. Otherwise: act, verify,
|
|||
|
||||
---
|
||||
|
||||
## 5. Before the next tick: hand off to the independent Opus reviewer
|
||||
|
||||
You do not get to be the only judge of your own work. The integrity gap in §2 is exactly the failure
|
||||
mode of self-grading. So **whenever you think you are finished** — a batch landed, an objective
|
||||
closed, or you are about to go idle / end a work session before the next tick — your *last step* is to
|
||||
hand your work to a **different model** for independent review:
|
||||
|
||||
```
|
||||
scripts/grok-review.sh
|
||||
```
|
||||
|
||||
This runs **Claude Opus** (not you) against the `grok-review` skill
|
||||
(`.claude/skills/grok-review/SKILL.md`). Opus re-runs the verification gates you cited
|
||||
(verify-don't-trust, §2.1), records a dated review log under `.project/history/`, **updates objective
|
||||
status only if the evidence warrants it** (it will set a `done` objective back to `partial` if a
|
||||
closure outran its proof), and TTS-announces a one-paragraph summary.
|
||||
|
||||
Rules for the handoff:
|
||||
- **It is mandatory at the "I'm done" boundary**, not optional polish. "Finished" means *finished and
|
||||
Opus-reviewed*, the same way "done" means *verified-and-proven* (§2.7). Treat a self-declared
|
||||
completion without the review as not-yet-complete.
|
||||
- **Run it, then read its verdict.** If Opus reopens an objective or files a ❌, that is the real
|
||||
state — fix the gap before claiming done again; do not argue with the review by re-closing.
|
||||
- **Don't review your own work in your own process.** The whole point is a second, independent model.
|
||||
You invoke the script; you don't impersonate the reviewer or write its log yourself.
|
||||
- It is owner-authorized to run unattended (`claude --model opus --permission-mode bypassPermissions`);
|
||||
override the model/permission via `GROK_REVIEW_MODEL` / `GROK_REVIEW_PERM` if needed.
|
||||
|
||||
---
|
||||
|
||||
**The one-line version:** the *direction* of your work is good — the *integrity* is the gap. Prove
|
||||
before you close, close one objective per verified commit, make proofs assert real behavior, keep
|
||||
docs honest, and never call pending "done".
|
||||
|
|
|
|||
|
|
@ -56,6 +56,15 @@ Don't declare done from memory — re-run the orientation and the objective dash
|
|||
Don't push (forge is down; the owner's standing call). Update the objective's status +
|
||||
acceptance bullets per `objective-integrity.md`.
|
||||
8. **Continue** to the next iteration. Keep going until a stop condition below.
|
||||
9. **Before the next tick — when you think you're finished, hand off to the independent Opus
|
||||
reviewer.** When a batch has landed and you are about to go idle / end the session (you believe the
|
||||
current work is done), your *last step* is to run `scripts/grok-review.sh`. That launches **Claude
|
||||
Opus** (a different model) against the `grok-review` skill: it re-runs the gates you cited, writes a
|
||||
dated `.project/history/` review log, **updates objective status only if the evidence warrants it**
|
||||
(it will reopen a `done` objective whose closure outran its proof), and TTS-announces a summary.
|
||||
"Finished" = finished **and** Opus-reviewed — a self-declared completion without the review is not
|
||||
yet complete (binding: `AGENTS.md §5`). Read the verdict; if it reopens an objective, fix the gap,
|
||||
don't re-close around it.
|
||||
|
||||
## When to STOP and ask the owner (don't guess)
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue