Codex /goal tips: Define verifiable 'done' states

AI Jason: go watch the original →

Codex's /goal loops an agent on a complex task until an LLM judge confirms verifiable completion; craft prompts with an explicit objective, constraints, validation (e.g., Playwright checks), and a quantifiable stop condition to prevent early quits.

/goal Mechanics in Codex and Hermes

OpenAI's /goal feature in Codex triggers an agent loop for complex projects. The agent works on the task, then an LLM judge evaluates completion against a prompt that defines 'done' and returns a status plus reasoning. If the goal is incomplete, Codex sends a continuation message: "Continuing toward your standing goal: [goal text]. Take the next concrete steps. If you believe the goal is completed, state so explicitly and stop." Codex also adds instructions along the lines of "do not accept proxy signals as completion; mark the goal achieved only when an audit shows the objective is met and no work remains; use /update goal with status complete."

The Hermes agent has a similar "persist goal" feature. Both improve on a crude fixed-iteration loop by replacing a preset number of iterations with an LLM-judged stop and by evolving the prompt with goal context and state.
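As a rough sketch of that loop (assuming hypothetical run_agent and judge callables supplied by the caller, not Codex's actual internals):

```python
# Rough sketch of a /goal-style loop; run_agent and judge are hypothetical
# model-backed callables supplied by the caller, not Codex's real internals.
from typing import Callable

def goal_loop(
    goal_text: str,
    run_agent: Callable[[str, list[str]], str],   # takes goal + transcript, returns agent output
    judge: Callable[[str, list[str]], dict],      # returns {"status": ..., "reasoning": ...}
    max_rounds: int = 50,
) -> str:
    transcript: list[str] = []
    for _ in range(max_rounds):
        # Agent takes its next concrete steps toward the standing goal.
        transcript.append(run_agent(goal_text, transcript))

        # LLM judge checks the work against an explicit definition of 'done'.
        verdict = judge(goal_text, transcript)
        if verdict["status"] == "complete":
            return verdict["reasoning"]

        # Not done yet: send the continuation message and loop again.
        transcript.append(
            f"Continuing toward your standing goal: {goal_text}. "
            "Take the next concrete steps. If you believe the goal is "
            "completed, state so explicitly and stop."
        )
    return "stopped: round limit reached before the judge confirmed completion"
```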

Setup: Run codex features list, then codex features enable goal. Use /goal "[prompt]" (e.g., /goal "help me migrate my codebase from JavaScript to TypeScript, making sure all screens stay exactly the same visually, using Playwright interactively to verify the output"). Monitor with /goal status, pause with /goal pause, clear with /goal clear, or fork chats with /side.

Effective Goal Prompting

Good goal prompts scope bigger than a single task but smaller than a whole backlog: specify the objective, exclusions, validation method, and stop condition up front. Example structure: "Complete [objective] without stopping until [verifiable end state]. Do not change [X]. Validate with [method]. Stop when [quantifiable condition]." (A small template sketch follows the examples below.)

  • Migration: "Migrate this project from [old stack] to [new stack] and make sure all screens stay exactly the same visually, using Playwright interactively to verify."
  • Prototype: Point to plan.md/PRD, create tests per milestone, verify with Playwright and reference screens.
  • Optimization: "/goal optimize the prompts in the prompt file until the eval reaches a target score. After each change, run the eval command, inspect failing cases, and keep the prompt minimal. Stop when the target is met."
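As a minimal illustration of that structure (the function name and wording are illustrative, not a Codex template):

```python
# Illustrative only: assemble a goal prompt from the four recommended parts
# (objective, exclusions, validation method, quantifiable stop condition).
def build_goal_prompt(objective: str, exclusions: str, validation: str, stop: str) -> str:
    return (
        f"Complete {objective} without stopping until a verifiable end state. "
        f"Do not change {exclusions}. "
        f"Validate with {validation}. "
        f"Stop when {stop}."
    )

prompt = build_goal_prompt(
    objective="the JavaScript-to-TypeScript migration",
    exclusions="any visible screen layout",
    validation="Playwright checks against reference screenshots",
    stop="every screen passes visual comparison and the build is green",
)
```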

Align first: converse with the agent to share project context, priorities, past failures, and known bugs, and let it ask questions. Quantify: avoid fuzzy stops like "keep going until everything is fixed"; prefer "find 20 discrete new issues, produce a repro and proposed fix per issue, push to a branch, and log each to the run folder."

New projects: list relevant files, anti-patterns, logs, design patterns, and user expectations (e.g., "Build X. Reference these implementation repos; follow these patterns; users expect these behaviors").

Tools and Long-Run Extensions

goal-buddy (npx goal-buddy) scaffolds goal prompts: run npx goal-buddy, then inside codex run go prep [vague goal] (e.g., "build a rain-type game using image gen for image assets and beautiful graphics; verify it on desktop"). It generates goal.md (the request, constraints, stop rules, and a detail loop) and state.yaml (tasks derived from the codebase). Start with /goal @goal.md; the agent updates state.yaml each loop. Example: one prompt yields image-gen assets and a functional game.
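goal-buddy's actual file schema isn't shown here; purely as an assumption, a state.yaml that the agent re-reads and updates each loop might look something like this:

```python
# Assumption: a plausible shape for goal-buddy's state.yaml, not its real schema.
# The idea is a task list the agent re-reads and updates on every loop.
import yaml  # PyYAML

state = {
    "goal": "build a rain-type game with image-gen assets",
    "tasks": [
        {"id": 1, "title": "scaffold the game loop", "status": "done"},
        {"id": 2, "title": "generate sprite assets via image gen", "status": "in_progress"},
        {"id": 3, "title": "verify graphics on desktop", "status": "todo"},
    ],
}

with open("state.yaml", "w") as f:
    yaml.safe_dump(state, f, sort_keys=False)
```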

Limitations: suited to hours-long coding work (migrations, refactors, experiments); it fails on weeks- or months-long tasks that lack quick feedback (e.g., SEO).

Crewlet /mission: for long horizons, capture the goal in mission.md (the metrics to optimize). The agent hypothesizes strategies, acts, outputs artifacts, and schedules the next run (hours to weeks out). Each run is passed mission.md plus prior run summaries, with a human in the loop for dramatic changes. Example: grow a Twitter account to 10k followers: it iterates on posts, analyzes results, and doubles down on high-performing founder-voice threads.
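A sketch of that pattern (not Crewlet's actual implementation; agent and approve are hypothetical callables):

```python
# Sketch of the long-horizon /mission pattern, not Crewlet's real code.
# Each scheduled run reads mission.md plus prior run summaries, acts,
# writes an artifact, and says when to run again; dramatic changes go
# through a human approval gate.
import time
from pathlib import Path
from typing import Callable

def mission_run(
    mission_file: Path,
    runs_dir: Path,
    agent: Callable[[str], dict],     # returns {"summary", "dramatic_change", "next_run_in_s"}
    approve: Callable[[str], bool],   # human-in-the-loop gate
) -> float:
    mission = mission_file.read_text()
    prior = "\n\n".join(p.read_text() for p in sorted(runs_dir.glob("run_*.md")))

    result = agent(f"{mission}\n\nPrior run summaries:\n{prior}")

    if result.get("dramatic_change") and not approve(result["summary"]):
        result["summary"] += "\n(dramatic change rejected by human reviewer)"

    # Persist this run's summary as an artifact for the next run to read.
    run_path = runs_dir / f"run_{int(time.time())}.md"
    run_path.write_text(result["summary"])

    # Caller schedules the next run this many seconds out (hours to weeks).
    return result["next_run_in_s"]
```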

  • #tutorial
  • #demo

summary by x-ai/grok-4.1-fast. probably wrong about something. check the source.