
How pairing Sonnet 4.6 for spec generation with Opus 4.6 for validation — and a structured human review gate — is accelerating our legacy platform modernisation without sacrificing quality.
The platform I am focused on modernising right now handles the full lifecycle (quoting, underwriting, rating, premium handling, payments) across a codebase that has quietly absorbed decades of complexity. Java, Spring, GWT, Apache Camel, iLog/Drools, WebSphere. The stack is deep and the domain knowledge deeper.
Modernising it is not a greenfield problem; it is a matter of precision. Every spec we write has to be accurate about what the system actually does, traceable to real behaviour, and reviewable by engineers who own specific modules. The challenge isn’t ideas; it’s throughput and quality at the same time.
In the AI-assisted engineering landscape, human-in-the-loop review is crucial, but at this level of complexity and precision, with a fast turnaround expected, the million-dollar question is: what is a better approach to speeding up the review process?
One answer, maybe: a three-step workflow of generate, validate, and human review.
The generate-then-validate pattern
The core idea is simple: use a fast, capable model to produce draft specs at speed, use a more deliberate model to stress-test them, then pass a targeted brief to a human reviewer before any engineering effort is committed.

Basically, you feed context into Sonnet 4.6 — the relevant module, its interfaces, its known behaviours, the modernisation goal — and ask it to produce a structured spec: scope, dependencies, edge cases, migration considerations. Sonnet is efficient at this. It produces well-structured output quickly, and the drafts are usable from the first pass.
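To make the shape of that step concrete, here is a minimal sketch of the generation call using the Anthropic Python SDK. The model ID string, the prompt wording, and the spec section headings are illustrative assumptions, not our production prompt.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SONNET_MODEL = "claude-sonnet-4-6"  # placeholder ID; use your provider's current Sonnet model name

def generate_draft_spec(module_name: str, context: str, goal: str) -> str:
    """Ask Sonnet for a structured draft migration spec for one module."""
    prompt = (
        f"Module: {module_name}\n"
        f"Modernisation goal: {goal}\n\n"
        f"Context (interfaces, known behaviours, dependencies):\n{context}\n\n"
        "Produce a structured migration spec with these sections: "
        "Scope, Dependencies, Edge Cases, Migration Considerations. "
        "State every assumption you are making explicitly."
    )
    response = client.messages.create(
        model=SONNET_MODEL,
        max_tokens=4000,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```

In practice the context string is assembled from the module’s interfaces and known behaviours, and the same context is reused verbatim in the validation step so both models are working from identical inputs.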
Then Opus 4.6 reviews the draft against the same context. It looks for gaps in coverage, incorrect assumptions about how the legacy state machine or the underwriting pipeline actually works, missing edge cases around things like policy locking or premium handling rules, and logical inconsistencies in the migration approach. It surfaces the questions the spec should answer but doesn’t.
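A matching sketch for the validation pass, again with hedged assumptions: the Opus model ID is a placeholder, and the OPEN-QUESTION tag is simply one convenient convention for making unresolved items easy to pull out of the response.

```python
import anthropic

client = anthropic.Anthropic()  # same client setup as the generation sketch

OPUS_MODEL = "claude-opus-4-6"  # placeholder ID; substitute the Opus model you have access to

VALIDATION_INSTRUCTIONS = (
    "You are reviewing a draft migration spec against its source context. List: "
    "(1) coverage gaps, (2) assumptions about legacy behaviour you cannot verify from "
    "the context, (3) missing edge cases, (4) logical inconsistencies in the migration "
    "approach, and (5) questions the spec should answer but does not. "
    "Prefix every unresolved item with the tag OPEN-QUESTION."
)

def validate_spec(draft_spec: str, context: str) -> str:
    """Ask Opus to stress-test the Sonnet draft against the same context it was generated from."""
    response = client.messages.create(
        model=OPUS_MODEL,
        max_tokens=4000,
        system=VALIDATION_INSTRUCTIONS,
        messages=[{
            "role": "user",
            "content": f"Source context:\n{context}\n\nDraft spec to review:\n{draft_spec}",
        }],
    )
    return response.content[0].text
```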
You might ask: why two models? Generation and validation are different cognitive tasks, and optimising for both in a single pass tends to compromise both. Separating them, with a model suited to each, produces better output and faster cycle times than asking one model to do everything. Token cost is also a factor: Opus is roughly three times pricier than Sonnet, so it makes more sense to spend it on validation than on every draft.
Human-in-the-loop: the next step
AI-assisted validation improves throughput significantly, but the most consequential decisions on a platform like ours require human judgement. Rather than treating engineer/SME review as a loose sign-off at the end, we’re formalising it as a distinct gate with defined inputs — the next evolution of this workflow.
The Opus validation output becomes a structured brief for the human reviewer, surfacing exactly where the model flagged uncertainty, what assumptions it could not verify, and which integration touch points it recommends a human assess directly. The reviewer’s job becomes targeted, not open-ended.
- Opus flags open questions explicitly. Issues the model could not resolve from the available context are marked clearly, not buried in prose, so the engineer knows exactly where to focus attention.
- Engineer reviews targeted risk areas. Rather than reading the full spec cold, the reviewer works from Opus’s flagged items: integration contracts, state machine edge cases, and data model assumptions that require domain sign-off.
- Decisions are recorded, not just made. Human judgement calls, especially where the model flagged genuine ambiguity, are captured alongside the spec, creating an audit trail that informs future modules.
- Approved spec enters delivery. Only specs that have cleared all three steps (generate, validate, human review) proceed. No shortcuts for complexity or time pressure.
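A sketch of how the gate itself could be recorded is below. The field names and the approval rule are assumptions about one reasonable shape for the audit trail, not a description of tooling we have built.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ReviewDecision:
    """One human judgement call made at the review gate."""
    flagged_item: str   # the OPEN-QUESTION Opus raised
    decision: str       # what the engineer/SME decided
    rationale: str      # why, including any tacit knowledge behind the call
    reviewer: str

@dataclass
class SpecReviewRecord:
    """Audit trail for a single module spec passing through the gate."""
    module: str
    draft_spec: str
    validation_brief: str  # Opus output, the reviewer's starting point
    decisions: list[ReviewDecision] = field(default_factory=list)
    approved: bool = False
    reviewed_on: date | None = None

    def approve(self, reviewer: str) -> None:
        """Mark the spec as cleared for delivery once every flagged item has a recorded decision."""
        if any(not d.decision for d in self.decisions):
            raise ValueError("Every flagged item needs a recorded decision before approval.")
        self.approved = True
        self.reviewed_on = date.today()
```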
This matters on a platform of this scope because certain knowledge cannot be recovered from code or documentation alone. The history of why a particular gateway works the way it does, or why a premium handling rule has an exception, often lives with people. Formalising the human step ensures tacit knowledge gets encoded into specs before modernisation decisions are locked in.
The lessons so far
The pattern works best when the generation prompt is specific about what the spec is for — not just “document this module” but “produce a migration spec for the product integration, accounting for its orchestration and the Downstream Export Gateway dependencies.” The more bounded the context, the more useful the output from all three steps.
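As a small illustration of that difference, here are the two framings side by side. The module and dependency names are lifted from the example above and stand in for whatever scope boundary applies to the module at hand.

```python
# Unbounded: invites a generic inventory of the module.
GENERIC_PROMPT = "Document this module."

# Bounded: names the purpose, the orchestration concern, and the specific
# downstream dependency, which gives all three steps something to grip.
BOUNDED_PROMPT = (
    "Produce a migration spec for the product integration module. "
    "Account for its orchestration flows and its dependency on the "
    "Downstream Export Gateway. Call out sequencing constraints with dependent modules."
)
```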
The validation step is not just error-catching. Opus consistently identifies where a spec is technically correct but strategically incomplete — for example, a spec that describes what to migrate but doesn’t address sequencing with dependent modules. That kind of feedback is hard to get quickly from any other source, and it gives the human reviewer a much sharper brief.
