Software 3.0 Developer Cheatsheet

The cycle

① Audit → ② Orient → ③ Build → ④ Stay Sharp → ↻ repeat at every model release

Before you build

① Audit

go/no-go test — part A (kill)

Could this be replaced by one multimodal prompt + 1–2 tool calls or an MCP server?
Is this Software 1.0 plumbing — an orchestration layer, API wrapper, or UI shell — around something the model now does natively?

// if any answer is YES →

Stop or pivot. You are building plumbing the next release will absorb.

go/no-go test — part B (build)

Does this only exist because of Software 3.0, i.e. was it genuinely impossible before LLMs?
Does it embed genuine domain expertise or proprietary context the model can't replicate alone?
Does it solve something the model alone cannot?
Will it still pass both tests after the next model release?

// if all answers are YES →

Proceed. This is worth building.
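
The two-part test above can be sketched as a small decision function. This is only an illustration: the field names are made up for this example, and the "rethink" outcome (no kill answer is YES, but not every build answer is YES) is an assumption, since the checklist does not say what to do in that case.

```python
from dataclasses import dataclass


@dataclass
class AuditAnswers:
    # Part A (kill): a single True here means stop or pivot.
    replaceable_by_prompt_and_tools: bool
    is_plumbing_around_native_capability: bool
    # Part B (build): every one of these must be True to proceed.
    only_exists_because_of_llms: bool
    embeds_expertise_or_proprietary_context: bool
    solves_what_model_alone_cannot: bool
    survives_next_model_release: bool


def go_no_go(a: AuditAnswers) -> str:
    """Return 'stop-or-pivot', 'proceed', or 'rethink' (assumed fallback)."""
    kill = [
        a.replaceable_by_prompt_and_tools,
        a.is_plumbing_around_native_capability,
    ]
    build = [
        a.only_exists_because_of_llms,
        a.embeds_expertise_or_proprietary_context,
        a.solves_what_model_alone_cannot,
        a.survives_next_model_release,
    ]
    if any(kill):
        return "stop-or-pivot"
    if all(build):
        return "proceed"
    # Not covered by the checklist: treated here as "not yet worth building".
    return "rethink"
```

Re-running the same function with fresh answers at each model release matches the "repeat at every model release" step of the cycle.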

② Orient

Finding your verifiable niche

Listed the domains where I have genuine deep expertise
Confirmed outputs are objectively verifiable: correct/incorrect, better/worse, faster/slower
Checked: are frontier labs specifically applying RL to this niche? (If no → window is open)
Mapped the RL environment: defined reward, penalty, and what "better" means concretely
Validated niche with a well-prompted agent before committing to fine-tuning
Written a clear thesis: what I'm building, why it passes the S3.0 test, what makes it defensible
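
"Defined reward, penalty, and what better means concretely" could look like the sketch below for a hypothetical niche where outputs are checked for correctness and then compared on runtime. The `Outcome` fields, the penalty value, and the bonus clamp are all assumptions chosen for illustration, not values from this guide.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    correct: bool              # passed an objective reference check
    runtime_s: float           # measured runtime of the candidate
    baseline_runtime_s: float  # runtime of the current best known solution


def reward(outcome: Outcome, wrong_penalty: float = -1.0) -> float:
    """Penalise wrong answers; reward correct ones, with a capped speed bonus."""
    if not outcome.correct:
        return wrong_penalty
    # speedup > 1 means the candidate is faster than the baseline.
    speedup = outcome.baseline_runtime_s / max(outcome.runtime_s, 1e-9)
    # Clamp the bonus so speed can never outweigh correctness.
    bonus = max(-0.5, min(speedup - 1.0, 1.0))
    return 1.0 + bonus
```

If a function like this cannot be written for a niche, that is a sign the outputs are not yet objectively verifiable.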

Knowledge brain set up — runs across all phases

Created a dedicated folder for strategy/knowledge documents
Written an initial context prompt covering company, product, domain, and goals
Generated foundational docs (market positioning, technical direction, opportunity map) as markdown
Habit established: new ideas go through the brain before committing
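
A minimal sketch of the folder setup, assuming one markdown file per foundational document. The file names and stub text are hypothetical; the checklist only names the document types.

```python
from pathlib import Path

# Hypothetical file names; adjust to your own conventions.
FOUNDATIONAL_DOCS = {
    "context.md": "Company, product, domain, and goals.",
    "market-positioning.md": "Where the product sits and against whom.",
    "technical-direction.md": "Architecture bets and open questions.",
    "opportunity-map.md": "Candidate niches and why each might pass the audit.",
}


def init_brain(root: Path) -> list[Path]:
    """Create the folder and stub any missing docs; never overwrite notes."""
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name, purpose in FOUNDATIONAL_DOCS.items():
        path = root / name
        if not path.exists():
            title = name.removesuffix(".md").replace("-", " ").title()
            path.write_text(f"# {title}\n\n{purpose}\n", encoding="utf-8")
            created.append(path)
    return created
```

Calling `init_brain(Path("brain"))` twice is safe: the second call creates nothing and returns an empty list.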

During the build — Phase ③

③ Build

Before handing off to an agent

Spec and plan written — "done" is defined before the agent starts
Task is scoped tightly — one agent, one contained objective
Pass/fail criteria defined — tests, expected outputs, observable behaviour
Asked: what is the text I copy-paste to the agent? (Not a script for a human to run)
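
One way to make "the text I copy-paste to the agent" concrete is to keep the spec as data and render it to a prompt. The `AgentBrief` class and its fields are illustrative assumptions; the point is that the brief refuses to render until "done" is defined.

```python
from dataclasses import dataclass, field


@dataclass
class AgentBrief:
    objective: str                 # one contained objective for one agent
    done_when: list[str]           # pass/fail criteria: tests, expected outputs
    out_of_scope: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the exact text to paste into the agent."""
        if not self.done_when:
            raise ValueError("define pass/fail criteria before handing off")
        lines = [f"Objective: {self.objective}", "", "Done when:"]
        lines += [f"- {criterion}" for criterion in self.done_when]
        if self.out_of_scope:
            lines += ["", "Out of scope:"]
            lines += [f"- {item}" for item in self.out_of_scope]
        return "\n".join(lines)
```

For example, `AgentBrief("Add retry to the upload client", ["tests/test_upload.py passes"]).to_prompt()` yields a short brief that starts with the objective and lists the single pass criterion.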

While agents are running

Context window is clean — stale or bloated context actively removed
Running ≤ 3–4 parallel agents (human review bandwidth, not a technical limit)
Checking in regularly — not letting any agent run open-endedly
Knowledge brain provided as context where domain relevance matters
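
The parallelism cap and the "no open-ended runs" rule can be enforced in code rather than by discipline alone. The sketch below uses `asyncio.Semaphore` and `asyncio.wait_for`; `run_agent` is a hypothetical stand-in for a real agent call, and the time budget is an assumed value.

```python
import asyncio

MAX_PARALLEL_AGENTS = 3   # review bandwidth, not a technical limit
CHECK_IN_SECONDS = 5.0    # assumed budget before a human looks at the run


async def run_agent(task: str) -> str:
    """Hypothetical stand-in for a real agent invocation."""
    await asyncio.sleep(0.01)
    return f"draft for: {task}"


async def run_all(tasks: list[str]) -> list[str]:
    gate = asyncio.Semaphore(MAX_PARALLEL_AGENTS)

    async def bounded(task: str) -> str:
        async with gate:  # at most MAX_PARALLEL_AGENTS run at once
            try:
                return await asyncio.wait_for(run_agent(task), CHECK_IN_SECONDS)
            except asyncio.TimeoutError:
                # Do not let the run continue open-endedly; flag it instead.
                return f"needs check-in: {task}"

    return await asyncio.gather(*(bounded(t) for t in tasks))
```

Results come back in the same order as the input tasks, so each draft can be reviewed against its brief.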

After — reviewing agent output

Reviewing output as a draft, not a deployment
Checked for: bloat, excessive copy-paste, brittle abstractions, silent logic errors
Unit tests and smoke tests pass
CI blockers in place — bad code cannot reach production automatically
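
A CI blocker can be as small as a script that runs each check and returns the first non-zero exit code, which the CI system then treats as a failed job. This is a sketch under that assumption; the specific commands in `CHECKS` (a `tests` directory discovered by `unittest`) are placeholders for your own test and smoke-test commands.

```python
import subprocess
import sys

# Placeholder commands; replace with the project's real unit and smoke tests.
CHECKS = [
    [sys.executable, "-m", "unittest", "discover", "-s", "tests"],
]


def run_checks(checks: list[list[str]]) -> int:
    """Run checks in order; return the first failing exit code, else 0."""
    for cmd in checks:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

In a pipeline, the entry point would call `sys.exit(run_checks(CHECKS))` so that any failing check blocks the merge or deploy step.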

Agent-first infrastructure

③ Build

External-facing product

llms.txt added to root: tells agents what the product does, how the API works, how to trust it
Docs are agent-readable: clear structure, no assumed context, direct instruction format
Core functionality exposed via MCP, clean API, or agent-compatible tool calls
Asked: can an agent understand and use this within a single context window, without human translation?
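
As a rough illustration of the llms.txt item, the helper below renders a file in the general shape the llms.txt proposal describes as far as I understand it: a title, a one-line summary as a blockquote, then sections of annotated links. The product name, section names, and URLs in the usage example are placeholders, not real endpoints.

```python
def render_llms_txt(
    name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render markdown text; each link is a (title, url, note) tuple."""
    lines = [f"# {name}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        lines += [f"- [{title}]({url}): {note}" for title, url, note in links]
    return "\n".join(lines) + "\n"
```

For example, `render_llms_txt("Example Product", "One-line summary.", {"Docs": [("API reference", "https://example.com/api.md", "endpoints and auth")]})` produces a file small enough for an agent to read in full before deciding which linked document to fetch.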

Internal tooling

Human UI stripped where an agent, not a person, is the primary consumer
Data structures are LLM-legible: descriptive field names, explicit schemas, no visual-layout dependency
Internal docs written as agent briefs, not employee onboarding
Agent-native benchmark passed: agent can complete primary workflow end-to-end, no human mediation
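
"LLM-legible" data can be sketched as a record whose field names carry units and meaning, plus a schema an agent can read without seeing any UI. The `Invoice` type and its fields are invented for this example.

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class Invoice:
    invoice_id: str = field(metadata={"description": "Unique invoice identifier"})
    amount_due_cents: int = field(metadata={"description": "Outstanding amount in cents"})
    currency_code: str = field(metadata={"description": "ISO 4217 code, e.g. USD"})
    due_date_iso: str = field(metadata={"description": "Due date as YYYY-MM-DD"})


def describe_schema(cls) -> str:
    """Return a JSON description of each field's name, type, and meaning."""
    schema = {}
    for f in fields(cls):
        # f.type is a class normally, or a string under postponed annotations.
        type_name = f.type.__name__ if isinstance(f.type, type) else str(f.type)
        schema[f.name] = {
            "type": type_name,
            "description": f.metadata.get("description", ""),
        }
    return json.dumps(schema, indent=2)
```

Compare `amount_due_cents` with a bare `amt` column whose unit is only visible in a dashboard label: the first can be used by an agent directly, the second needs human translation.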

Staying sharp — Phase ④

④ Stay Sharp

Understanding practice

Knowledge brain is being actively fed — articles read → agent updates relevant sections
Regularly asking the brain: "What do I know about X? What are the gaps? What should I read next?"
Capable of evaluating agent output critically — know what "good" looks like in this domain
Building for where capabilities will be in 3–6 months, not only today's baseline
Phase ① Audit scheduled for the next major model release

Key tests at a glance

Kill test	Can one prompt + tool calls replace this?
Build test	Does this only exist because of LLMs?
Agent-native test	Can an agent deploy and use this without human translation?
Niche test	Domain expertise + verifiable outputs + no lab RL coverage?
Understanding test	Am I outsourcing execution or understanding?
Setup test	What's the text I copy-paste to my agent?
Direction test	Can I evaluate what the agent produced, and do I know what good looks like?

Core principles — reference

P1 · New paradigm

Don't ask: "how do I use AI to do this faster?" Ask: "what computing paradigm does this problem belong to?"

P2 · Verifiability is the moat

Domains with objectively measurable outputs support RL and fine-tuning. That's where durable advantages are built.

P3 · Everything is built for humans

Agents need sensors and actuators. Design clean data access and clean action paths — not human-facing UIs.

P4 · Outsource execution, not understanding

You can't be a good director without genuine domain understanding. That's the scarcest resource now.

P5 · Vibe coding has a ceiling

Production work needs agentic engineering: spec-first, structured review, tests, CI. Run at most 3–4 agents in parallel, the number you can review properly.

P6 · Ghosts, not animals

LLMs are statistical circuits, not intelligences. Expect jagged capability — highly capable in some areas, hard edges in others.

Software 3.0 Field Guide