Token Optimization

SDD skills include several strategies to minimize token usage and cost across the workflow. These strategies work together — model selection reduces cost per token, while context management reduces the number of tokens processed.

Model selection (model_hint)

Each skill declares the minimum model tier needed in its model_hint frontmatter field. Orchestrators (sdd-agent, sdd-ff, sdd-continue) pass this hint when spawning subagents.

| Hint | Use for | Skills |
| --- | --- | --- |
| opus | Judgment-heavy phases: design decisions, solution analysis | sdd-propose, sdd-design |
| sonnet | Code comprehension: analysis, spec writing, implementation | sdd-explore, sdd-spec, sdd-apply (subagents), sdd-verify, sdd-audit, sdd-steer, sdd-init, sdd-new, sdd-ff, sdd-discover, sdd-agent |
| haiku | Mechanical phases: template-filling, search, dispatch | sdd-tasks, sdd-archive, sdd-recall, sdd-docs, sdd-continue, sdd-apply (orchestrator) |

Using opus only for propose and design (the judgment phases) while running mechanical phases on haiku can reduce cost by 60-70%.
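As a minimal sketch of how an orchestrator might consume this field, the helper below extracts model_hint from a skill's YAML frontmatter before spawning a subagent. The skill text and tier names follow the table above; the parsing helper itself is an illustration, not the actual SDD implementation.

```python
def read_model_hint(skill_text: str, default: str = "sonnet") -> str:
    """Extract the model_hint field from a skill's frontmatter block."""
    lines = skill_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return default                      # no frontmatter: fall back
    for line in lines[1:]:
        if line.strip() == "---":           # end of frontmatter
            break
        if line.strip().startswith("model_hint:"):
            return line.split(":", 1)[1].strip()
    return default

skill = """---
name: sdd-tasks
model_hint: haiku
---
Fill the tasks.md template from design.md.
"""

print(read_model_hint(skill))  # haiku
```

The default tier is an assumption here; a real orchestrator would decide its own fallback when a skill omits the field.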

Context management

The artifact chain

Each SDD phase produces a file that captures all decisions. After a phase completes, the conversation context is redundant with the artifact:

explore → notes.md     (findings)
propose → proposal.md  (scope decisions)
spec    → spec.md      (behavior)
design  → design.md    (architecture)
tasks   → tasks.md     (execution plan)
apply   → commits      (code)
verify  → PR           (result)

When to clear context

| Moment | Clear? | Reason |
| --- | --- | --- |
| Between explore and propose | No | Coupled — exploration feeds proposal questions |
| After propose | Yes | proposal.md captures everything |
| After spec | Yes | spec.md captures everything |
| After design | Yes | design.md captures everything |
| After tasks | Yes (most important) | Apply is the longest phase — entering clean saves the most |
| During apply | No | Subagents already isolate context |
| After verify | Yes | PR created, everything captured |
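The decision rules above can be written down as a small lookup keyed by the phase just completed. The dict is an illustrative encoding of the table, not part of the SDD tooling.

```python
# Should the conversation context be cleared after this phase ends?
CLEAR_AFTER = {
    "explore": False,  # coupled: exploration feeds proposal questions
    "propose": True,   # proposal.md captures everything
    "spec": True,      # spec.md captures everything
    "design": True,    # design.md captures everything
    "tasks": True,     # most important: apply is the longest phase
    "apply": False,    # subagents already isolate context
    "verify": True,    # PR created, everything captured
}

print(CLEAR_AFTER["tasks"])  # True
```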

Why this matters

If context is 50K tokens after propose + spec and 15 turns of apply remain, that is 50K × 15 = 750K tokens of input carrying stale context. Clearing after tasks and re-reading the artifacts (~5K tokens) cuts that to roughly 75K, a ~90% reduction in input tokens for the apply phase.
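The back-of-envelope arithmetic checks out; the token counts below are the illustrative figures from the text, not measurements.

```python
stale_context = 50_000   # tokens carried forward after propose + spec
apply_turns = 15         # remaining turns in the apply phase
fresh_context = 5_000    # tokens to re-read the artifacts after clearing

without_clear = stale_context * apply_turns   # input tokens if context is kept
with_clear = fresh_context * apply_turns      # input tokens after clearing

print(without_clear)                # 750000
print(without_clear - with_clear)   # 675000 tokens saved
```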

/sdd-continue makes this natural — it detects the current phase from the artifacts, not from conversation history.
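Artifact-based phase detection can be sketched as follows: the latest artifact present on disk determines the next phase. File names follow the artifact chain above; the function and directory layout are assumptions for illustration, not the actual /sdd-continue logic.

```python
from pathlib import Path

# Artifacts in pipeline order, paired with the phase each one unlocks.
PHASE_ORDER = ["notes.md", "proposal.md", "spec.md", "design.md", "tasks.md"]
NEXT_PHASE = ["propose", "spec", "design", "tasks", "apply"]

def detect_phase(change_dir: Path) -> str:
    """Return the current phase based on which artifacts already exist."""
    current = "explore"                      # nothing written yet
    for artifact, nxt in zip(PHASE_ORDER, NEXT_PHASE):
        if (change_dir / artifact).exists():
            current = nxt                    # this artifact is done; advance
    return current
```

For example, a change directory containing only notes.md and proposal.md would resolve to the spec phase, regardless of what the conversation history says.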

Selective steering loading

Skills that read openspec/steering/ load only relevant specialist files, not all .md files. The selection is based on the task's files:

  • Specialists with applies_to: all in their manifest → always loaded
  • conventions-testing.md → only when the task touches test files
  • conventions-security.md → only when the task touches auth, API, or input handling
  • Other specialists → only when the file matches the specialist's domain

With 5+ specialists installed, this reduces steering context from ~8KB to ~3KB per subagent.
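A minimal sketch of the selection rule: load a specialist only when the task's files match its domain. The manifest shape and path patterns below are assumptions for illustration, not the actual SDD manifest format.

```python
def select_specialists(manifest: dict, task_files: list[str]) -> list[str]:
    """Return the steering files relevant to this task's files."""
    selected = []
    for name, spec in manifest.items():
        applies = spec.get("applies_to", [])
        if applies == "all":
            selected.append(name)            # always loaded
        elif any(pat in f for f in task_files for pat in applies):
            selected.append(name)            # domain pattern matched a file
    return selected

# Hypothetical manifest: patterns are substrings matched against paths.
manifest = {
    "conventions-core.md": {"applies_to": "all"},
    "conventions-testing.md": {"applies_to": ["test_", "/tests/"]},
    "conventions-security.md": {"applies_to": ["auth", "api", "input"]},
}

print(select_specialists(manifest, ["src/billing.py", "tests/test_billing.py"]))
# ['conventions-core.md', 'conventions-testing.md']
```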

Prompt caching

Orchestrator skills (sdd-apply, sdd-agent) read steering files once and pass the content inline to subagent prompts. This creates a fixed prefix that benefits from LLM prompt caching (5-minute TTL) across sequential agents.

The same strategy applies to sdd-discover, which uses an identical prompt prefix across all parallel domain subagents.
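The fixed-prefix pattern can be sketched like this: steering content is read once and placed verbatim at the start of every subagent prompt, so the shared prefix is eligible for the provider's prompt cache. The prompt layout is illustrative, not the exact SDD prompt format.

```python
def build_prompts(steering: str, tasks: list[str]) -> list[str]:
    # Identical across all subagents: this is the cacheable prefix.
    prefix = f"<steering>\n{steering}\n</steering>\n\n"
    return [prefix + f"Task: {t}" for t in tasks]

prompts = build_prompts("Use tabs. Prefer small functions.",
                        ["implement auth", "write tests"])

# Every prompt shares the same byte-identical prefix.
print(all(p.startswith(prompts[0][:50]) for p in prompts))  # True
```

Because caching keys on an exact prefix match, any per-task content must come after the shared block; interleaving task details into the steering section would defeat the cache.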

Output style

All skills include a terse output directive. Status reports use tables and single-line bullets instead of prose. This reduces response tokens (which also cost money) without losing information.

English artifacts

All generated artifacts (proposal.md, spec.md, design.md, tasks.md, notes.md) are written in English. English uses approximately 30% fewer tokens than Romance languages (Spanish, French, Portuguese) for the same semantic content.