Executable Specifications: Why Your Documentation Is Lying to You (And How to Fix It)
Every engineering organization has the same dirty secret: their documentation is wrong.
Not slightly wrong. Fundamentally, dangerously wrong. That Confluence page describing the payment flow? It was accurate 18 months ago, before three refactors and a migration. The API docs in the README? They describe endpoints that no longer exist.
When I became CTO and started building our engineering org from scratch, I made a decision that seemed radical at the time: we would have almost no traditional documentation. Instead, our specifications would be our documentation.
The Documentation Death Spiral
Traditional documentation follows a predictable lifecycle:
- Someone writes detailed docs for a new feature
- The feature ships and evolves
- The docs are not updated (because there's no enforcement mechanism)
- New team members read outdated docs and form incorrect mental models
- They make decisions based on wrong assumptions
- Bugs happen, incidents happen, trust erodes
I've seen this cycle destroy teams. An engineer reads a wiki page, builds a feature based on what it says, and discovers in production that the system actually works completely differently. Days of work, wasted.
Specifications That Cannot Lie
Executable specifications solve this problem elegantly. A specification like this:
Scenario: Retry failed document processing after transient error
  Given a document "Invoice-2024-001" failed processing
  And the failure reason is "timeout"
  When the retry scheduler runs
  Then the document should be requeued for processing
  And the retry count should be incremented
  And a retry event should be logged
This specification runs as part of our CI pipeline on every commit. If the system behavior changes and the specification no longer matches reality, the build breaks. This documentation cannot quietly go stale, because the build enforces its accuracy on every change.
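To make the binding between the plain-language steps and the system concrete, here is a minimal sketch of step definitions for the retry scenario. It uses only the Python standard library; a real suite would normally use a BDD runner such as Cucumber or behave and drive the actual service. The `World` class, the `step` decorator, and the rule that only "timeout" failures are retried are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for the document pipeline.
@dataclass
class World:
    failed: dict = field(default_factory=dict)        # document name -> failure reason
    retry_counts: dict = field(default_factory=dict)  # document name -> retries so far
    queue: list = field(default_factory=list)         # documents waiting for processing
    events: list = field(default_factory=list)        # (event type, document name)
    current: str = ""                                 # document the scenario is about

STEPS = []  # (compiled pattern, handler) pairs

def step(pattern):
    """Register a handler for step text matching the given regular expression."""
    def register(func):
        STEPS.append((re.compile(pattern + r"$"), func))
        return func
    return register

@step(r'a document "(.+)" failed processing')
def document_failed(world, name):
    world.current = name
    world.failed[name] = "unknown"
    world.retry_counts[name] = 0

@step(r'the failure reason is "(.+)"')
def failure_reason(world, reason):
    world.failed[world.current] = reason

@step(r"the retry scheduler runs")
def run_scheduler(world):
    # Assumed rule: only transient failures (here, timeouts) are retried.
    for name, reason in list(world.failed.items()):
        if reason == "timeout":
            world.queue.append(name)
            world.retry_counts[name] += 1
            world.events.append(("retry", name))
            del world.failed[name]

@step(r"the document should be requeued for processing")
def check_requeued(world):
    assert world.current in world.queue

@step(r"the retry count should be incremented")
def check_retry_count(world):
    assert world.retry_counts[world.current] == 1

@step(r"a retry event should be logged")
def check_event(world):
    assert ("retry", world.current) in world.events

def run_scenario(text):
    """Run each Given/When/Then/And line against the registered steps."""
    world = World()
    for line in text.strip().splitlines():
        keyword, _, body = line.strip().partition(" ")
        if keyword not in {"Given", "When", "Then", "And"}:
            continue  # skip the Scenario: title line
        for pattern, handler in STEPS:
            match = pattern.match(body)
            if match:
                handler(world, *match.groups())
                break
        else:
            raise LookupError(f"no step definition for: {body}")
    return world

SCENARIO = """
Scenario: Retry failed document processing after transient error
  Given a document "Invoice-2024-001" failed processing
  And the failure reason is "timeout"
  When the retry scheduler runs
  Then the document should be requeued for processing
  And the retry count should be incremented
  And a retry event should be logged
"""

world = run_scenario(SCENARIO)
print(world.queue)  # ['Invoice-2024-001']
```

If the scheduler stopped requeuing timed-out documents, `check_requeued` would raise an `AssertionError`, and that failure is what breaks the build.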
How We Structured Our Specification Library
Across 200+ engineers and dozens of services, we needed a clear structure. Here's what we built:
Feature-Level Specifications
Every feature has a .feature file that lives alongside its code. These describe the business behavior in domain language:
features/
  document-processing/
    submission.feature
    validation.feature
    approval-workflow.feature
    retry-handling.feature
  user-management/
    registration.feature
    authentication.feature
    role-assignment.feature
Cross-Service Specifications
For behaviors that span multiple services, we maintain contract specifications in a shared repository. These define the expected interactions between systems:
Feature: Document service notifies approval service
  Scenario: New document triggers approval request
    Given the document service receives a valid document
    When the document passes validation
    Then an approval request event is published
    And the event contains the document ID and author
    And the event schema matches version 2.1
Both the document service and the approval service run this specification in their own pipelines. If either side changes in an incompatible way, its build fails before the change can ship.
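The last three steps of that scenario boil down to a check on the event payload. A minimal sketch of such a check follows, assuming the event is a dictionary; the field names `document_id`, `author`, and `schema_version` are hypothetical, chosen to mirror the scenario text rather than taken from a real schema.

```python
# Hypothetical event shape; field names and the version format are assumptions.
REQUIRED_FIELDS = {"document_id": str, "author": str, "schema_version": str}
EXPECTED_SCHEMA_VERSION = "2.1"

def contract_violations(event: dict) -> list:
    """Return human-readable contract violations (an empty list means compliant)."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in event:
            problems.append(f"missing field: {name}")
        elif not isinstance(event[name], expected_type):
            problems.append(f"wrong type for {name}: {type(event[name]).__name__}")
    version = event.get("schema_version")
    if isinstance(version, str) and version != EXPECTED_SCHEMA_VERSION:
        problems.append(f"schema version {version} != {EXPECTED_SCHEMA_VERSION}")
    return problems

# Producer side: the document service checks what it publishes.
published = {"document_id": "doc-123", "author": "alice", "schema_version": "2.1"}
assert contract_violations(published) == []

# Consumer side: the approval service runs the same check against its fixtures.
drifted = {"document_id": "doc-123", "schema_version": "3.0"}
print(contract_violations(drifted))
# ['missing field: author', 'schema version 3.0 != 2.1']
```

Because both services import the same check from the shared repository, neither can change the event shape without the other side's expectations being exercised.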
Architecture Decision Records as Specifications
We even encode architectural decisions as executable specifications:
Feature: Service communication patterns
  # ADR-047: All inter-service communication is async via message queue
  Scenario: Services communicate asynchronously
    Given service A needs to notify service B
    When the notification is sent
    Then it should be published to the message queue
    And service A should not wait for a response
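One way the steps behind this scenario might be checked is sketched below. It assumes an in-process `queue.Queue` as a stand-in for the real message broker, and `ServiceA` and its `notify` method are hypothetical names used only for illustration.

```python
import queue

# Hypothetical in-process stand-in for the message broker.
message_queue = queue.Queue()

class ServiceA:
    """Illustrative sender; the real service would publish to the broker."""

    def __init__(self, bus):
        self.bus = bus

    def notify(self, payload):
        """Publish the notification and return immediately; no reply is awaited."""
        self.bus.put_nowait({"to": "service-b", "payload": payload})
        return None

def test_services_communicate_asynchronously():
    service_a = ServiceA(message_queue)
    result = service_a.notify({"document_id": "doc-123"})
    assert result is None               # nothing was returned to wait on
    assert message_queue.qsize() == 1   # the notification went to the queue
    message = message_queue.get_nowait()
    assert message["to"] == "service-b"
    return message

delivered = test_services_communicate_asynchronously()
```

Note that the check passes with no consumer running at all, which is the point of the decision: service A completes its work whether or not service B is available.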
The Portal That Replaced Our Wiki
We built an internal tool that renders all our specifications into a searchable, browsable portal. It's automatically generated from the specifications that run in CI, so it's always accurate.
Product managers use it to understand current system behavior before writing new requirements. Support engineers use it to diagnose customer issues. New hires use it to onboard — reading specifications is literally the first thing they do.
The portal shows:
- All features organized by domain
- Each scenario with its current status (passing/failing)
- Last verified date (when the specification last ran)
- Change history (linked to git commits)
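The core of such a portal is an index built from the feature files plus the latest CI results. The sketch below covers only the first two items in that list (domain grouping and scenario status); the `index_features` function, the shape of its inputs, and the example file path are assumptions, not a description of our actual tool.

```python
def index_features(feature_texts: dict, results: dict) -> list:
    """Build portal entries from raw .feature text and CI results.

    feature_texts maps a path such as "document-processing/submission.feature"
    to the file content. results maps (feature name, scenario name) to a
    status string such as "passing" or "failing".
    """
    entries = []
    for path, text in sorted(feature_texts.items()):
        domain = path.split("/")[0]  # top-level directory is treated as the domain
        feature = None
        for raw in text.splitlines():
            line = raw.strip()
            if line.startswith("Feature:"):
                feature = line[len("Feature:"):].strip()
            elif line.startswith("Scenario:"):
                scenario = line[len("Scenario:"):].strip()
                entries.append({
                    "domain": domain,
                    "feature": feature,
                    "scenario": scenario,
                    "status": results.get((feature, scenario), "not run"),
                })
    return entries

FEATURE = """Feature: Document service notifies approval service
  Scenario: New document triggers approval request
    Given the document service receives a valid document
"""

entries = index_features(
    {"document-processing/approval-contract.feature": FEATURE},  # hypothetical path
    {("Document service notifies approval service",
      "New document triggers approval request"): "passing"},
)
print(entries[0]["domain"], entries[0]["status"])  # document-processing passing
```

The last verified date and change history would come from the CI run metadata and `git log` respectively, which this sketch leaves out.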
Practical Implementation Tips
Start with the most confusing system
Don't try to specify everything at once. Find the system that causes the most confusion — the one where people constantly ask "wait, how does this actually work?" Write specifications for that system first. The immediate clarity will sell the approach to skeptics.
Write specifications before code
This is the BDD way, and it's essential for documentation purposes. If you write specifications after the code, they become tests. If you write them before, they become design documents that happen to be executable.
Use domain language, not technical language
Bad: Given the POST /api/v2/documents endpoint receives a JSON payload with content-type application/json
Good: Given a new document is submitted for review
Specifications should be readable by anyone in the company. The technical details belong in the step definitions, not in the specifications themselves.
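As an illustration of where the technical detail ends up, here is a sketch of the code a step definition for "a new document is submitted for review" might call. The endpoint and content type are taken from the "Bad" example above; the function name and the payload fields (`title`, `author`) are hypothetical. The function only builds the request description and does not send anything.

```python
import json

def submit_document_request(title: str, author: str) -> dict:
    """Build the HTTP request behind the step "a new document is submitted for review"."""
    body = json.dumps({"title": title, "author": author})  # assumed payload fields
    return {
        "method": "POST",
        "path": "/api/v2/documents",
        "headers": {"Content-Type": "application/json"},
        "body": body,
    }

request = submit_document_request("Invoice-2024-001", "alice")
print(request["method"], request["path"])  # POST /api/v2/documents
```

If the endpoint moves to a new version, only this function changes; the specification text that product managers and support engineers read stays the same.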
Make specifications a first-class citizen in code review
We require specification changes to be reviewed by both a developer and a product owner. This ensures that behavioral changes are intentional and understood by all stakeholders.
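The review rule itself is simple enough to express as a merge check. The following is a sketch under assumptions: the role names and the `missing_reviews` function are hypothetical, and in practice this kind of rule is often configured through the code host's code-owner or branch-protection settings rather than custom code.

```python
def missing_reviews(changed_files, approver_roles):
    """Return the roles that still need to approve before merge.

    Any change needs a developer review; a change touching a .feature
    file additionally needs a product owner review.
    """
    touches_spec = any(path.endswith(".feature") for path in changed_files)
    required = {"developer"}
    if touches_spec:
        required.add("product_owner")
    return sorted(required - set(approver_roles))

# A specification change approved only by a developer is not ready to merge.
print(missing_reviews(
    ["features/document-processing/retry-handling.feature", "retry/scheduler.py"],
    ["developer"],
))  # ['product_owner']
```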
The ROI
After two years with this approach:
- We deleted 80% of our wiki pages. The remaining 20% are high-level architectural overviews and onboarding guides — things that don't change frequently.
- Onboarding time dropped from 3 months to 6 weeks. New engineers understand systems by reading specifications, not by deciphering tribal knowledge.
- Cross-team integration issues dropped by 70%. Contract specifications catch incompatibilities before they reach any environment.
- Support escalations decreased by 45%. Support engineers can answer "how does the system work?" themselves.
Your documentation is lying to you. Mine can't, because it runs on every commit, and if it lies, the build fails.