StrongDM's AI team operates under two rules that would make most engineering managers nervous: code must not be written by humans, and code must not be reviewed by humans.
This isn't a thought experiment. The three-person team, founded in July 2025 by Justin McCarthy, Jay Taylor, and Navan Chauhan, has shipped real software this way. Their most interesting release isn't even the code itself; it's a coding agent called Attractor, published with specifications only. No source code at all.
The logic sounds backwards until you sit with it for a minute. If AI writes all your code anyway, what's the point of releasing the code? Release the spec. Let users' agents implement it themselves.
In an agentic world, the specification is the product.
Manufacturing, Not Pair Programming
StrongDM treats AI coding less like collaboration and more like a factory floor. They've built what they call a "Digital Twin Universe": behavioral clones of third-party services like Okta, Jira, Slack, and Google Docs. These aren't mocks. They're self-contained Go binaries that replicate APIs, edge cases, and observable behaviors at scale.
The clones let them run thousands of scenario tests per hour without hitting rate limits or risking production systems. Think of it as testing infrastructure for AI-generated code, built at the scale AI-generated code demands.
Their threshold for taking AI coding seriously? At least $1,000 in tokens per day per human engineer. Below that, they argue, you're not learning what's possible.
The most conceptually interesting move is how they handle verification. Traditional unit tests are boolean: pass or fail. StrongDM instead measures "satisfaction": of all observed trajectories through a scenario, what fraction likely satisfies the user? They've repurposed the term "scenario" to mean end-to-end user stories stored outside the codebase, like holdout sets in machine learning. The coding agents never see these during development. It's an attempt to create genuine external validation when the AI is writing both the implementation and the tests.
Clever. But also a bit slippery.
Probabilistic satisfaction isn't proof. When AI writes the code and the tests, and the scenarios are just statistical samples, you're trusting that your holdout set represents reality. You're trusting that edge cases you haven't imagined will surface through random trajectories. That's a lot of trust.
The team has shipped substantial software this way. CXDB, their AI Context Store, spans 16,000 lines of Rust, 9,500 lines of Go, and 6,700 lines of TypeScript. It stores conversation histories in an immutable DAG structure. Not trivial. But correctness and "works most of the time" are different standards, and security software (which is StrongDM's core business) tends to fail in edge cases, not averages.
The question isn't whether this approach can produce working code. Clearly it can. The question is whether the verification model catches the failures that matter.
Where This Fits
StrongDM has built something genuinely novel: a production system for generating production systems. The Digital Twin Universe solves the testing bottleneck. The scenario approach addresses the verification problem, even if it doesn't fully solve it. The spec-as-product framing feels prescient.
They're not alone in betting big on AI-generated code. Just yesterday, Pydantic shipped Monty, betting that code generation beats tool calling for agents. Apple's move to embed Claude as Xcode's default coding agent signals where platform vendors think this is heading. StrongDM is just more willing to remove humans from the loop entirely.
What they haven't proven is that their approach scales to systems where correctness is mandatory rather than probabilistic. A 98% satisfaction rate means 2% of scenarios fail in ways that might matter. For a context store, that's probably fine. For access control? Authentication? The systems StrongDM actually sells?
Our read: That's either bold confidence or a lesson waiting to happen. Either way, worth watching.