The prompt is code — and yours is drifting too
I. The work, not the rhetoric
I ended the last essay with a promise: that the next one would dig into what AI-first development actually looks like inside a real stack — the work, not the rhetoric. This is that essay. It's about one concrete capability I'm building into MetaObjects 7.0.0, and it starts from a problem I made for myself.
The first two essays were about generated code. Essay one argued that AI-generated code is locally coherent and globally divergent — drift — and that a metadata layer is the spine that keeps it coherent. Essay two placed that layer against the rest of the 2025 AI stack — context engineering, schema-driven agents, knowledge graphs, MCP — and argued none of them is the architectural substrate the problem needs.
This essay turns the same lens ninety degrees, onto something I'd been ignoring: not the code the AI writes, but the prompt the AI reads. The prompt that drives your model is code too. And if you're building anything real on top of an LLM, your prompts are drifting in exactly the ways your generated code was drifting before you had a spine for it.
II. The drift I built into my own game
I found this the way I find most things — by making the mistake myself.
I'm building a game, Party Lore, where the characters are driven by an LLM. Early on, the prompt that brought a character to life was a tidy little function. A year of features later, it was a few thousand lines of StringBuilder — concatenated instructions, conditionals, loops, string interpolation — spread across a handful of builder classes that each read from the database as they built the prompt. That last detail is the one that should have scared me sooner: I could not produce a prompt without a live database connection. Which meant I could not unit-test a prompt at all. The single most important artifact in an AI application — the thing that determines whether the model behaves — had no tests, because it couldn't be constructed in isolation.
It got worse the way these things always get worse. The same block of rules — how a character behaves when threatened — was restated at a dozen call sites, each copy subtly drifted from the others after a year of edits. A large fraction of every prompt was static boilerplate, re-sent on every single turn. When I wanted to A/B-test a phrasing, I had to pull generated prompts back out of the database into a one-off script, because the prompt only existed after the builder had run against live data.
And then the failure that actually maps to everything I'd written about drift: there was no declared shape for what a prompt needed. I'd add a field to a character, wire it into the payload, and later remove the feature that used it — but the field rode along in the prompt forever, invisible bloat I paid for in tokens on every call and couldn't see. Worse, I'd rename a field and a prompt would quietly degrade — no compile error, no exception, just slightly worse output that I'd notice three weeks later if I was lucky. I'd assumed this was just my own mess. It isn't — it's common enough now to have names: prompt sprawl and the string-in-code anti-pattern, with a whole category of tools now rushing to triage the symptoms. Locally coherent. Globally divergent. Silent. This is drift. Same disease, different organ.
III. A prompt is just (data + text + render)
Once I stopped treating the prompt as a special, mystical artifact and looked at its anatomy, it dissolved into three ordinary parts:
- Data — the state the prompt needs: the character, the world, the player, the recent history. A specific shape of inputs.
- Text — the template that arranges that data into instructions: the rules, the format, the voice.
- Render — the step that fuses the two into the final string the model sees.
I sat with that decomposition for a while, because I'd spent twenty-five years building a system that governs exactly those three kinds of things. Typed data shapes, declared once and generated into every language. Text artifacts with references, composition, and overlays. Deterministic rendering — the same inputs producing the same output every time, identical even across language ports because a conformance suite says it must. The prompt wasn't a snowflake. It was an entity problem wearing a costume — and I already had the tool for entity problems.
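To make that anatomy concrete, here is a minimal sketch of the three parts side by side. Everything in it is hypothetical: `CharacterPayload`, its fields, the template wording, and the tiny `render` function are illustrations, not MetaObjects APIs, and the substitution below is a stand-in for a real logic-less template engine.

```python
import re
from dataclasses import asdict, dataclass


# Data: the specific shape of inputs the prompt needs (hypothetical fields).
@dataclass(frozen=True)
class CharacterPayload:
    name: str
    mood: str
    recent_event: str


# Text: the template that arranges the data into instructions.
TEMPLATE = "You are {{name}}. You feel {{mood}}. Recently: {{recent_event}}"


# Render: fuse the two into the final string the model sees.
def render(template: str, payload: CharacterPayload) -> str:
    values = asdict(payload)
    # An unknown variable raises KeyError instead of rendering silently.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(values[m.group(1)]), template)


prompt = render(TEMPLATE, CharacterPayload("Mira", "wary", "a stranger arrived"))
```

Nothing here touches a database, which is the point: once the three parts are separate, the prompt can be built from plain values in a test.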
IV. The fourth pillar
So that's what MetaObjects 7.0.0 adds: prompt construction, a fourth pillar alongside codegen, runtime metadata, and drift detection. It's the same three disciplines I've written about, pointed at the artifacts that drive the AI itself. It maps cleanly onto the anatomy above.
The payload is a projection. (data) The "everything this prompt needs and nothing else" shape is declared as a metadata projection — and here's the part that convinced me the architecture was real, not just convenient. A projection is the read-only view abstraction I'd built for database views: a declared set of fields, some passed through from a base entity, some aggregated, materialized as a typed value object. It turned out a prompt's payload is exactly that shape. The abstraction I built for one purpose fit a completely different one without modification. When an abstraction pays for itself twice, in a domain you didn't design it for, that's the signal it's a real architectural seam and not an accident. The payload gets a generated, typed class; the fields it actually uses are now declared in one place; the bloat is visible.
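A rough sketch of the idea, written by hand rather than generated. The `Character` entity, the `ThreatPromptPayload` projection, and the `project` method are all assumptions made for illustration; the real generated classes will look different.

```python
from dataclasses import dataclass, fields


# Hypothetical base entity: carries far more than any one prompt needs.
@dataclass(frozen=True)
class Character:
    id: int
    name: str
    temperament: str
    backstory: str
    internal_notes: str
    inventory: tuple[str, ...]


# Hypothetical projection: a read-only, typed view with only the declared fields.
@dataclass(frozen=True)
class ThreatPromptPayload:
    name: str          # passed through from the base entity
    temperament: str   # passed through from the base entity
    item_count: int    # aggregated from the inventory

    @classmethod
    def project(cls, character: Character) -> "ThreatPromptPayload":
        return cls(
            name=character.name,
            temperament=character.temperament,
            item_count=len(character.inventory),
        )


def declared_fields(payload_type) -> list[str]:
    """The full list of what a prompt may reference, readable in one place."""
    return [f.name for f in fields(payload_type)]
```

Because the payload is its own type, a field like `internal_notes` cannot ride along by accident: it is either declared on the projection or it is not in the prompt.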
The text is external and provider-resolved. (text) Prompt text never lives inline in the metadata. It's addressed by a logical reference (a group, a source, a section), and a runtime-configured provider resolves that reference to actual text in the active locale: a file on disk in development, a database row or a graph node in production. Same reference everywhere; only the provider changes. The rule block that I'd duplicated across a dozen call sites becomes one fragment, defined once and included wherever it's needed.
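One way to picture the indirection, as a sketch under stated assumptions: `TextRef`, `TextProvider`, `FileProvider`, `InMemoryProvider`, and the directory layout are names invented here, not the platform's actual interfaces.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


# Hypothetical logical reference: says which text, not where it lives.
@dataclass(frozen=True)
class TextRef:
    group: str
    source: str
    section: str


class TextProvider(Protocol):
    def resolve(self, ref: TextRef, locale: str) -> str: ...


class FileProvider:
    """Development: one text file per section (the layout is an assumption)."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def resolve(self, ref: TextRef, locale: str) -> str:
        path = self.root / locale / ref.group / ref.source / f"{ref.section}.txt"
        return path.read_text(encoding="utf-8")


class InMemoryProvider:
    """Stand-in for a database- or graph-backed provider."""

    def __init__(self, rows: dict[tuple[TextRef, str], str]) -> None:
        self.rows = rows

    def resolve(self, ref: TextRef, locale: str) -> str:
        return self.rows[(ref, locale)]


THREAT_RULES = TextRef(group="rules", source="behavior", section="threatened")
```

Every call site holds the same `THREAT_RULES` reference; which provider answers it is a configuration decision made once, at startup.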
Render is deterministic. (render) A logic-less Mustache engine turns (payload + resolved text) into the final prompt string, and the same inputs always produce the same string. That sounds modest; it's the part that pays the bills. Deterministic render means a prompt is snapshot-testable — you commit the expected output and assert against it in CI, so a tweak to a shared fragment shows up as a one-line diff in the actual prompt instead of as worse output discovered three weeks later. It means the render is byte-stable, which matters more than it reads: prompt caching is exact-prefix and token-level, so a stray whitespace, a reordered attribute, or a different newline silently destroys your cache hit — and the savings on the line are real, up to ~90% of input cost on the mostly-static prompt you re-send every turn. And because the engine is logic-less and conformance-gated, that determinism holds across language ports too — no per-language helper drift, nothing executing from text that arrived at runtime. The cross-language guarantee is real; it's a consequence of the determinism, not the headline.
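A small sketch of what byte-stability buys in practice. The helper names are hypothetical, and the character-level prefix below is only an approximation: real prompt caches match on tokens, not characters.

```python
import difflib
import hashlib


def digest(prompt: str) -> str:
    """Fingerprint of the exact bytes sent to the model."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


def snapshot_diff(expected: str, actual: str) -> list[str]:
    """Empty when the render matches the committed snapshot, else a unified diff."""
    return list(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile="snapshot",
            tofile="rendered",
        )
    )


def shared_prefix_len(a: str, b: str) -> int:
    """Characters two prompts share from the start (rough proxy for a cacheable prefix)."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


committed = "RULES\nStay in character.\n"
drifted = "RULES\nStay in character. \n"  # one stray trailing space

assert snapshot_diff(committed, committed) == []
assert digest(committed) != digest(drifted)
```

In CI, a non-empty `snapshot_diff` would fail the build and show the changed line; the same stray space also cuts the shared prefix short, which is how a cosmetic edit turns into a cache miss.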
Drift is caught at build time. (the check across all three) Because the payload shape, the template, and the text all sit together under one model, a verify step can prove, before anything ships, that every variable in a template resolves to a real field on its payload, that every required slot is filled, and that the rendered prompt keeps its format tags and stays inside its token budget. The renamed field that used to degrade a prompt silently now breaks the build. The payload bloat shows up as a diff in a pull request instead of a line item on an invoice.
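A simplified sketch of such a verify step. The `verify` function and its parameters are hypothetical, it inspects the template rather than a rendered prompt, and the token estimate (four characters per token) is a crude placeholder for a real tokenizer.

```python
import re
from dataclasses import dataclass, fields

VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def verify(template: str, payload_type, required_tags: list[str], max_tokens: int) -> list[str]:
    """Return every contract violation found; an empty list means the build may proceed."""
    errors = []
    declared = {f.name for f in fields(payload_type)}
    used = set(VARIABLE.findall(template))

    # A renamed or removed field breaks the build instead of the output.
    for name in sorted(used - declared):
        errors.append(f"template variable '{name}' is not a field on {payload_type.__name__}")
    # A field nothing references is bloat, and is now visible.
    for name in sorted(declared - used):
        errors.append(f"payload field '{name}' is never used by the template")
    for tag in required_tags:
        if tag not in template:
            errors.append(f"required format tag '{tag}' is missing")

    approx_tokens = len(template) // 4  # assumption: rough heuristic only
    if approx_tokens > max_tokens:
        errors.append(f"about {approx_tokens} tokens exceeds the budget of {max_tokens}")
    return errors


@dataclass(frozen=True)
class Payload:  # hypothetical payload shape
    name: str
    mood: str


ok = verify("<rules>{{name}} is {{mood}}</rules>", Payload, ["<rules>"], max_tokens=100)
broken = verify("{{name}} is {{temper}}", Payload, ["<rules>"], max_tokens=100)
```

Here `ok` is empty, while `broken` reports the unknown `temper` variable, the now-unused `mood` field, and the missing `<rules>` tag in a single pass.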
V. Why it had to be metadata, not a library
I could have written a prompt library — a tidy class with a render() method. But a library treats a prompt as code to call, and the prompt-management tools that have sprung up around this pain treat it as a string to store. Neither gives you the thing I actually wanted: a prompt as a declared artifact with a contract — a typed input shape, a verifiable template, a deterministic render — that you can check before it ships. A string registry can roll your prompt back to last week; it can't tell you the prompt references a field its payload no longer has.
The cross-language guarantee — the thing I'd have led with a draft ago — turns out to be the support beam, not the roof. Its everyday payoff isn't rendering one prompt in four languages at once; that's genuinely rare. It's that a conformance-gated render lets an eval harness (almost always Python) produce exactly the string a production service (often not Python) ships. Evaluation is supposed to run on what you actually send, not a curated approximation — and if your eval renders the prompt even slightly differently than prod, you're scoring a different artifact than you run. It's the same property underneath: a render you can trust to be identical — across runs, across a CI snapshot, or across a language boundary.
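A conformance gate can be pictured as shared fixtures that every port must pass. This is a sketch only: the fixture format, `run_conformance`, and `simple_render` are invented for illustration, and `simple_render` is a stand-in, not a Mustache implementation.

```python
import json
import re
from typing import Callable

# Hypothetical shared fixtures: each port loads the same cases.
CASES_JSON = """
[
  {"name": "single variable", "template": "Hello {{name}}",
   "payload": {"name": "Mira"}, "expected": "Hello Mira"},
  {"name": "repeated variable", "template": "{{n}}, {{n}}!",
   "payload": {"n": "Run"}, "expected": "Run, Run!"}
]
"""


def run_conformance(render: Callable[[str, dict], str], cases_json: str) -> list[str]:
    """Return the names of cases whose output is not an exact string match."""
    failures = []
    for case in json.loads(cases_json):
        if render(case["template"], case["payload"]) != case["expected"]:
            failures.append(case["name"])
    return failures


def simple_render(template: str, payload: dict) -> str:
    # Stand-in renderer used only to exercise the runner.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(payload[m.group(1)]), template)


failures = run_conformance(simple_render, CASES_JSON)
```

The comparison is deliberately exact: a port that adds one trailing space fails every case, which is the behavior an eval harness needs if it is going to score the same string production sends.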
The logical-reference indirection buys the other half. Because the text is addressed by reference and resolved by a provider at runtime, the same metadata supports a fixed file today, an A/B experiment next month, and a graph-assembled or evolutionarily optimized prompt next year, all without touching the metadata or the render engine. The metadata just points; the provider gets smarter. (The optimization end of that spectrum is non-deterministic by nature, so it lives outside the byte-identical guarantee; conformance always pins a fixed provider. The guarantee and the experimentation don't fight.)
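As a sketch of how an experiment could slot in behind an unchanged reference: the `Resolver` alias, `fixed`, `ab_test`, and the string key `"rules/threat"` are all hypothetical, and hashing the subject id is just one simple way to get a stable assignment.

```python
import hashlib
from typing import Callable

# Hypothetical resolver shape: logical reference in, text out.
Resolver = Callable[[str], str]


def fixed(texts: dict[str, str]) -> Resolver:
    """A pinned provider: the kind a conformance run would use."""
    return lambda ref: texts[ref]


def ab_test(control: Resolver, variant: Resolver, subject_id: str,
            variant_share: float = 0.5) -> Resolver:
    """Pick one arm per subject, stably, by hashing the subject id into 100 buckets."""
    bucket = int(hashlib.sha256(subject_id.encode("utf-8")).hexdigest(), 16) % 100
    chosen = variant if bucket < variant_share * 100 else control
    return lambda ref: chosen(ref)


control = fixed({"rules/threat": "Stay calm."})
variant = fixed({"rules/threat": "Stay calm and de-escalate."})

resolver = ab_test(control, variant, subject_id="player-1")
text = resolver("rules/threat")  # the call site never learns which arm it got
```

The call site and the reference are identical in both arms; only the resolver handed in at configuration time differs, and the same player always lands in the same arm.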
And, as with everything I build on this substrate: the generated code carries zero runtime dependency on MetaObjects. The render engine and the providers are ordinary libraries you'd depend on like any other. Throw the platform away tomorrow and your prompts still render. Metadata is the architecture, never the lock-in.
VI. What this means for your stack
If you're shipping LLM features at any real scale, you probably already have this problem — you may just not have named it yet, though the industry is naming the pieces fast. The symptoms are specific:
- Prompts you can't test without standing up a database, because the assembly reads from repositories as it builds.
- The same instruction block copy-pasted across services and agents, each copy a little different, none of them canonical.
- Prompts that have quietly ballooned in token cost, and nobody can tell you which fields in the payload are actually used.
- A schema change in one service that degrades a prompt in another, with no error to catch it — just worse output, discovered late.
None of that is fixed by a bigger model, and none of it is fixed by a prompt-management SaaS that just stores and versions strings. It's the same architectural question I keep arriving at: where does the contract live? For code, the answer was a metadata layer above the call sites that everything generates against and that drift-checks itself. For prompts, the answer is the same layer — the payload's shape declared once, the text addressed by reference, the render conformance-gated, the whole thing verified at build time. The prompt is code. It deserves the same architecture the rest of your code finally got.
VII. What's next
Prompt construction lands in MetaObjects 7.0.0. The substrate comes first — the projection and persistence work that the payload depends on — and prompt construction sits on top of it. The open-source platform and the spec are public at metaobjects.dev, and the roadmap tracks both.
This is the third essay in a series. The first was the problem and the origin story; the second placed the metadata layer against the rest of the AI stack; this one turned the lens onto the prompts themselves. The next will go further into what building AI-first on a metadata substrate looks like in practice — more of the work, less of the rhetoric.
My writing lives at dougmealing.com/writing, and I'm always up for a conversation about any of this — drop me a note.