Picture this. A small team builds a prototype over a weekend using one of the new generation of vibe coding tools. It's for a funder presentation, something interactive, something that makes the idea feel real. It doesn't need to be perfect. It just needs to get across the concept.
By Monday, there's a working app. It stores data. It has authentication. It looks surprisingly polished, all things considered. The demo goes well. The pilot gets approved.
The funder asks if they can keep access while the pilot runs. Reasonable enough. A few features get added. A workflow gets tweaked. A real user starts entering real data.
Six months later, a small code change, generated in a few seconds, modifies a database table. There are no backups. No migration controls. No staging environment. A table is corrupted. The data is gone.
The person who built it is a domain expert. They did it on top of their day job. They're not sure where to start.
This particular scenario is fictional, but the pattern isn't. It's the ordinary, undramatic progression from "prototype" to "infrastructure" without anyone deciding that's what should happen. This post is our attempt to think through where that line actually is, and what it costs when you miss it.
The promise is real, and we're using it
We use vibe coding tools. Regularly.
We've run internal assessments across Lovable, Bolt, Figma Make, Base44 and a few others. We build with them, we test them, we argue about them internally. And honestly, what's now possible is quite remarkable.
We can turn around a fully functional prototype for a relatively complex app in a few hours, sometimes less. Not a static mock-up, but a working system with forms, data persistence, and user flows: something a real customer can actually click through and react to. That changes the feedback loop considerably. Instead of committing to a direction for a month, we can explore three different approaches in a week. We can sit with a landowner or a land manager, show them something, and watch how they use it. We can bin the weak ideas before they get expensive.
For discovery, validation, and early customer co-design, these tools are genuinely transformative. If you're working in climate or nature tech, where budgets are tight and moving fast matters, that kind of leverage is hard to ignore.
So this isn't a post arguing against any of that. We'd be arguing against our own practice.
Why most of the writing on this misses the point
If you've searched "vibe coding risks" recently, you'll have found plenty of content. Most of it reads similarly: security vendors warning about vulnerabilities that, handily, their products address; enterprise CTO perspectives written for organisations with full IT departments and compliance functions; developer community posts written by engineers for engineers.
What's harder to find is something written for a different kind of reader, someone building in a thinly resourced, mission-driven organisation, probably in climate or nature or sustainability, where engineering capacity is limited and the data you're handling isn't easily regenerated. Someone who has a domain expert with a vibe-coding account and a genuinely good idea, and needs to make a sensible decision about what to build and how.
There's also a more fundamental issue with most of the existing content: it assumes people are deliberately deciding to build production systems with vibe tools. In our experience, that's not usually how it happens. It tends to be slower than that. The decision creeps up.
The drift
The biggest risk we see isn't intentional. It's drift, the quiet process by which a prototype becomes infrastructure without anyone really noticing.
It starts sensibly enough. You build something quickly to validate an idea. It works well, so you keep it running a bit longer. A partner or funder asks whether they can use it for a real workflow, which feels like a good sign, not a warning. You add a feature to support that. Then another.
A second person starts relying on it. Then a third. Real data starts accumulating: not dummy records, but actual operational data. And at no point does anyone sit down and decide that this prototype is now the system. It just becomes load-bearing, quietly, one reasonable step at a time.
We've started calling this accidental dependence. It's not a failure of intelligence. Smart people fall into it all the time. It's more that there's usually no obvious moment where someone pauses and asks: have we crossed a line here? When you're moving fast, and things are working, that moment is easy to miss.
Where it tends to go wrong
There are several risks worth mapping when you're making this kind of decision. We've identified eight internally, and we'll come to all of them. But two strike us as consistently underestimated.
Data loss doesn't have an undo button.
A broken UI can be fixed. A slow API can be tuned. But corrupted or permanently deleted data, in the absence of backups, schema versioning, or any kind of migration discipline, is simply gone.
In our case, we're building a data platform for regenerative land use. The data we handle includes land parcel records, biodiversity baselines, carbon measurements, and field survey data collected over years. If something corrupts that:
- Ecological surveys aren't easily redone
- Baseline measurements may not be possible to retake
- Carbon or biodiversity contract deliverables may become indefensible
- Regulatory submissions can't be reconstructed from memory
That's not a technical inconvenience. It's a material problem for the organisations whose data we're holding.
And AI-generated code changes can absolutely modify or drop database tables. These tools move fast, and rollback discipline isn't something they come with by default. There was a widely discussed incident in July 2025 where a Replit AI agent deleted the primary live database of a project it was developing, ignoring repeated instructions not to modify it. The database held data from over a thousand companies. It wasn't a toy project.
I've been in enough incident war rooms to know how draining recovery is even when the underlying infrastructure is solid. The Log4Shell CVE alone absorbed an enormous amount of resource as teams scrambled to identify exposure and patch everything in scope, and that was with experienced engineers, mature processes, and proper tooling in place. What concerns me about vibe-coded systems running in production is not just that they carry more risk, but that the people maintaining them often wouldn't know where to start when something goes wrong. The vulnerability would be invisible until it wasn't.
The maintenance tail is easy to underestimate.
The second risk is less dramatic, but it compounds.
After something launches, the requests don't stop. Users want changes. Bugs appear. Integrations drift. Dependencies age and eventually break. If the person who built the tool is a domain expert who did it as a side project, the ongoing maintenance competes directly with their actual job, the thing you need them to be doing.
Two to four hours a week sounds manageable. But across a year, that's a meaningful chunk of a senior person's time redirected away from mission work. And that's probably a conservative estimate.
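To put rough numbers on that, here is a back-of-the-envelope sketch. The 52-week year and the 7.5-hour working day are illustrative assumptions of ours, not measured figures, so adjust them to your own context.

```python
# Back-of-the-envelope estimate of the annual maintenance tail.
# Assumptions (illustrative only): 52 weeks a year, 7.5-hour working day.
WEEKS_PER_YEAR = 52
HOURS_PER_WORKING_DAY = 7.5


def annual_hours(hours_per_week: float) -> float:
    """Total maintenance hours over a year at a steady weekly rate."""
    return hours_per_week * WEEKS_PER_YEAR


def annual_working_days(hours_per_week: float) -> float:
    """The same total expressed in working days."""
    return annual_hours(hours_per_week) / HOURS_PER_WORKING_DAY


for weekly in (2, 4):
    print(
        f"{weekly} h/week -> {annual_hours(weekly):.0f} hours, "
        f"about {annual_working_days(weekly):.0f} working days a year"
    )
```

On those assumptions, two to four hours a week comes to 104 to 208 hours, or roughly 14 to 28 working days a year.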
I'll be honest: I've built and launched prototypes without nearly enough thought about longer-term maintainability or extensibility, and every time, the version of me dealing with the consequences six months later wished the earlier version of me had made different decisions. It's one of those things that's difficult to feel urgently until you're already in it. And this is only going to happen more frequently as the barriers come crashing down.
The other six
Beyond those two, we think about six more risks whenever we're assessing whether a prototype should graduate to an operational role:
Security and access control:
Prototype tools aren't built to be security-hardened. Weak authentication, poor access controls, and absent audit logging are common. Research from Veracode's 2025 GenAI report suggests a significant proportion of AI-generated code contains security flaws, and that when LLMs are given a choice between a secure and an insecure approach, they take the insecure one roughly half the time.
Sensitive data exposure:
Personal or financial data handled without proper encryption, segregation, or minimisation creates real legal and reputational risk if it's ever exposed. The Tea App incident in 2025, which exposed around 72,000 images, including users' photo IDs and selfies, due to basic security failures in a vibe-coded application, is a fairly clear illustration of how quickly this goes wrong.
Integrity and accuracy:
Without proper validation, testing, and version control discipline, calculations can be silently wrong. In environmental or scientific contexts that tends to propagate badly.
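As a minimal sketch of what "proper validation" can look like, the example below refuses to calculate from implausible inputs rather than returning a quietly wrong number. Everything in it is hypothetical: the `SoilCarbonSample` fields, the plausibility ranges, and the simplified stock formula (carbon % × bulk density × depth, with no correction for stones) are assumptions chosen for illustration, not a description of our platform.

```python
import math
from dataclasses import dataclass

# Hypothetical plausibility ranges; real limits should come from a domain expert.
PLAUSIBLE_RANGES = {
    "carbon_percent": (0.0, 60.0),
    "bulk_density_g_cm3": (0.1, 2.0),
    "depth_cm": (1.0, 200.0),
}


@dataclass(frozen=True)
class SoilCarbonSample:
    parcel_id: str
    carbon_percent: float       # soil organic carbon, % by mass
    bulk_density_g_cm3: float   # dry bulk density
    depth_cm: float             # sampled depth


def validate(sample: SoilCarbonSample) -> list[str]:
    """Return a list of problems; an empty list means the sample looks usable."""
    problems = []
    for name, (low, high) in PLAUSIBLE_RANGES.items():
        value = getattr(sample, name)
        if not math.isfinite(value):
            problems.append(f"{name} is not a finite number")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}..{high}")
    return problems


def carbon_stock_t_per_ha(sample: SoilCarbonSample) -> float:
    """Simplified carbon stock in tonnes per hectare; raises on bad input."""
    problems = validate(sample)
    if problems:
        raise ValueError(f"Parcel {sample.parcel_id}: " + "; ".join(problems))
    return sample.carbon_percent * sample.bulk_density_g_cm3 * sample.depth_cm
```

The design choice is simply to fail loudly: a value entered in the wrong unit raises an error someone has to look at, instead of flowing into a report.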
Compliance and liability:
Handling personal or financial data can trigger regulatory and contractual obligations that a prototype simply isn't set up to meet. Audit defensibility (the ability to demonstrate to an investor, insurer, or regulator that you operated appropriately) requires a level of documentation and infrastructure that most vibe-coded tools don't have.
Key-person and maintainability:
If something was built by one person without engineering standards, and that person leaves, you're in difficulty. The organisation becomes operationally dependent on an individual, and there's usually no clean way out.
Strategic opportunity cost:
Time spent hardening or sustaining a prototype is time not spent on the work you actually set out to do. This one is easy to overlook because it's diffuse, but it accumulates.
None of this would matter much if adoption were still at the enthusiast stage. But it isn't. Gartner estimates that 60% of all new code will be AI-generated by 2026. Stanford's AI Index recorded a 56% rise in AI-related incidents in a single year. The AI Incident Database logged a hundred discrete cases in just three months of 2025.
We're not yet in a position to draw a clean trend line specifically for vibe-coded systems; the tools are simply too new. But the named cases are accumulating: Replit, Tea App, Base44. And the underlying conditions (high vulnerability rates, thin engineering oversight, rapidly growing adoption) suggest the question isn't really whether serious incidents will become more common. It's how long before they become hard to ignore.
How we think about the decision
When someone asks us whether a vibe tool is appropriate for operational use, there are a few questions worth working through honestly:
Is the data irreplaceable? If yes, you need backup, schema versioning, migration controls, and a tested restore process before anything goes live.
Are there contracts, SLAs, or regulatory obligations involved? If so, you need audit defensibility, proper logging, documentation, and infrastructure you actually understand.
Who is responsible for maintenance after launch, and what else are they supposed to be doing? If maintenance competes with mission-critical work, the maintenance will, eventually, win.
Is this likely to become load-bearing within the next twelve months? If so, it's probably worth building for what it needs to become rather than what it is today. Retrofitting production standards onto a prototype is almost always more expensive than designing them in.
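To make the first of those questions concrete, here is a minimal sketch of "back up, check the backup restores, then migrate". It uses SQLite from Python's standard library purely as a stand-in; a hosted database would use its own backup tooling. The function names and the row-count check are our own illustrative choices, not a complete restore test.

```python
import sqlite3


def table_counts(db_path: str) -> dict[str, int]:
    """Row count for every user table, used as a simple restore check."""
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()


def backup_database(live_path: str, backup_path: str) -> None:
    """Copy the live database to backup_path using SQLite's backup API."""
    source = sqlite3.connect(live_path)
    target = sqlite3.connect(backup_path)
    try:
        source.backup(target)
    finally:
        target.close()
        source.close()


def backup_is_restorable(backup_path: str, expected: dict[str, int]) -> bool:
    """Open the backup, run an integrity check, and compare row counts."""
    conn = sqlite3.connect(backup_path)
    try:
        intact = conn.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
    finally:
        conn.close()
    return intact and table_counts(backup_path) == expected


def migrate_with_backup(live_path: str, backup_path: str, statements: list[str]) -> None:
    """Apply schema changes only after a verified backup exists."""
    expected = table_counts(live_path)
    backup_database(live_path, backup_path)
    if not backup_is_restorable(backup_path, expected):
        raise RuntimeError("Backup failed verification; migration not attempted")

    # isolation_level=None lets us manage the transaction explicitly, so a
    # failure part-way through rolls back every statement, schema changes included.
    conn = sqlite3.connect(live_path, isolation_level=None)
    try:
        conn.execute("BEGIN")
        try:
            for statement in statements:
                conn.execute(statement)
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    finally:
        conn.close()
```

A call such as `migrate_with_backup("live.db", "backup.db", ["ALTER TABLE surveys ADD COLUMN notes TEXT"])` (file and table names hypothetical) would leave a checked copy behind before touching the live table. The point is the ordering rather than the specific tool: no change reaches real data until a restorable copy is known to exist.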
The practical spectrum as we see it:
Vibe tools are excellent for discovery, validation, and rapid prototyping. That's the use case they're designed for, and they're genuinely good at it.
Low-code internal tools (Airtable, Power Apps and the like) are reasonable for contained internal workflows, provided there's clear ownership and the stakes of failure are low.
Agentic coding with engineering discipline (Claude Code, Cursor) is increasingly capable of producing production-grade output, provided the person reviewing the code actually understands the implications of what's been generated.
Full production builds remain the right answer for anything mission-critical, externally facing, or operating under regulatory or contractual obligations.
The core question isn't really about the tool. It's about the level of risk you're willing to carry, and whether you've thought through what happens when something goes wrong.
What we're doing
At Great Yellow, we're building a data platform to help scale regenerative land use in the UK. The data we handle matters, to the landowners using it, to the long-term carbon and biodiversity contracts it underpins, and to the scientific record. It's not the kind of thing we'd want to have to explain losing.
We use vibe coding tools actively: for prototyping, customer co-design, and testing ideas before committing to them. They've genuinely changed how we work, and we'd recommend them for that.
But we've chosen to engineer our production platform properly. Partly because of the data we're responsible for. Partly because we've seen what happens when the engineering is absent. And partly because we think it's what our customers, many of whom are considering whether to build something themselves or work with us, deserve to see from a platform provider.
The landscape will keep moving. The tools will get better. The line between prototype and production quality will blur further, and that's probably a good thing. But for now, we're clear about where our line sits.
