The Wrapper Paradox, Vibe Coding, and What Actually Scales

Sep 18, 2025

Hey, it's Kenneth.

Welcome to another edition of Open Scout.

Looking for the next feature: We’re always on the hunt for early teams doing ambitious work. If you know a Pre-Seed, Seed, or Series A startup that deserves more eyes on it, hit reply and tell us. We’ll feature the best picks in an upcoming issue.

Everywhere you look someone is posting a Stripe screenshot. One hundred thousand in MRR. One million ARR in ninety days. A thin UI on top of GPT or Claude. Viral launch. A graph trending up. Then silence.

The real question is not whether you can launch. You can. The real question is whether these products can compound into durable companies. Can they reach unicorn scale? Or are they features that peak, plateau, and get copied or acquired?

What changed

AI made prototyping cheap and fast. “Vibe coding” is now mainstream. You talk, it codes. Searches for “vibe coding” exploded this summer, which tracks with what teams feel on the ground. Rapid prototypes are normal now.

[Chart: "vibe coding" search growth, via Exploding Topics]

Speed helps. It also shifts the work. Production still needs logging, error handling, observability, and tests. Those are not optional. They are the tax for shipping software that will not wake you up at 2 a.m.

The second shift is cognitive. Reasoning models still break in predictable ways. Obfuscate a prompt or raise task complexity and accuracy falls hard. One recent code-generation paper shows accuracy drops of up to 42.1 percent when you keep the semantics but scramble the structure. Sean Goedecke’s writeup on complexity and Tower of Hanoi shows the same curve. Once the task crosses a threshold, models do less thinking and fail more. That matters when your customer workflow is messy.

Why the screenshots mislead

Most “$1M ARR” posts bundle things that are not ARR. Lifetime deals. Annual prepay for one-off pilots. Services. The cash is real. The recurrence is not. When you look at cohorts, the curve often decays by month three. Token bills rise while net dollar retention falls. That is how a nice Stripe picture becomes a thin business.

Under the hood, unit economics are fragile. Your product is orchestration and UX. Your COGS is tokens, hosting, and support. Model list prices set your floor. As of today, OpenAI lists large model prices around $1.25 per million input tokens and $10 per million output tokens, with cheaper “mini” tiers near $0.60 input and $2.40 output. You do not have to love those numbers. You do have to model them.

A quick P&L sanity check

Take a moderate seat.

100 sessions a month
2,000 tokens in per session
4,000 tokens out per session

That is 200k input and 400k output tokens per seat each month. On large-model list prices, that costs about $4.25 per active seat before hosting and support ($0.25 for input plus $4.00 for output). If you charge $20 per seat, your pre-infra gross margin looks healthy at roughly 79 percent.

Now stress it.

Keep the same number of sessions but let heavy users push 10 times the tokens. You are near $42.50 of raw inference cost per seat. If you sell that seat for $30 on an “unlimited” plan, your gross margin goes negative. This is why many wrappers look fine at small scale and buckle under real usage.

Now optimize.

Serve 80 percent of tokens on a mini tier and reserve large models for hard paths. The same moderate seat drops to roughly $1.71 in raw token cost. Caching, truncation, and better prompt discipline move it further. The work is to bend the cost curve while keeping quality inside SLOs. The lever is routing, not vibes.
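
The three scenarios above are easy to reproduce in a few lines. What follows is a minimal sketch, not a pricing tool. The prices are the list prices quoted earlier in this issue, the names (Price, seat_cost, blended_cost, gross_margin) are made up for illustration, and the blended case assumes the mini share applies evenly to input and output tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """List price in USD per one million tokens (figures quoted in this issue)."""
    input_per_m: float
    output_per_m: float


LARGE = Price(input_per_m=1.25, output_per_m=10.00)
MINI = Price(input_per_m=0.60, output_per_m=2.40)


def seat_cost(sessions, tokens_in, tokens_out, price, multiplier=1.0):
    """Raw monthly inference cost for one seat, before hosting and support."""
    total_in = sessions * tokens_in * multiplier
    total_out = sessions * tokens_out * multiplier
    return (total_in * price.input_per_m + total_out * price.output_per_m) / 1_000_000


def blended_cost(sessions, tokens_in, tokens_out, mini_share):
    """Cost when mini_share of all tokens goes to the mini tier (assumption:
    the same share applies to input and output tokens)."""
    large = seat_cost(sessions, tokens_in, tokens_out, LARGE)
    mini = seat_cost(sessions, tokens_in, tokens_out, MINI)
    return mini_share * mini + (1 - mini_share) * large


def gross_margin(seat_price, cost):
    """Pre-infra gross margin as a fraction of the seat price."""
    return (seat_price - cost) / seat_price


baseline = seat_cost(100, 2_000, 4_000, LARGE)
heavy = seat_cost(100, 2_000, 4_000, LARGE, multiplier=10)
routed = blended_cost(100, 2_000, 4_000, mini_share=0.8)

print(f"moderate seat: ${baseline:.2f}")   # $4.25
print(f"heavy seat:    ${heavy:.2f}")      # $42.50
print(f"routed seat:   ${routed:.2f}")     # $1.71
print(f"margin at $20, moderate: {gross_margin(20, baseline):.0%}")
print(f"margin at $30, heavy:    {gross_margin(30, heavy):.0%}")
```

Swap in your own session counts and token sizes. The point is to see which assumption flips the margin sign before a customer finds it for you.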

The real bottleneck

Shipping the first version is not the problem. Scaling review, reliability, and distribution is.

Review load grows faster than code volume. AI writes confidently wrong code. Teams pay the tax during PR review and on-call.
Observability must adapt to probabilistic systems. You need evals, shadow traffic, and error budgets for AI steps, not just HTTP latency.
Compliance and data handling become product features in regulated verticals. SOC 2, HIPAA, data residency, audit logs.
Platform risk is existential if you rent the moat. Vendors can change price, rate limits, or ship your feature. Your value capture compresses when that happens.

What scales past the wrapper

A wrapper can become a company. It needs compounding moats and better unit economics. Here is the path.

Workflow lock-in: Own the workflow, not the button. The product becomes the system of record for the job to be done. Think evidence, artifacts, and state. If teams run their process inside your product, switching has a cost that UI clones cannot beat. Examples: Notion (note-taking → team hub), Figma (design file → collaboration layer).
Data gravity: Capture structured exhaust that improves results tomorrow. Feedback labels, domain glossaries, retrieval corpora, first-party telemetry. If quality gets better with use and competitors cannot access that data, you get both margin expansion and defensibility.
Cost curve control: Route requests by difficulty. Cache easy paths. Use smaller or local models for the common case. Batch work. Only hit expensive models for tail problems and tool use. Your gross margin must improve with scale, not degrade. Investors will accept low early margins if the plan compounds gross profit and reduces model mix costs over time. a16z has been blunt about this. Show the path, not the promise.
Distribution that others cannot copy: Deep integrations in systems that already own the customer, or a vertical where the trust channel is hard won. If the buyer trusts you for compliance, migration, or uptime, a cheaper clone will not win. One route is partnering with incumbents (Salesforce, ServiceNow, Epic) that can’t build fast but can distribute.
Evaluation and SRE discipline: Treat prompts and retrieval like code. Version them. Test them. Track failure modes by class. Publish SLOs tied to business outcomes, not model accuracy alone.
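
To make the cost curve control point concrete, here is a rough sketch of difficulty routing with a cache in front. Everything in it is hypothetical: the Router class, the is_hard heuristic, and the stub model functions stand in for whatever clients and classifiers you actually run. A real system would likely use a learned difficulty signal and a cache with expiry.

```python
import hashlib
from typing import Callable, Dict

ModelFn = Callable[[str], str]  # hypothetical: takes a prompt, returns a completion


class Router:
    """Serve repeats from cache, easy prompts from a cheap model, hard ones from a large model."""

    def __init__(self, mini: ModelFn, large: ModelFn, is_hard: Callable[[str], bool]):
        self.mini = mini
        self.large = large
        self.is_hard = is_hard
        self.cache: Dict[str, str] = {}
        # Counters you would export to your metrics system.
        self.stats = {"cache": 0, "mini": 0, "large": 0}

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize lightly so trivial variations share a cache entry.
        return hashlib.sha256(prompt.strip().lower().encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        key = self._key(prompt)
        if key in self.cache:
            self.stats["cache"] += 1
            return self.cache[key]
        tier = "large" if self.is_hard(prompt) else "mini"
        model = self.large if tier == "large" else self.mini
        answer = model(prompt)
        self.stats[tier] += 1
        self.cache[key] = answer
        return answer


# Stub models and a toy heuristic, for illustration only.
def mini_model(prompt: str) -> str:
    return f"[mini] {prompt}"


def large_model(prompt: str) -> str:
    return f"[large] {prompt}"


def long_or_multistep(prompt: str) -> bool:
    return len(prompt.split()) > 40 or "step by step" in prompt.lower()


router = Router(mini_model, large_model, long_or_multistep)
router.complete("Summarize this ticket")
router.complete("summarize this ticket")  # cache hit after normalization
router.complete("Plan the migration step by step")
print(router.stats)
```

The stats dictionary is the part that matters. If the share of traffic hitting the large model is not falling as you learn your workload, the margin is not improving either.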

Can these become unicorns?

Yes, but not as thin interfaces. The path to $1B requires three compounding loops running at once.

Pure wrappers will not get there. A pure GPT-UI wrapper will not reach a $1B+ valuation: margins, churn, and platform risk collapse it first.
Verticalized or workflow-embedded apps can. If they own the workflow, capture data, and integrate deeply into enterprise systems, the wrapper becomes a SaaS-like platform with AI superpowers.
Acquisition paths exist. Even if they don’t IPO, scaled wrappers that win in niche verticals (legal AI drafting, clinical documentation, sales enablement) can be bought by incumbents (Adobe, Salesforce, Microsoft) as feature acquisitions.

A wrapper that hits those loops can be a unicorn. A wrapper that only adds shortcuts on top of a vendor API will plateau. The difference is not branding. It is whether the product captures process, data, and economics.

The history is clear. Between 2015 and 2020, thousands of low-code and no-code platforms launched, and most died. A handful (Retool, Zapier, Webflow) scaled into billion-dollar companies by capturing workflow, distribution, and stickiness. Expect a similar split for AI wrappers: roughly 95% will plateau or vanish, and 5% may evolve into unicorns if they invest heavily in infrastructure, data moats, and workflow lock-in.

What to build now

If you are a founder:

Model your session economics before launch. Tie pricing to usage. Kill unlimited tiers unless your routing is excellent.
Instrument everything. Tokens. Cache hits. Error classes. Human minutes. Publish a monthly cost-to-serve bridge.
Own a workflow. Do not become a floating sidebar. Create state, not shortcuts.
Capture data you are allowed to use. Make sure your contracts and privacy stance give you the right to improve the product with it.
Treat evals like CI. Ship guardrails and red teams, not vibes. The reasoning papers are not theory. They show how easy it is to break these systems in the wild.
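
Here is one way "treat evals like CI" could look in practice. It is a small sketch under stated assumptions: EvalCase, run_evals, and gate are hypothetical names, the model is any function from prompt to text, and each check returns a failure class (or None on success) so you can track failure modes by class rather than as a single score.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class EvalCase:
    name: str
    prompt: str
    # Returns None when the output passes, else a short failure-class label.
    check: Callable[[str], Optional[str]]


def run_evals(model: Callable[[str], str], cases: List[EvalCase]) -> Tuple[float, Counter]:
    """Run every case and return (pass rate, failure counts by class)."""
    failures: Counter = Counter()
    for case in cases:
        try:
            output = model(case.prompt)
        except Exception:
            failures["exception"] += 1
            continue
        failure = case.check(output)
        if failure is not None:
            failures[failure] += 1
    passed = len(cases) - sum(failures.values())
    pass_rate = passed / len(cases) if cases else 0.0
    return pass_rate, failures


def gate(pass_rate: float, threshold: float) -> bool:
    """True if the build may ship. The threshold is your SLO, not a universal number."""
    return pass_rate >= threshold


# Toy model and cases, for illustration only.
def toy_model(prompt: str) -> str:
    return "Refund approved for order 123" if "refund" in prompt else ""


def not_empty(output: str) -> Optional[str]:
    return None if output.strip() else "empty_output"


def mentions_order(output: str) -> Optional[str]:
    return None if "order" in output else "missing_order_reference"


cases = [
    EvalCase("refund-basic", "Customer asks for a refund", mentions_order),
    EvalCase("greeting", "Customer says hello", not_empty),
]

pass_rate, failures = run_evals(toy_model, cases)
print(pass_rate, dict(failures))
print("ship" if gate(pass_rate, threshold=0.95) else "block")
```

Run it on every prompt or retrieval change, the same way unit tests run on every commit, and fail the build when the gate fails.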

The simple test

Ask one question. If OpenAI or Anthropic ships your core feature tomorrow, what is left that customers would still pay for? If the answer is distribution, workflow, data, and better economics, you have a company. If the answer is a prettier button, you have a moment.

That is the wrapper paradox in plain terms. Speed is real. So are the hidden taxes. The winners will not be the flashiest demos. They will be the teams that turn fast prototypes into reliable workflows with compounding data and improving margins. The rest will become features, and that is fine too. Not every product needs to be a unicorn. The market will value the ones that can carry weight when the screenshots fade.

Thanks for reading!
