Can't imagine how much I'd be worth if I'd dropped out...

PODCAST - THE LOST EPISODE

Benchmarking the GPU Cloud Market

Vibe coding, MechaHitler, Coldplay gigs. Things go wrong in AI.

Worse than all those were the technical issues we had recording a podcast with Dylan Patel from SemiAnalysis.

We couldn’t use the recording, but the chat was too good to let it disappear. So instead, here’s a recap, from memory (and rough notes), of what Dylan had to say on GPU infra, the mess of scaling AI, and what’s coming next.

Cloud, but make it GPU

“There’s 120+ GPU infra companies right now. I don’t think more than 20 will survive.”

The old vs new cloud framing came up a few times. Old cloud is all SaaS layers and margin. New cloud’s closer to the metal - built around GPU cost, scarcity, and speed to deploy. It’s less ‘buy a managed service’ and more ‘buy raw horsepower plus a team that keeps it from falling over.’

Some of the better providers now sell themselves on operational maturity. If they can handle failures, roll out updates, and rate-limit smoothly across a big cluster - that’s the product.

Margins are disappearing

“In 2023, GPU rentals were going for $5/hour with $1.50 of actual cost. Those margins were insane. Now they’ve come right down.”

The arbitrage era’s done. Prices are falling, utilisation’s climbing, and providers can’t coast on markup anymore. It’s not a race to the bottom on price - but it is a race to efficiency.
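Dylan’s numbers make the collapse easy to quantify. A minimal sketch - the $5.00 and $1.50 figures are from the chat; the “now” price is an illustrative assumption:

```python
def gross_margin(price_per_hour: float, cost_per_hour: float) -> float:
    """Gross margin as a fraction of the rental price."""
    return (price_per_hour - cost_per_hour) / price_per_hour

# 2023-era pricing from the conversation: $5/hr rentals, ~$1.50/hr cost.
margin_2023 = gross_margin(5.00, 1.50)

# Assumed present-day pricing once the arbitrage closes.
margin_now = gross_margin(2.00, 1.50)

print(f"2023: {margin_2023:.0%}, now (assumed): {margin_now:.0%}")
```

Same cost base, very different business: 70% gross margin versus 25% under the assumed price, before downtime eats into it.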

The ones that are scaling now are doing it by:

  • Getting customers live faster

  • Making the infra more reliable

  • Actually delivering the promised performance

Bad infra used to survive on hype. Now even 20% margin might not be enough to keep the lights on.

Infra vs Solution

“Enterprises don’t want infra. They want working AI. But you can’t do one without the other.”

This bit got pretty spicy. Dylan’s take: too many companies try to avoid infra complexity, build something clever on top of someone else’s stack… and then get buried by their own API bills.

He told the story of one agent startup that got to a billion tokens/day two weeks post-launch. It worked. Too well. They had to scramble for custom infra to stay alive.

There’s also the constant fear that OpenAI drops a new API and suddenly your whole product is redundant. If you’re just calling the same foundation model as everyone else, where’s your edge?

Ranking the GPU clouds

“We’re not sure how we’ll make money off it yet. But everyone will know us!”

ClusterMax started as diligence for an acquisition. Now it’s SemiAnalysis testing 60+ GPU clouds, benchmarking them on performance, reliability, and usability. Dylan described it as “trying to be the Moody’s of GPU cloud,” but with no kickbacks.

Interestingly, some of the cheapest clouds performed the worst. CoreWeave scored well - not because they’re cheap, but because they’re consistent. Together got props for their kernel libraries that help squeeze extra performance out of your models.

Eventually, Dylan wants the scores to factor into financing: rank Platinum, get better loan terms. It’s early, but makes sense - infra quality is a financial risk.

Nobody’s safe

“No one has a right to win.”

That line came up a few times. Not the hyperscalers, not the tooling startups, not the new clouds. Everyone’s living on uptime, utilisation, and whether they can stay just ahead of the next thing.

Some of these providers raised debt at 17% interest to buy GPUs. As Dylan put it: “The GPU’s down for a day - am I making money, or just paying interest?”
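The arithmetic behind that quote is worth spelling out. A rough sketch, where only the 17% rate comes from the conversation - the loan size, rental price, and opex are assumptions for illustration:

```python
# Assumed figures for one GPU bought on debt; only the 17% rate is from the chat.
loan_per_gpu = 30_000.0   # USD financed per GPU (assumption)
annual_rate = 0.17        # 17% interest, as quoted
rental_price = 2.00       # USD/hour rental price (assumption)
opex_per_hour = 0.50      # power, cooling, staff (assumption)

# Interest accrues whether or not the GPU is rented.
daily_interest = loan_per_gpu * annual_rate / 365
profit_per_rented_hour = rental_price - opex_per_hour

# Hours/day the GPU must be rented just to cover the interest bill.
breakeven_hours = daily_interest / profit_per_rented_hour

print(f"Interest: ${daily_interest:.2f}/day, "
      f"break-even at {breakeven_hours:.1f} rented hours/day")
```

Under these assumed numbers, a GPU that sits idle for a day doesn’t just earn nothing - it’s roughly nine rented hours behind before it makes a cent.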

Infra’s no longer just technical. It’s financial. And it’s brutal.

HIDDEN GEMS

This Is A Call // Gem // Song

Qdrant’s Vector Space Day 2025, a Berlin-based conference on September 26 bringing together engineers and researchers working on vector search, RAG infrastructure, and agentic AI, with talks, workshops, and a call for speakers.

Slow Motion // Gem // Song

A METR blog post detailing a trial that found early‑2025 GenAI tools slowed experienced open‑source developers by ~19% on familiar codebases, with a LinkedIn post giving nuance to the findings.

Reflections // Gem // Song

A personal blog post from ex-OpenAI engineer Calvin French-Owen reflecting on his year at the company, offering insights into team dynamics, internal tooling, and the fast-paced development behind projects like Codex.

Crate Diggin // Gem // Song

A Rust crate on crates.io named cuda-rust-wasm, offering tools to combine CUDA GPU computing, Rust, and WebAssembly for high-performance browser or WASM-embedded GPU workloads.

PODCAST

AI Agent Development Tradeoffs You NEED to Know

Not many people get to interview the person building the AI that’s coming for their old job.

Many moons ago, before the Community, I was an SDR. Now 11x have built two AI agents, Alice and Julian, to replace that role. Alice handles everything from lead sourcing to multi-step email outreach. Sherwood explained why they chose LangGraph for its structured control, hosting the agents on LangGraph Cloud and monitoring performance via LangSmith and Arize.

Alice’s workflow leans heavily on structure, especially in outreach:

  • Campaign flow as a graph: Research, sequencing, and message generation are all modular, improving reliability and customization

  • Eval sampling: About 1% of outputs are checked for hallucinations by comparing against internal research
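The two patterns above can be sketched in plain Python. This is a generic illustration, not 11x’s actual LangGraph code - the node names, state shape, and sampling logic are all assumptions:

```python
import random

# Modular pipeline stages wired as a simple linear graph: each node takes and
# returns a shared state dict, mirroring the research -> sequencing ->
# message-generation flow.
def research(state):
    state["facts"] = f"notes on {state['lead']}"
    return state

def sequence(state):
    state["steps"] = ["intro email", "follow-up", "breakup email"]
    return state

def generate(state):
    state["message"] = f"Hi {state['lead']}, re: {state['facts']}"
    return state

GRAPH = [research, sequence, generate]

def run_campaign(lead, sample_rate=0.01, rng=random):
    state = {"lead": lead}
    for node in GRAPH:
        state = node(state)
    # Eval sampling: check ~1% of outputs against the research step's
    # internal notes, to catch hallucinated content.
    if rng.random() < sample_rate:
        state["grounded"] = state["facts"] in state["message"]
    return state

result = run_campaign("Acme Corp")
print(result["message"])
```

Keeping each stage as its own node is what buys the reliability and customisation mentioned above: any single step can be swapped, retried, or evaluated without touching the rest of the flow.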

Help me keep this job and click below to listen.

MEME OF THE WEEK

BLOG

To Vibe Code or Not to Vibe Code?

Things move quickly in AI, but one trend that’s hanging around longer than a Coldplay gig meme is vibe coding.

It’s moving away from toy apps and creeping into real workflows, especially for quick iterations and prototyping. This blog looks at tools like Cursor and Bolt that let you describe what you want in plain language, accept AI suggestions, and ship something functional with minimal manual coding. It’s fast, sometimes chaotic, and very different from traditional development.

For prototyping, two patterns stood out:

  • Cursor with SDKs: Useful for sketching agent logic, testing prompts, and letting AI fill in boilerplate Python.

  • Bolt for UI: Entire sites built via natural language prompts, no design tools needed.

Read the blog, then order the t-shirt so you look cool on camera for the meme.

ML CONFESSIONS

Used to work with this bloke who always had to be the best in the room. If you’d read a paper, he’d read the appendix. You’ve been to Timbuktu, he’s been to Timbukthree.

One week the model starts throwing out nonsense. He’s straight in, convinced it’s some subtle gradient issue, starts hacking away at the training loop like he’s reinventing PyTorch.

Meanwhile, someone else clocked the real problem - a feature pipeline had stalled and was just feeding in NaNs. Quick fix, job done.

Worth the outage just to see him pipe down for a bit.

Share your confession here.

HOT TAKE

Old cloud optimized for cost and convenience. New cloud’s optimizing for “did the kernel panic at 2am?” Feels less like a platform war, more like parallel universes.

You old-school or new-school?

Keep reading