Making GPUs abundant

The history of computing is the history of making scarce resources feel abundant.

Mainframes became time-shared. Servers became virtual machines. Storage moved onto the network. Today, compute, storage, memory, and networking are all virtualized.

But GPUs are not.

Today, GPU infrastructure is built around bare-metal servers and multi-year contracts.

So far, the best answers to GPU efficiency have been scheduling-based. They help in some cases but do not generalize: they require developers to deliberately design their workloads around opinionated frameworks.

The ideal solution to GPU efficiency works with all workloads in all data centers. It adapts to cluster-scale training, one-off experiments, production inference, and everything in between.

This type of solution can only exist at the systems layer: virtualization.

Thunder Compute is the result of four years of work toward generalized GPU virtualization. We are a team of low-latency systems engineers from Citadel Securities, Aquatic, Old Mission, and AWS. We think in nanoseconds because that is what this problem requires.

Modern software keeps moving up the abstraction stack. We go the other direction: closer to the hardware. Our engineers optimize every layer of the stack and leave no cycles wasted.

The biggest constraint in AI is GPU capacity. Capacity constraints are nothing new; the best systems researchers have solved them before, and our bet is simple: they will solve this one too.

Today, we use virtualization in our own cloud to create more capacity within our data centers, passing the savings on as lower prices. You can use Thunder Compute directly through our cloud, through partners already running our software, or in your own infrastructure.

GPUs aren't virtualized.
We're changing that.

We make GPUs abundant

Low prices, developer-first features, simple UX. Start building today.