
How Thunder Compute works (GPU-over-TCP)

TL;DR: We attach GPUs to your VM over a plain TCP socket instead of PCIe. This lets us time‑slice each physical GPU across many users without asking you to change a single line of code.

Published: Oct 31, 2024 | Last updated: Apr 17, 2025

1. Why bother virtualizing a GPU?

GPUs are expensive, and they often sit idle while you read logs or tweak hyper‑parameters. By stacking workloads from multiple users back to back on each GPU, we keep the hardware busy and the price low. Unlike a batch scheduler such as Slurm, there is no queue: the switching happens behind the scenes, in real time, without waiting.

2. How does it work?

  • Network‑attached: The GPU sits across a high‑speed network instead of a PCIe slot. Your virtual machine communicates with the GPU over TCP—the same protocol your browser uses.

  • Feels local: You still pip install torch, use device="cuda", and go. Behind the scenes, our instance translates those calls into network messages.

  • Time‑sliced: When your process runs, it owns the whole GPU, with the full VRAM and compute of the card you pay for. When the process finishes (or you idle out), we pass that GPU to the next user. The sketch after this list shows the idea.
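To make these three points concrete, here is a toy sketch of the shape of the system. It is in no way Thunder Compute's actual wire protocol: a client pickles an operation and its arguments, ships them over a TCP socket, and a server executes them and sends the result back. The server accepts one client at a time, mirroring time‑slicing, and NumPy stands in for the GPU. The address, message framing, and matmul op are all illustrative assumptions.

```python
import pickle
import socket
import struct
import threading

import numpy as np

HOST, PORT = "127.0.0.1", 9999  # illustrative address, not a real endpoint

def send_msg(sock, obj):
    """Length-prefix a pickled payload so the receiver knows where it ends."""
    data = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(data)) + data)

def recv_msg(sock):
    """Read the 4-byte length header, then exactly that many payload bytes."""
    (length,) = struct.unpack("!I", sock.recv(4, socket.MSG_WAITALL))
    return pickle.loads(sock.recv(length, socket.MSG_WAITALL))

# "GPU side": the machine that physically holds the card.
srv = socket.create_server((HOST, PORT))

def gpu_server():
    while True:
        conn, _ = srv.accept()       # one client at a time: it owns the "GPU"
        with conn:
            op, args = recv_msg(conn)
            if op == "matmul":       # NumPy stands in for a real CUDA kernel
                send_msg(conn, args[0] @ args[1])

threading.Thread(target=gpu_server, daemon=True).start()

# "Your VM": roughly what a shim underneath device="cuda" does conceptually.
a, b = np.random.rand(512, 512), np.random.rand(512, 512)
with socket.create_connection((HOST, PORT)) as conn:
    send_msg(conn, ("matmul", (a, b)))
    result = recv_msg(conn)

print(result.shape)  # (512, 512), computed on the "remote" side
```

In the real system the translation happens down at the CUDA-call layer rather than per high-level operation, and many calls travel per round trip; the sketch only shows the client/server shape.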

3. Performance

TCP adds a bit of latency, but most ML jobs spend far more time computing than waiting for data. By optimizing how your program runs behind the scenes, we usually land within 1×–1.8× of a direct‑attach GPU's runtime. Workloads we haven't optimized yet can be considerably slower, and occasionally a workload even beats native; check our docs to see what we've tested most thoroughly.
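A back-of-envelope model shows why compute-heavy jobs barely notice the network while chatty ones can. The numbers below (a 0.2 ms round trip and two kernel durations) are illustrative assumptions, not measurements of our stack:

```python
# Rough amortization model: slowdown ≈ (kernel time + RTT) / kernel time,
# assuming every forwarded call pays one full network round trip.
rtt_s = 200e-6  # assumed round trip on a fast local network (not measured)

for kernel_s in (5e-3, 50e-6):  # a large matmul vs. a tiny elementwise op
    slowdown = (kernel_s + rtt_s) / kernel_s
    print(f"kernel {kernel_s * 1e3:.2f} ms -> {slowdown:.2f}x native")

# kernel 5.00 ms -> 1.04x native   (compute-bound: overhead nearly vanishes)
# kernel 0.05 ms -> 5.00x native   (chatty: RTT dominates unless calls are
#                                   batched or pipelined behind the scenes)
```

Batching and pipelining calls so that each one doesn't pay its own round trip is one way to push chatty workloads back toward the 1× end.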

Feature                     Status
Network-attached GPUs       Ready
Secure process isolation    Ready
Multi-GPU support           Ready
PyTorch                     Ready
TensorFlow, JAX             Early access
Multi-node clusters         In development
Graphics / game engines     Not yet

4. Security

When your job ends, we wipe every byte of GPU memory and reset the card so no data leaks to the next user. Each process runs in its own sandbox.
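As a rough illustration of what a wipe amounts to, the sketch below overwrites free VRAM with zeros from PyTorch. Our actual scrub-and-reset runs at our layer during job teardown, so this is a conceptual stand-in, not the mechanism itself:

```python
import torch  # requires a CUDA-capable GPU to run

def scrub_gpu() -> None:
    """Overwrite (most of) the free VRAM with zeros, then release it."""
    free_bytes, _total = torch.cuda.mem_get_info()
    # Leave headroom for allocator fragmentation; the 90% figure is arbitrary.
    n_floats = int(free_bytes * 0.9) // 4
    buf = torch.zeros(n_floats, dtype=torch.float32, device="cuda")
    torch.cuda.synchronize()   # make sure the zero-fill kernel actually ran
    del buf
    torch.cuda.empty_cache()   # return the now-zeroed memory to the driver

scrub_gpu()
```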

5. What’s next

  • Point‑in‑time slices for even more savings

  • Support for clusters for large-scale training

  • Graphics support after CUDA workloads are rock‑solid

Tell us what you need by pinging our team on Discord. The first $20 each month is free, so spin up an A100 GPU and see how it feels.

Carl Peterson
