
How Thunder Compute works (GPU-over-TCP)

TL;DR: We attach GPUs to your VM over a plain TCP socket instead of PCIe. This lets us time‑slice each physical GPU across many users without asking you to change a single line of code.

Published: Oct 31, 2024 | Last updated: Apr 17, 2025

1. Why bother virtualizing a GPU?

GPUs are expensive, and they often sit idle while you read logs or tweak hyper‑parameters. By stacking workloads from multiple users back to back on each GPU, we keep the hardware busy and the price low. Unlike a batch scheduler such as Slurm, there is no queue: the switching happens behind the scenes, in real time, without waiting.

2. How does it work?

  • Network‑attached: The GPU sits across a high‑speed network instead of a PCIe slot. Your virtual machine communicates with the GPU over TCP—the same protocol your browser uses.

  • Feels local: You still pip install torch, use device="cuda", and go. Behind the scenes, our instance translates those calls into network messages.

  • Time‑sliced: When your process runs, it owns the whole GPU, with the full VRAM and compute of the card you pay for. When the process finishes (or you idle out), we pass that GPU to the next user. The sketch after this list shows the idea.
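To make these three points concrete, here is a toy sketch of the shape of the system. It is in no way Thunder Compute's actual wire protocol: a client pickles an operation and its arguments, ships them over a TCP socket, and a server executes them and sends the result back. The server accepts one client at a time, mirroring time‑slicing, and NumPy stands in for the GPU. The address, message framing, and matmul op are all illustrative assumptions.

```python
import pickle
import socket
import struct
import threading

import numpy as np

HOST, PORT = "127.0.0.1", 9999  # illustrative address, not a real endpoint

def send_msg(sock, obj):
    """Length-prefix a pickled payload so the receiver knows where it ends."""
    data = pickle.dumps(obj)
    sock.sendall(struct.pack("!I", len(data)) + data)

def recv_msg(sock):
    """Read the 4-byte length header, then exactly that many payload bytes."""
    (length,) = struct.unpack("!I", sock.recv(4, socket.MSG_WAITALL))
    return pickle.loads(sock.recv(length, socket.MSG_WAITALL))

# "GPU side": the machine that physically holds the card.
srv = socket.create_server((HOST, PORT))

def gpu_server():
    while True:
        conn, _ = srv.accept()       # one client at a time: it owns the "GPU"
        with conn:
            op, args = recv_msg(conn)
            if op == "matmul":       # NumPy stands in for a real CUDA kernel
                send_msg(conn, args[0] @ args[1])

threading.Thread(target=gpu_server, daemon=True).start()

# "Your VM": roughly what a shim underneath device="cuda" does conceptually.
a, b = np.random.rand(512, 512), np.random.rand(512, 512)
with socket.create_connection((HOST, PORT)) as conn:
    send_msg(conn, ("matmul", (a, b)))
    result = recv_msg(conn)

print(result.shape)  # (512, 512), computed on the "remote" side
```

In the real system the translation happens down at the CUDA-call layer rather than per high-level operation, and many calls travel per round trip; the sketch only shows the client/server shape.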

3. Performance

TCP adds a bit of latency, but most ML jobs spend far more time computing than waiting for data. By optimizing how your program runs behind the scenes, we usually land within 1×–1.8× of a direct‑attach GPU's runtime. Workloads we haven't optimized yet can be considerably slower, and occasionally a workload even beats native; check our docs to see what we've tested most thoroughly.
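A back-of-envelope model shows why compute-heavy jobs barely notice the network while chatty ones can. The numbers below (a 0.2 ms round trip and two kernel durations) are illustrative assumptions, not measurements of our stack:

```python
# Rough amortization model: slowdown ≈ (kernel time + RTT) / kernel time,
# assuming every forwarded call pays one full network round trip.
rtt_s = 200e-6  # assumed round trip on a fast local network (not measured)

for kernel_s in (5e-3, 50e-6):  # a large matmul vs. a tiny elementwise op
    slowdown = (kernel_s + rtt_s) / kernel_s
    print(f"kernel {kernel_s * 1e3:.2f} ms -> {slowdown:.2f}x native")

# kernel 5.00 ms -> 1.04x native   (compute-bound: overhead nearly vanishes)
# kernel 0.05 ms -> 5.00x native   (chatty: RTT dominates unless calls are
#                                   batched or pipelined behind the scenes)
```

Batching and pipelining calls so that each one doesn't pay its own round trip is one way to push chatty workloads back toward the 1× end.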

Feature                     Status
Network-attached GPUs       Ready
Secure process isolation    Ready
Multi-GPU support           Ready
PyTorch                     Ready
TensorFlow, JAX             Early access
Multi-node clusters         In development
Graphics / game engines     Not yet

4. Security

When your job ends, we wipe every byte of GPU memory and reset the card so no data leaks to the next user. Each process runs in its own sandbox.
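As a rough illustration of what a wipe amounts to, the sketch below overwrites free VRAM with zeros from PyTorch. Our actual scrub-and-reset runs at our layer during job teardown, so this is a conceptual stand-in, not the mechanism itself:

```python
import torch  # requires a CUDA-capable GPU to run

def scrub_gpu() -> None:
    """Overwrite (most of) the free VRAM with zeros, then release it."""
    free_bytes, _total = torch.cuda.mem_get_info()
    # Leave headroom for allocator fragmentation; the 90% figure is arbitrary.
    n_floats = int(free_bytes * 0.9) // 4
    buf = torch.zeros(n_floats, dtype=torch.float32, device="cuda")
    torch.cuda.synchronize()   # make sure the zero-fill kernel actually ran
    del buf
    torch.cuda.empty_cache()   # return the now-zeroed memory to the driver

scrub_gpu()
```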

5. What’s next

  • Point‑in‑time slices for even more savings

  • Support for clusters for large-scale training

  • Graphics support after CUDA workloads are rock‑solid

Tell us what you need by pinging our team on Discord. The first $20 each month is free, so spin up an A100 GPU and see how it feels.

Carl Peterson
