How Thunder Compute works (GPU-over-TCP)

Thunder Compute uses API virtualization to reduce the cost of AI/ML development

Oct 29, 2024

Thunder Compute uses network-attached GPUs instead of physically attached GPUs. Behind the scenes, Thunder Compute tricks CPU-only instances into thinking that they have GPUs attached. In reality, these GPUs are attached over the network, using TCP. From your perspective, the resulting instances behave as if they had GPUs, even though no GPU is physically connected.

As a result, every instance on Thunder Compute is an on-demand, CPU-only instance, exactly like one you would find on AWS, GCP, or Azure. These instances do not have GPUs. It follows that the CPU-only instances you interact with on Thunder Compute have all of the functionality of the CPU-only virtual machines you would find on Amazon or Google Cloud. In fact, many of them are hosted on Amazon or Google Cloud.
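To make the idea of a network-attached device concrete, here is a minimal sketch of call forwarding over TCP. It is a toy illustration, not Thunder Compute's implementation: the `RemoteDevice` class, the `KERNELS` table, the JSON-per-line wire format, and the pure-Python "kernels" are all hypothetical stand-ins. The client process has no accelerator of its own; it sends each operation to a server over a socket and gets the result back.

```python
import json
import socket
import threading

# Hypothetical stand-ins for GPU kernels; a real system would run these on a GPU host.
KERNELS = {
    "vector_add": lambda a, b: [x + y for x, y in zip(a, b)],
    "scale": lambda a, k: [x * k for x in a],
}


def serve_one_client(listener):
    """Device-side loop: run each requested kernel and send back the result."""
    conn, _ = listener.accept()
    # Separate reader and writer so buffered input is never discarded by a write.
    reader = conn.makefile("r", encoding="utf-8")
    writer = conn.makefile("w", encoding="utf-8")
    with conn, reader, writer:
        for line in reader:  # one JSON request per line
            request = json.loads(line)
            result = KERNELS[request["op"]](*request["args"])
            writer.write(json.dumps({"result": result}) + "\n")
            writer.flush()


class RemoteDevice:
    """Client-side stub: looks like a local device, forwards every call over TCP."""

    def __init__(self, host, port):
        self._sock = socket.create_connection((host, port))
        self._reader = self._sock.makefile("r", encoding="utf-8")
        self._writer = self._sock.makefile("w", encoding="utf-8")

    def call(self, op, *args):
        self._writer.write(json.dumps({"op": op, "args": args}) + "\n")
        self._writer.flush()
        return json.loads(self._reader.readline())["result"]

    def close(self):
        self._writer.close()
        self._reader.close()
        self._sock.close()


def demo():
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a free port
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_one_client, args=(listener,), daemon=True)
    worker.start()

    device = RemoteDevice("127.0.0.1", port)
    total = device.call("vector_add", [1, 2, 3], [10, 20, 30])
    scaled = device.call("scale", total, 2)
    device.close()  # closing the connection ends the server loop
    worker.join(timeout=5)
    listener.close()
    return total, scaled
```

Calling `demo()` returns `([11, 22, 33], [22, 44, 66])`. The calling code never needs to know that the work happened in another process; only the stub knows about the socket. A production system would intercept a real GPU API rather than a hand-written `call` method, and would need batching, error handling, and a far more efficient wire format.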

Figure: a rough diagram of how we manage the connections between CPU-only instances and GPUs behind the scenes.

Now that you understand the distinction between a Thunder Compute instance and a GPU instance on EC2, it is worth explaining why we use virtualization. Primarily, virtualization allows us to serve more customers with fewer GPUs. This lets us pass on (much) lower prices than you will see elsewhere for comparable hardware. This virtualized approach has a few limitations, which we are focused on improving over time:

1. **Performance**: TCP is slower than PCIe. While this may seem problematic, Thunder Compute is optimized to minimize the resulting performance impact. The real-world slowdown is often not noticeable and has minimal impact on common data science tasks.
2. **Limited Compatibility**: Eventually, our GPU-over-TCP approach will have the full functionality of physically attached cards, but today Thunder Compute lacks official support for some common GPU libraries. If Thunder Compute does not support your particular use case, please reach out and we will add support.
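The performance trade-off in the list above can be reasoned about with a back-of-envelope model: the time for one transfer is roughly a fixed latency plus the payload size divided by the link bandwidth. The sketch below applies that model under illustrative assumptions only. The PCIe figure (about 32 GB/s) is roughly the theoretical peak of a PCIe 4.0 x16 link; the 10 Gbit/s network link and both latency values are assumed for illustration and are not measurements of Thunder Compute.

```python
def transfer_seconds(num_bytes, bandwidth_bytes_per_s, latency_s):
    """Simple model: fixed per-transfer latency plus size divided by bandwidth."""
    return latency_s + num_bytes / bandwidth_bytes_per_s


# Illustrative assumptions, not measurements of any real deployment.
PCIE_BANDWIDTH = 32e9   # bytes/s, approximate PCIe 4.0 x16 theoretical peak
PCIE_LATENCY = 5e-6     # seconds, assumed
NET_BANDWIDTH = 1.25e9  # bytes/s, i.e. an assumed 10 Gbit/s link
NET_LATENCY = 200e-6    # seconds, assumed


def slowdown(num_bytes, compute_s):
    """Ratio of (compute + network transfer) to (compute + PCIe transfer) for one step."""
    over_network = compute_s + transfer_seconds(num_bytes, NET_BANDWIDTH, NET_LATENCY)
    over_pcie = compute_s + transfer_seconds(num_bytes, PCIE_BANDWIDTH, PCIE_LATENCY)
    return over_network / over_pcie


if __name__ == "__main__":
    payload = 100e6  # a 100 MB batch
    for compute_s in (0.01, 0.1, 1.0, 10.0):
        print(f"compute {compute_s:>5} s -> slowdown x{slowdown(payload, compute_s):.2f}")
```

Under these assumptions, the ratio approaches 1 as the compute time per transfer grows, and it is large when each step moves a lot of data but does little computation. This is one plausible reason a slower link can go unnoticed in workloads that compute for a long time on each batch; real behavior depends on the workload and on optimizations this model ignores.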

The impact of these drawbacks will vary depending on your specific workload, and we continue to improve both. So far, our testing has shown data science workflows to be the most performant and stable. You can find full compatibility details in “Compatibility.” Thunder Compute is open to the public, so the easiest way to test compatibility with your workflow is to try it yourself.

Carl Peterson