Why Every GPU will be Virtually Attached over a Network

August 31, 2024

Introducing GPU virtualization

Virtualization is the practice of creating software representations of physical hardware. While virtualization is commonly associated with Virtual Machines (VMs), it extends to other domains, including GPUs. GPU virtualization is essential for efficient resource sharing in high-performance computing, AI, and machine learning. However, it's often misunderstood, especially when applied to GPUs, where the term can have multiple meanings.

Existing types of GPU Virtualization

GPU virtualization currently exists in three main forms:

  1. Single-node GPU sharing
  2. Dedicated GPU passthrough
  3. Network-based GPU pooling (Thunder Compute's approach) 

The first two operate within a single physical server and are widely used today. Thunder Compute is pioneering the third approach, which operates across multiple servers, or 'nodes'.

Single-node GPU sharing (e.g., NVIDIA vGPU)

Divides a physical GPU into multiple virtual GPUs. This allows several virtual machines (VMs) to simultaneously use portions of the same GPU, improving resource utilization in scenarios where VMs don't need the full power of a GPU.
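A toy sketch can make this concrete. The function and profile size below are purely illustrative (not NVIDIA's actual vGPU API): a physical GPU's memory is carved into fixed-size slices, one per VM.

```python
# Hypothetical illustration of single-node GPU sharing: one physical
# GPU's memory is carved into fixed vGPU profiles, one slice per VM.

PHYSICAL_GPU_MEMORY_GB = 48   # total memory on the physical card
PROFILE_GB = 12               # fixed slice each virtual GPU receives

def carve_vgpus(total_gb, profile_gb):
    """Return the vGPU slices one physical GPU can host."""
    count = total_gb // profile_gb
    return [f"vGPU-{i} ({profile_gb} GB)" for i in range(count)]

print(carve_vgpus(PHYSICAL_GPU_MEMORY_GB, PROFILE_GB))
# four slices, so four VMs can share the same card
```

Real vGPU schedulers also time-slice compute, but the memory-partitioning arithmetic is the core idea.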

Dedicated GPU passthrough (e.g., Intel GVT-d)

Assigns an entire physical GPU to a single VM. While this doesn't split the GPU, it's considered virtualization because it allows a VM to directly access the GPU, providing near-native performance for applications that require the full power of a GPU.

The third approach, network-based GPU pooling, is a newer concept that requires deeper explanation.

A new approach: Network-Based Virtualization

At its core, Thunder Compute is a network-based GPU virtualization solution. This works by extending physical PCIe connections with virtual connections over a network.

In practice, this means that any computer can access any GPU across a network. Traditionally, adding a GPU to a server requires physically connecting it to the motherboard. With Thunder Compute, a virtual GPU can be "plugged in" via software, behaving just like a physically connected GPU.

Thunder Compute's solution acts as a bridge between the application and the GPU. It replaces the standard GPU software interface (like NVIDIA CUDA) with a network-aware version. This allows applications to interact with GPUs on remote servers as if they were locally attached.

The end result is that a computer without a physical GPU can behave exactly as if it has a GPU, without any hardware changes. This creates a flexible, distributed GPU resource pool that can be dynamically allocated and shared across the network.
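The mechanism described above can be sketched in miniature. The names below (`RemoteGPU`, `remote_server`, `vector_add`) are hypothetical stand-ins, not Thunder Compute's actual interface: an application-facing object serializes each GPU call, ships it to a server that owns the hardware, and returns the result, so the application cannot tell the GPU is remote.

```python
import json

def remote_server(request_json):
    """Stands in for a GPU server elsewhere on the network.
    A real server would invoke the matching CUDA driver call here."""
    request = json.loads(request_json)
    if request["op"] == "vector_add":
        a, b = request["args"]
        return json.dumps([x + y for x, y in zip(a, b)])
    raise ValueError("unknown op")

class RemoteGPU:
    """Drop-in stand-in for a local GPU interface: each call is
    serialized, forwarded, and its result returned transparently."""
    def vector_add(self, a, b):
        request = json.dumps({"op": "vector_add", "args": [a, b]})
        # In reality this line is a network round trip, not a function call.
        return json.loads(remote_server(request))

gpu = RemoteGPU()
print(gpu.vector_add([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```

The design choice to intercept at the API layer (rather than, say, emulating a PCIe device) is what lets any unmodified application participate.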

Why Network-Distributed GPU Virtualization is a Game-Changer

Traditional GPU virtualization is limited by physical hardware constraints, typically supporting a maximum of 8 GPUs per server. Expanding GPU capacity requires vertical scaling, which involves upgrading individual servers. However, this method often leads to inefficient resource utilization as VMs tend to reserve entire GPUs.

Thunder Compute's network-distributed approach overcomes these limitations by enabling GPUs to be accessed across multiple servers (also called 'nodes') in a data center. This creates a data center-wide pool of GPU resources, rather than limiting each server to its own physically attached GPUs. 

This ability to expand GPU resources by adding more servers (known as horizontal scalability) allows for flexible, on-demand allocation of GPU power. It dramatically increases efficiency by ensuring GPUs are used to their full capacity across the entire data center.
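A minimal sketch of that pooling model, with hypothetical names (`GpuPool`, `add_node`, `allocate`): GPUs from every node land in one cluster-wide free list, so capacity grows by adding servers and a single job can draw more GPUs than any one server physically holds.

```python
# Illustrative sketch of a data-center-wide GPU pool; the class and
# method names are assumptions, not Thunder Compute's real scheduler.

class GpuPool:
    def __init__(self):
        self.free = []  # (node, gpu_id) pairs available cluster-wide

    def add_node(self, node, gpu_count):
        """Horizontal scaling: a new server's GPUs join the shared pool."""
        self.free += [(node, i) for i in range(gpu_count)]

    def allocate(self, n):
        """Grant n GPUs from anywhere in the data center."""
        if n > len(self.free):
            raise RuntimeError("not enough free GPUs in the pool")
        grant, self.free = self.free[:n], self.free[n:]
        return grant

    def release(self, grant):
        self.free += grant

pool = GpuPool()
pool.add_node("node-a", 8)
pool.add_node("node-b", 8)   # scale out by adding a server
job = pool.allocate(10)      # more GPUs than any single 8-GPU server holds
```

The allocation crossing server boundaries is exactly what single-node virtualization cannot do.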

Comparing Thunder Compute to Similar Technologies

To conceptualize Thunder Compute's network virtualization, it is helpful to look at some existing solutions for attaching GPUs and other hardware across networks:

  • NVIDIA InfiniBand: This is a high-speed networking technology that allows for faster communication between servers in a data center. While it improves the connection speed for GPU systems spread across multiple servers, it doesn't address the core issue of efficiently allocating GPU resources among different applications or users.
  • Storage Area Networks (SANs): SANs pool storage devices across a network, allowing VMs to access only the storage they need without reserving excess capacity. Thunder Compute's GPU virtualization operates on a similar principle, enabling precise GPU resource allocation with minimal idle time.

The Future of GPU Virtualization

As with other virtualization technologies, network-based GPU virtualization faces performance challenges but continues to improve. Thunder Compute's early tests showed AI inference tasks running 100 times slower than on attached hardware. Within a month, performance improved to ~2 times slower for most AI workloads.

This rapid progress points to a future where network-virtualized GPUs will match the performance of physically attached GPUs. As the technology matures, applications will extend beyond data centers to slower networks, including connections between data centers and even home networks. We envision a future where developers can access vast GPU resources from their laptops over standard WiFi connections.

The advantages of network-based GPU virtualization—flexibility, efficiency, and scalability—position it as the likely future standard for GPU management in data centers and clouds. Try Thunder Compute to experience this technology firsthand.