Virtualization in Cloud Computing: the Past, the Present, and the Future
A short history of virtualization and a look at the future of this technology
Published: Sep 3, 2024 | Last updated: Apr 17, 2025

TL;DR: Virtualization lets one piece of hardware masquerade as many. It started with CPUs, moved to disks and memory, and is now transforming how we use GPUs. This post walks through what virtualization is, why it matters, and where GPU virtualization stands today.
1. What is virtualization?
Virtualization is software that creates an abstraction layer—a virtual machine (VM)—that looks and behaves just like real hardware. Your program thinks it's running on its own CPU, disk, or GPU, but in reality the hypervisor is sharing the underlying device across many users.
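For a minimal illustration of that guest's-eye view (assuming a Python interpreter running inside a VM), standard OS queries report whatever virtual hardware the hypervisor chooses to expose, not the physical machine underneath:

```python
import os
import platform

# Inside a guest VM these report the *virtual* hardware the hypervisor
# exposes, not the physical host: a 64-core server sliced into 4-vCPU
# guests would show cpu_count() == 4 here.
print("CPUs visible to this program:", os.cpu_count())
print("Architecture reported to this program:", platform.machine())
```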
2. Why bother?
Higher utilization: Servers often sit idle. Virtualization lets 5–10× more work run on the same hardware (see the back-of-the-envelope sketch below).
Elastic capacity: VMs can be moved, resized, or paused in seconds—no racking or cabling required.
Isolation: Faults and security issues stay inside the VM bubble.
The price you pay is overhead. Extra layers add latency and sometimes cap throughput, but history shows that margin shrinks over time.
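Here is that back-of-the-envelope utilization math, using illustrative numbers rather than measurements from any particular fleet:

```python
import math

# Illustrative assumptions only: ten "one server per app" boxes, each app
# peaking at ~15% of a machine (i.e., ~85% idle), plus an assumed ~10%
# hypervisor overhead.
apps = 10
peak_utilization = 0.15
virtualization_overhead = 1.10

# Capacity actually needed, measured in whole servers.
servers_needed = apps * peak_utilization * virtualization_overhead
hosts = math.ceil(servers_needed)

print(f"{apps} mostly idle boxes consolidate onto {hosts} virtualized hosts "
      f"(~{apps / hosts:.0f}x more work per machine)")
```

With these numbers, ten mostly idle machines collapse onto two virtualized hosts, which is where the 5–10× figure comes from.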
3. A (very) short history
Era | Milestone | Why it mattered |
---|---|---|
1960s | IBM CP‑40 time‑shared a mainframe between 14 users. | Turned million‑dollar hardware into a shared resource. |
1990s | VMware noticed servers idled ≈85% and revived virtualization on x86. | Drove utilization toward 80%+ and made “one server per app” obsolete. |
2000s | Amazon EC2 shipped virtual CPUs (vCPUs) by default. | Popularized “pay only for what you use” cloud pricing. |
4. Beyond the CPU
Storage: AWS Elastic Block Store (EBS) pools thousands of disks into on‑demand volumes with near‑local performance (a minimal provisioning sketch follows this list).
Memory: Projects like vNUMA carve RAM across hosts, but nanosecond latencies make high‑performance memory virtualization tough.
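To make the storage point concrete, here is a minimal sketch of carving out an EBS volume on demand with boto3; the availability zone and instance ID are placeholders, and real use requires AWS credentials to be configured:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Carve a 100 GiB gp3 volume out of the pooled storage fleet on demand --
# no physical disk to buy, rack, or cable.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=100,
    VolumeType="gp3",
)

# Attach it to a running instance as if it were a local block device.
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Device="/dev/sdf",
)
```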
5. GPUs: today’s frontier
GPUs crave bandwidth and hate context switches, so early experiments were ~100× slower than bare metal. Progress has been quick:
Year | Project | Overhead vs. Physical GPU |
---|---|---|
2013 | rCUDA (research) | ~100× (with RDMA) |
2022 | Thunder Compute prototype | ~1000× (with TCP) |
2025 | Thunder Compute public beta | ~1.5× and falling |
Breakthroughs driving this drop:
Faster networking increases the speed at which GPUs can communicate over a network connection.
AI-enabled optimization reshapes how a GPU program executes so it can tolerate high-latency connections (a toy sketch of the idea follows this list).
Idle-time disconnection lets many developers share a smaller pool of hardware.
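As a toy sketch of the latency-hiding idea (not Thunder Compute's actual implementation; the remote server and wire format here are assumptions), queuing GPU API calls locally and flushing them in batches means a slow TCP link is crossed once per batch rather than once per call:

```python
import pickle
import socket


class RemoteGPUClient:
    """Toy latency-hiding client: queue GPU API calls and send them to a
    (hypothetical) remote GPU server in one batch, so a high-latency TCP
    link is crossed once per batch instead of once per call."""

    def __init__(self, host: str, port: int):
        self.sock = socket.create_connection((host, port))
        self.pending = []

    def call(self, api_name: str, *args) -> None:
        # Record the call instead of paying a network round trip for it.
        self.pending.append((api_name, args))

    def flush(self):
        # One round trip carries the whole batch of queued calls.
        payload = pickle.dumps(self.pending)
        self.sock.sendall(len(payload).to_bytes(8, "big") + payload)
        self.pending.clear()
        # Read back the batched results (wire format is an assumption).
        size = int.from_bytes(self._recv_exact(8), "big")
        return pickle.loads(self._recv_exact(size))

    def _recv_exact(self, n: int) -> bytes:
        buf = b""
        while len(buf) < n:
            chunk = self.sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("remote GPU server closed the connection")
            buf += chunk
        return buf
```

Real systems are far more sophisticated, but the core trade is the same: amortize network latency across many GPU operations instead of paying it on every call.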
6. Where this is heading
Cheaper prototyping: Idle time can be repurposed and rented to others, lowering bills and easing capacity shortages.
Simpler infrastructure: Managing GPUs is tough. Network-attached GPUs add a layer of flexibility, making it easy to swap one chip for another if it fails.
Effortless scaling: When your app needs dedicated GPUs, migrate without rewriting infrastructure.
Virtualization already made CPUs and disks feel “elastic.” GPUs are next—bringing the same flexibility to model training, game servers, and any workload that spikes.

Carl Peterson