
Virtualization in Cloud Computing: the Past, the Present, and the Future

A short history of virtualization and a look at the future of this technology

Published: Sep 3, 2024 | Last updated: Apr 17, 2025

TL;DR: Virtualization lets one piece of hardware masquerade as many. It started with CPUs, moved to disks and memory, and is now transforming how we use GPUs. This post walks through what virtualization is, why it matters, and where GPU virtualization stands today.

1. What is virtualization?

Virtualization is software that creates an abstraction layer—a virtual machine (VM)—that looks and behaves just like real hardware. Your program thinks it's running on its own CPU, disk, or GPU, but in reality the hypervisor is sharing the underlying device across many users.
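
To make the idea concrete, here is a deliberately simplified sketch (not how any real hypervisor is implemented): one "physical" device is multiplexed across several tenants, each of which sees what looks like a private device. All names are invented for illustration.

```python
# Toy model of virtualization: one physical device, many virtual views.
# Purely illustrative; real hypervisors work at the hardware/driver level.

class PhysicalGPU:
    def run(self, tenant, kernel):
        return f"ran {kernel} for {tenant}"

class VirtualGPU:
    """What each tenant sees: an apparently private device."""
    def __init__(self, tenant, backend):
        self.tenant = tenant
        self.backend = backend           # the shared physical device

    def run(self, kernel):
        # The "hypervisor" transparently forwards work to shared hardware.
        return self.backend.run(self.tenant, kernel)

gpu = PhysicalGPU()                       # one real device
tenants = [VirtualGPU(name, gpu) for name in ("alice", "bob", "carol")]
for vm in tenants:
    print(vm.run("matmul"))               # each tenant thinks the GPU is theirs
```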

2. Why bother?

  • Higher utilization: Servers often sit idle. Virtualization lets 5–10× more work run on the same hardware (rough math at the end of this section).

  • Elastic capacity: VMs can be moved, resized, or paused in seconds—no racking or cabling required.

  • Isolation: Faults and security issues stay inside the VM bubble.

The price you pay is overhead. Extra layers add latency and sometimes cap throughput, but history shows that margin shrinks over time.
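
The 5–10× figure follows from simple arithmetic. The numbers below are illustrative assumptions chosen to match the ≈85% idle observation cited in the next section, not measurements:

```python
# Back-of-the-envelope consolidation math, not a benchmark.
avg_utilization = 0.15      # each dedicated server is ~85% idle
target_utilization = 0.80   # leave headroom on the shared host for spikes

workloads_per_host = target_utilization / avg_utilization
print(f"~{workloads_per_host:.1f} workloads per physical host")   # ≈5.3
```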

3. A (very) short history

| Era | Milestone | Why it mattered |
| --- | --- | --- |
| 1960s | IBM CP‑40 time‑shared a mainframe between 14 users. | Turned million‑dollar hardware into a shared resource. |
| 1990s | VMware noticed servers idled ≈85% and revived virtualization on x86. | Drove utilization toward 80%+ and made “one server per app” obsolete. |
| 2000s | Amazon EC2 shipped virtual CPUs (vCPUs) by default. | Popularized “pay only for what you use” cloud pricing. |

4. Beyond the CPU

  • Storage: AWS Elastic Block Store (EBS) pools thousands of disks into on‑demand volumes with near‑local performance (sketch after this list).

  • Memory: Projects like vNUMA carve RAM across hosts, but nanosecond latencies make high‑performance memory virtualization tough.
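
For a sense of what “on‑demand volumes” means in practice, here is a minimal sketch using boto3. The availability zone, instance ID, and device name are placeholders, and error handling is omitted:

```python
# Sketch: provision a 100 GiB network-backed volume on demand.
# AvailabilityZone, InstanceId, and Device are placeholder values.
import boto3

ec2 = boto3.client("ec2")

vol = ec2.create_volume(AvailabilityZone="us-east-1a", Size=100, VolumeType="gp3")
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

# Attach it to a running instance; to the OS it looks like a local disk.
ec2.attach_volume(VolumeId=vol["VolumeId"],
                  InstanceId="i-0123456789abcdef0",
                  Device="/dev/sdf")
```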

5. GPUs: today’s frontier

GPUs crave bandwidth and hate context switches, so early experiments were ~100× slower than bare metal. Progress has been quick:

| Year | Project | Overhead vs. physical GPU |
| --- | --- | --- |
| 2013 | rCUDA (research) | ~100× (with RDMA) |
| 2022 | Thunder Compute prototype | ~1000× (with TCP) |
| 2025 | Thunder Compute public beta | ~1.5× and falling |

Breakthroughs driving this drop:

  • Networking breakthroughs raise the speed at which GPU commands and data move over a network connection.

  • AI-enabled optimization reshapes how a GPU program executes so that it tolerates high-latency connections (a toy latency model follows this list).

  • Idle‑time disconnection lets many developers share a smaller pool of hardware.
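
To see why the network and the execution strategy matter so much, here is a toy latency model. The kernel time and round-trip latencies are assumed numbers, and the model deliberately ignores data movement and everything else, so it will not reproduce the table above; it only shows why amortizing round trips over many launches shrinks the overhead:

```python
# Toy model: every remote GPU call pays one network round trip (RTT).
# All numbers are illustrative assumptions, not measurements.

kernel_ms = 0.02                               # one small GPU kernel, locally
rtt_ms = {"RDMA": 0.005, "TCP": 0.5}           # round-trip latency per network

def overhead(rtt, launches_per_trip):
    # Amortize one round trip across a batch of kernel launches.
    remote = kernel_ms + rtt / launches_per_trip
    return remote / kernel_ms

for name, rtt in rtt_ms.items():
    naive = overhead(rtt, 1)       # one round trip per launch
    batched = overhead(rtt, 100)   # 100 launches per round trip
    print(f"{name}: naive ~{naive:.0f}x, batched ~{batched:.2f}x")
```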

6. Where this is heading

  • Cheaper prototyping: Idle GPU time can now be repurposed and rented to others, which means lower bills and fewer capacity shortages.

  • Simpler infrastructure: Managing physical GPUs is tough. Attaching them over the network adds a layer of flexibility, so a failed chip can be swapped for another without reworking the application.

  • Effortless scaling: When your app eventually needs dedicated GPUs, you can migrate without rewriting your infrastructure.

Virtualization already made CPUs and disks feel “elastic.” GPUs are next—bringing the same flexibility to model training, game servers, and any workload that spikes.

Carl Peterson
