Fine Tune Llama 3 or Llama 4 on a Single A100: Cost, Time, and QLoRA Code
Fine‑tune Llama 3 or the newly released Llama 4 Scout on a single Thunder Compute A100, with exact commands, runtimes, and cost math.
Published: Apr 19, 2025 | Last updated: Apr 19, 2025

1) Prerequisites
| What | Why |
|---|---|
| Thunder Compute account (includes $20 monthly credit) | Covers ~25 h on an A100 80 GB at $0.78/hr |
| VS Code + Thunder Compute extension | One‑click instance creation & remote workspace |
| Python 3.10 + Conda | Environment bootstrap |
Follow the official Quick Start guide to install the VS Code extension.
2) Create an A100 80 GB instance (≈ $0.78/hr)
In the Console: click New Instance → A100 80 GB.
In VS Code: click the + icon at the top of the Thunder Compute tab and pick A100 80 GB.
Make sure to set storage to 300 GB or more for Llama 4.
3) Connect from VS Code
Command Palette → Thunder Compute: Connect (or click the “⇄” icon beside the instance).
VS Code reloads; the integrated terminal is now running on the GPU box—no extra Remote‑SSH extension required.
4) Set up the Python environment
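A minimal bootstrap, assuming Conda is already installed; the package list is representative, so pin versions to match your stack:

```bash
# Create and activate an isolated environment
conda create -n llama-ft python=3.10 -y
conda activate llama-ft

# Core training stack: quantization (bitsandbytes), adapters (peft), SFT loop (trl)
pip install torch transformers datasets accelerate peft trl bitsandbytes
```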
Authenticate with Hugging Face
For the Llama family of models, you will need to use this form to request access permissions. Approval time varies but generally takes <5 minutes.
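Once access is granted, log in with an access token from your Hugging Face account settings:

```bash
huggingface-cli login   # paste a read-scoped token when prompted
```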
5) Minimal QLoRA training script
Create `train_llama_qLoRA.py`.
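A minimal sketch of such a script, assuming recent transformers, peft, trl, and bitsandbytes releases; the dataset, hyperparameters, and repo IDs are placeholders to swap for your own, and trl argument names shift between versions, so check the docs for your installed release:

```python
# train_llama_qLoRA.py — minimal QLoRA fine-tuning sketch.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

MODEL_ID = "meta-llama/Meta-Llama-3-8B"  # or the Llama 4 Scout repo ID

# 4-bit NF4 quantization keeps the frozen base weights small enough for one A100
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

# Low-rank adapters on the attention projections are the only trainable weights
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# A 2% slice keeps the demo run short; replace with your own dataset
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train[:2%]")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    processing_class=tokenizer,  # `tokenizer=` on older trl releases
    args=SFTConfig(
        output_dir="llama-qlora-out",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
)
trainer.train()
trainer.save_model("llama-qlora-out")
```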
Run:
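```bash
python train_llama_qLoRA.py
```

Adapter weights land in `llama-qlora-out/` (per the `output_dir` in the sketch above).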
6) Expected runtime & VRAM
| Model | Steps (≈ 1 epoch on 2 % of dataset) | Time on A100 80 GB | Peak VRAM |
|---|---|---|---|
| Llama 3‑8B (4‑bit) | ~1,500 | ~2 h | 42 GB |
| Llama 4 Scout 17B (4‑bit) | ~1,500 | ~2 h | ~79 GB (per Hugging Face) |
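To sanity-check these numbers against your own run, watch GPU memory live from a second terminal:

```bash
watch -n 1 nvidia-smi   # refreshes VRAM usage and utilization every second
```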
Need Maverick or larger? Spin up 2–4 × A100s from the same dialog and launch with `torchrun --nproc_per_node {N}` to spread training across the cards. Scout with 4-bit quantization will still run happily on a single card.
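For example, a hypothetical four-card launch (the script itself must be written for distributed execution, e.g. via accelerate or FSDP, for the extra GPUs to be used):

```bash
torchrun --nproc_per_node 4 train_llama_qLoRA.py
```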
7) Track spend & shut down
You can track spend and manage instances through the console.
8) Next steps
- Swap in your own dataset.
- Increase `num_train_epochs` until the loss plateaus.
- If VRAM allows, switch the quantization config to `load_in_8bit=True` for 8‑bit precision (sketch below).
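A possible 8-bit variant of the quantization config, assuming the roughly doubled memory footprint still fits:

```python
from transformers import BitsAndBytesConfig

# 8-bit base weights instead of 4-bit NF4 (roughly doubles base-model VRAM)
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
```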
FAQ
Why QLoRA? It trains low‑rank adapters while keeping the base weights in 4‑bit, letting 8B–70B models fit on a single A100.
What about Llama 4 Maverick? The 128‑expert variant needs ~300 GB of VRAM (4 × A100 80 GB in INT4, per ApX Machine Learning).
Commands and script tested on a fresh Thunder Compute A100 80 GB instance (Ubuntu 22.04).

Carl Peterson