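VRAM formula: (model_params × precision_bytes × 1.2) + (batch_size × sequence_length × hidden_size × layers). Quick estimates: a 7B model in 16-bit needs ~14GB of VRAM for inference. Full training needs ~56GB for weights, gradients, and optimizer states before activations are counted, while LoRA fine-tuning of a 7B model fits on a 24GB GPU. For training, add gradients (1x model size), optimizer states (2x model size), and activations (batch-dependent). Use quantization (8-bit/4-bit) to reduce weight memory by 50-75%. Multi-GPU setups split the load roughly linearly (2 GPUs ≈ 50% VRAM per GPU) only when the model state is sharded across devices; plain data parallelism keeps a full copy on every GPU.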
VRAM Estimation Formula (Training)
Components of GPU Memory Usage
- Model parameters: params × precision_bytes (e.g., 7B × 2 bytes for FP16 = 14GB)
- Gradients: Same size as model parameters (1x model size)
- Optimizer states: AdamW stores 2 states per parameter (2x model size)
- Activations: batch_size × seq_len × hidden_dim × num_layers × 2
- Safety buffer: 20% overhead for CUDA kernels, temporary tensors
Full Training Formula
VRAM = 1.2 × ((model_size × 4) + activations)
Where model_size × 4 = params (1x) + gradients (1x) + optimizer (2x), and the factor 1.2 adds the 20% safety buffer on top of the other terms.
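The training formula can be turned into a small calculator. The following is a minimal sketch rather than a definitive implementation: the function name `estimate_training_vram_gb` and its parameters are hypothetical, the default `hidden_dim` and `num_layers` are assumed example values, 1 GB is taken as 10^9 bytes, and the 20% buffer is read as 20% of the model-state and activation terms combined.

```python
def estimate_training_vram_gb(
    params_billion: float,
    precision_bytes: int = 2,   # 2 bytes per parameter for FP16/BF16
    batch_size: int = 1,
    seq_len: int = 2048,
    hidden_dim: int = 4096,     # assumed example value, not from the article
    num_layers: int = 32,       # assumed example value, not from the article
    overhead: float = 0.2,      # 20% safety buffer
) -> dict:
    """Rough full-training VRAM estimate in GB (1 GB = 1e9 bytes)."""
    gb = 1e9
    weights_gb = params_billion * 1e9 * precision_bytes / gb
    # params (1x) + gradients (1x) + optimizer states (2x)
    model_states_gb = weights_gb * 4
    # Activation term as listed in the components: B x S x H x L x 2
    activations_gb = batch_size * seq_len * hidden_dim * num_layers * 2 / gb
    subtotal_gb = model_states_gb + activations_gb
    buffer_gb = overhead * subtotal_gb
    return {
        "weights_gb": weights_gb,
        "model_states_gb": model_states_gb,
        "activations_gb": activations_gb,
        "buffer_gb": buffer_gb,
        "total_gb": subtotal_gb + buffer_gb,
    }


if __name__ == "__main__":
    estimate = estimate_training_vram_gb(7)
    for name, value in estimate.items():
        print(f"{name}: {value:.2f}")
```

With these example values the model-state term (56 GB for a 7B model in 16-bit) dominates at batch size 1; the activation term grows in proportion to batch size and sequence length.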
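Quick Reference: VRAM by Model Size
Training figures cover parameters, gradients, and optimizer states (model_size × 4) and exclude activations and the safety buffer.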
| Model Size | Inference (16-bit) | Training (16-bit) | Training (8-bit) |
|---|---|---|---|
| 1B params | 2GB | 8GB | 4GB |
| 7B params | 14GB | 56GB | 28GB |
| 13B params | 26GB | 104GB | 52GB |
| 70B params | 140GB | 560GB | 280GB |
VRAM Optimization Strategies
1. Quantization
- 4-bit (QLoRA): 75% VRAM reduction. 7B model = 3.5GB
- 8-bit: 50% reduction. 7B model = 7GB
- 16-bit (FP16/BF16): Standard precision. 7B model = 14GB
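The figures in this list follow directly from bits per parameter. Here is a minimal sketch of that arithmetic, assuming 1 GB = 10^9 bytes and counting weights only; the helper name `weight_memory_gb` is hypothetical, and real quantized formats also store scaling metadata, so actual usage is somewhat higher.

```python
def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Memory for the weights alone at a given bit width (1 GB = 1e9 bytes)."""
    bytes_per_param = bits / 8
    return params_billion * 1e9 * bytes_per_param / 1e9


if __name__ == "__main__":
    baseline = weight_memory_gb(7, 16)
    for bits in (16, 8, 4):
        gb = weight_memory_gb(7, bits)
        saving = 1 - gb / baseline
        print(f"{bits}-bit: {gb:.1f} GB ({saving:.0%} smaller than 16-bit)")
```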
2. Gradient Checkpointing
Trades compute for memory. Recomputes activations during backward pass instead of storing them. Reduces activation memory by 80-90%, increases training time by 20-30%.
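To see what this trade-off means in numbers, the sketch below applies the stated ranges to the activation estimate batch_size × seq_len × hidden_dim × num_layers × 2. It is arithmetic only, not an implementation of checkpointing. The function names are hypothetical, the defaults of 85% memory reduction and 25% slowdown are assumed midpoints of the ranges above, and the trailing × 2 is assumed to mean 2 bytes per FP16 value.

```python
def activation_memory_gb(batch_size, seq_len, hidden_dim, num_layers,
                         bytes_per_value=2):
    """Activation estimate in GB (1 GB = 1e9 bytes)."""
    values = batch_size * seq_len * hidden_dim * num_layers
    return values * bytes_per_value / 1e9


def with_checkpointing(activations_gb, step_time_s,
                       reduction=0.85, slowdown=0.25):
    """Apply an assumed memory reduction and training-time increase."""
    return activations_gb * (1 - reduction), step_time_s * (1 + slowdown)


if __name__ == "__main__":
    # Example shape values are assumptions chosen for illustration.
    base_gb = activation_memory_gb(batch_size=8, seq_len=2048,
                                   hidden_dim=4096, num_layers=32)
    ckpt_gb, ckpt_time = with_checkpointing(base_gb, step_time_s=1.0)
    print(f"without checkpointing: {base_gb:.2f} GB, 1.00 s/step")
    print(f"with checkpointing:    {ckpt_gb:.2f} GB, {ckpt_time:.2f} s/step")
```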
3. Reduce Batch Size
Batch size directly affects activation memory. Halving batch size from 32 to 16 can save 4-8GB VRAM. Use gradient accumulation to simulate larger batches without VRAM increase.
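Gradient accumulation works because the gradient of a mean loss over a large batch equals the average of the gradients over equally sized micro-batches. The toy example below checks this for a one-parameter model in plain Python. It is an illustrative sketch with hypothetical function names, not a training loop for a real framework.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean squared error for the one-parameter model y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Average micro-batch gradients, as gradient accumulation does."""
    if len(xs) % micro_batch_size != 0:
        raise ValueError("batch must divide evenly into micro-batches")
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo = i * micro_batch_size
        hi = lo + micro_batch_size
        # Dividing by `steps` mirrors scaling each micro-batch loss by
        # 1 / accumulation_steps before the backward pass.
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / steps
    return total


if __name__ == "__main__":
    xs = [float(i) for i in range(1, 33)]   # effective batch of 32
    ys = [3.0 * x + 1.0 for x in xs]
    full = mse_grad(0.5, xs, ys)
    accum = accumulated_grad(0.5, xs, ys, micro_batch_size=16)
    print(f"full batch: {full:.6f}, 2 x 16 accumulated: {accum:.6f}")
```

Only one micro-batch of activations is held in memory at a time, which is why the effective batch size can grow without a matching increase in activation memory.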
GPU Recommendations by Model Size
| Model | Training Method | Recommended GPU | VRAM |
|---|---|---|---|
| 7B | LoRA | RTX 4090 | 24GB |
| 7B | Full fine-tune | A100 40GB | 40GB |
| 13B | LoRA | RTX 4090 / A40 | 24-48GB |
| 70B | LoRA | A100 80GB | 80GB |
| 70B | QLoRA (4-bit) | A100 40GB | 40GB |
