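VRAM formula for inference: (model_params × precision_bytes × 1.2) + activations, where activations ≈ batch_size × sequence_length × hidden_size × layers × 2 bytes. Quick estimates: a 7B model in 16-bit needs ~14GB of VRAM for its weights at inference and ~56GB for full training before activations. Training adds gradients (1x model size), optimizer states (2x model size), and activations (batch-dependent). Quantization (8-bit/4-bit) cuts weight memory by 50-75% relative to 16-bit. Multi-GPU setups that shard the model split the load roughly linearly (2 GPUs = about 50% of the model memory per GPU); plain data parallelism keeps a full copy of the model on every GPU.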

VRAM Estimation Formula (Training)

Components of GPU Memory Usage

  • Model parameters: params × precision_bytes (e.g., 7B × 2 bytes for FP16 = 14GB)
  • Gradients: Same size as model parameters (1x model size)
  • Optimizer states: AdamW stores 2 states per parameter (2x model size)
  • Activations: batch_size × seq_len × hidden_dim × num_layers × 2 bytes (16-bit values); a rough estimate that grows linearly with batch size and sequence length
  • Safety buffer: 20% overhead for CUDA kernels and temporary tensors
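
The activation term is the only component that depends on batch shape, so it helps to compute it on its own. The sketch below follows the bullet above and reads the trailing factor of 2 as bytes per 16-bit value, which is an assumption. The function name and the example dimensions (hidden_dim=4096, num_layers=32) are hypothetical and chosen only for illustration.

```python
def activation_bytes(batch_size: int, seq_len: int, hidden_dim: int,
                     num_layers: int, bytes_per_value: int = 2) -> int:
    """Rough activation memory in bytes, following the bullet list above.

    Assumption: the trailing "x 2" in the formula means 2 bytes per
    FP16/BF16 value. Real frameworks keep several tensors per layer, so
    treat the result as a rough estimate, not a measurement.
    """
    return batch_size * seq_len * hidden_dim * num_layers * bytes_per_value


# Illustrative shape for a 7B-class model; these dimensions are assumed,
# not taken from the text.
act = activation_bytes(batch_size=8, seq_len=2048, hidden_dim=4096, num_layers=32)
print(f"activations ~ {act / 1e9:.1f} GB")
```

Doubling either batch_size or seq_len doubles the result, which is why those two settings are usually the first knobs to turn when a run does not fit.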

Full Training Formula

VRAM = 1.2 × ((model_size × 4) + activations)

Where model_size × 4 = params (1x) + gradients (1x) + optimizer (2x), and the factor 1.2 adds the 20% safety buffer on top of everything else.
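
As a worked example, the formula can be turned into a small helper. This is a minimal sketch, not a profiler: it assumes the 20% buffer is applied on top of the sum of all other components, that 1GB = 10^9 bytes (as in the 7B × 2 bytes = 14GB example), and that optimizer states use the same precision as the weights. The function name and its parameters are hypothetical.

```python
def training_vram_gb(params_billion: float, precision_bytes: float = 2.0,
                     activations_gb: float = 0.0, buffer: float = 0.2) -> dict:
    """Break down estimated training VRAM (in GB) using the 4x rule of thumb."""
    weights = params_billion * precision_bytes   # 1x model size
    gradients = weights                          # 1x model size
    optimizer = 2 * weights                      # 2x model size (AdamW, two states)
    subtotal = weights + gradients + optimizer + activations_gb
    # Assumption: the safety buffer is a fraction of the subtotal.
    return {
        "weights": weights,
        "gradients": gradients,
        "optimizer": optimizer,
        "activations": activations_gb,
        "subtotal": subtotal,
        "total": subtotal * (1 + buffer),
    }


estimate = training_vram_gb(7, precision_bytes=2, activations_gb=4.3)
for name, gb in estimate.items():
    print(f"{name:>12}: {gb:6.1f} GB")
```

With activations_gb left at 0, the subtotal for a 7B model in 16-bit is 56GB, which matches the training figure in the quick reference table below.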

Quick Reference: VRAM by Model Size

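| Model Size | Inference (16-bit) | Training (16-bit) | Training (8-bit) |
|------------|--------------------|-------------------|------------------|
| 1B params  | 2GB                | 8GB               | 4GB              |
| 7B params  | 14GB               | 56GB              | 28GB             |
| 13B params | 26GB               | 104GB             | 52GB             |
| 70B params | 140GB              | 560GB             | 280GB            |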

VRAM Optimization Strategies

1. Quantization

  • 4-bit (QLoRA): 75% VRAM reduction. 7B model = 3.5GB
  • 8-bit: 50% reduction. 7B model = 7GB
  • 16-bit (FP16/BF16): Standard precision. 7B model = 14GB
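
The figures in this list follow directly from bits per parameter. The sketch below reproduces them for the weights only; it ignores the small extra storage that quantization schemes need for scales and similar constants, so real footprints will be slightly higher. The helper name is hypothetical.

```python
def weight_gb(params_billion: float, bits: int) -> float:
    """Weight memory in GB for a given bit width (1GB = 10^9 bytes)."""
    return params_billion * bits / 8


for bits in (16, 8, 4):
    size = weight_gb(7, bits)
    saving = 1 - bits / 16          # reduction relative to 16-bit weights
    print(f"{bits:>2}-bit: {size:.1f} GB ({saving:.0%} smaller than 16-bit)")
```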

2. Gradient Checkpointing

Trades compute for memory. Recomputes activations during backward pass instead of storing them. Reduces activation memory by 80-90%, increases training time by 20-30%.
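
To get a feel for the trade-off, the ranges quoted above can be applied to a measured baseline. This is only an arithmetic sketch using the 80-90% and 20-30% figures from the paragraph; actual savings depend on the model and on which layers are checkpointed. The function name and the example inputs are hypothetical.

```python
def checkpointing_tradeoff(activations_gb: float, step_time_s: float) -> dict:
    """Apply the quoted ranges to a baseline without checkpointing.

    Returns (best case, worst case) pairs for activation memory and
    (low, high) pairs for step time.
    """
    mem_saving = (0.80, 0.90)   # fraction of activation memory removed
    time_cost = (0.20, 0.30)    # fractional increase in training time
    return {
        "activations_gb": (activations_gb * (1 - mem_saving[1]),
                           activations_gb * (1 - mem_saving[0])),
        "step_time_s": (step_time_s * (1 + time_cost[0]),
                        step_time_s * (1 + time_cost[1])),
    }


# Hypothetical baseline: 10GB of activations and 1.0s per training step.
result = checkpointing_tradeoff(activations_gb=10.0, step_time_s=1.0)
lo_mem, hi_mem = result["activations_gb"]
lo_t, hi_t = result["step_time_s"]
print(f"activations: {lo_mem:.1f}-{hi_mem:.1f} GB, step time: {lo_t:.2f}-{hi_t:.2f} s")
```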

3. Reduce Batch Size

Activation memory scales linearly with batch size, so halving the batch from 32 to 16 halves it; depending on model size and sequence length, that can save 4-8GB of VRAM. Use gradient accumulation to simulate a larger effective batch without the extra VRAM.
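
Gradient accumulation works because the average of equally sized micro-batch gradients equals the full-batch gradient. The toy example below checks this with a one-parameter squared-error model in plain Python; it is a sketch of the idea, not a training loop, and every name in it is hypothetical.

```python
def grad(w: float, xs: list, ys: list) -> float:
    """Mean gradient of the squared error (w*x - y)^2 with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list, ys: list, micro_batch: int) -> float:
    """Accumulate gradients over equally sized micro-batches.

    Assumes len(xs) is a multiple of micro_batch. Only one micro-batch is
    processed at a time, which is what keeps activation memory low.
    """
    if len(xs) % micro_batch != 0:
        raise ValueError("batch size must be a multiple of micro_batch")
    steps = len(xs) // micro_batch
    total = 0.0
    for i in range(steps):
        lo = i * micro_batch
        # Scale each micro-batch gradient by 1/steps so the sum is a mean.
        total += grad(w, xs[lo:lo + micro_batch], ys[lo:lo + micro_batch]) / steps
    return total


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full = grad(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys, micro_batch=2)
print(full, accum)
```

In a real training loop the same effect is usually achieved by dividing each micro-batch loss by the number of accumulation steps and calling the optimizer step only after the last one.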

GPU Recommendations by Model Size

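| Model | Training Method | Recommended GPU | VRAM    |
|-------|-----------------|-----------------|---------|
| 7B    | LoRA            | RTX 4090        | 24GB    |
| 7B    | Full fine-tune  | A100 40GB       | 40GB    |
| 13B   | LoRA            | RTX 4090 / A40  | 24-48GB |
| 70B   | LoRA            | A100 80GB       | 80GB    |
| 70B   | QLoRA (4-bit)   | A100 40GB       | 40GB    |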
