Skip to content

Performance Benchmarks

Real-world performance on Kaggle dual T4.

Single GPU Results

Model Quant GPU Tokens/sec VRAM
Gemma 3-1B Q4_K_M 1× T4 ~45 tok/s 3 GB
Qwen2.5-1.5B Q4_K_M 1× T4 ~50 tok/s 2.5 GB
Llama-3.2-3B Q4_K_M 1× T4 ~30 tok/s 4 GB

Dual GPU Results

Model Quant GPUs Tokens/sec VRAM
Gemma 2-2B Q4_K_M 2× T4 ~60 tok/s 4 GB
Qwen2.5-7B Q4_K_M 2× T4 ~35 tok/s 10 GB
Llama-70B IQ3_XS 2× T4 ~12 tok/s 27 GB

Optimization Impact

Optimization Speedup
FlashAttention 2-3x
Tensor Cores 1.5x
CUDA Graphs 1.2x