Skip to content

K-Quants Guide

Understanding K-quantization formats.

What are K-Quants?

K-quants use "double quantization": 1. Quantize weights 2. Quantize the quantization parameters

Result: Better quality-to-size ratio

Available K-Quants

  • Bits per weight: 4.8
  • Size: ~40% of FP16
  • Quality: Excellent for most uses
  • Use case: Default choice

Q5_K_M

  • Bits per weight: 5.7
  • Size: ~48% of FP16
  • Quality: Higher than Q4_K_M
  • Use case: Quality-sensitive

Q6_K

  • Bits per weight: 6.6
  • Size: ~55% of FP16
  • Quality: Near FP16
  • Use case: Maximum quality

Q8_0

  • Bits per weight: 8.5
  • Size: ~70% of FP16
  • Quality: Virtually lossless
  • Use case: Research, validation

Size Comparison

Model FP16 Q4_K_M Q5_K_M Q6_K Q8_0
1B 2GB 0.8GB 1GB 1.1GB 1.4GB
7B 14GB 5.6GB 6.7GB 7.7GB 9.8GB
13B 26GB 10.4GB 12.5GB 14.3GB 18.2GB
70B 140GB 56GB 67GB 77GB 98GB