GGUF Attention Mechanism Explorer
Platform: Kaggle (2× Tesla T4)
Overview
This notebook visualizes attention mechanics (Q‑K‑V) across all heads and layers using llcuda + Graphistry. It complements Transformers‑Explainer by providing full‑model attention analysis on real GGUF models.
What You’ll Learn
- Extract attention matrices from llama.cpp
- Compare attention patterns across early vs late layers
- Visualize attention heads with Graphistry
- Measure the impact of quantization (Q4_K_M vs. FP32) on attention patterns
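The Q-K-V mechanics the notebook visualizes can be sketched in plain NumPy, independently of llcuda's actual API (which is not shown here). The function names, shapes, and the entropy metric below are illustrative: entropy is one common way to quantify whether a head attends sharply (low entropy, typical of some later layers) or diffusely (high entropy, common in early layers).

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention weights and outputs for a single head.
    Q, K, V: (seq_len, d_head) arrays."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (seq, seq) similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights, weights @ V

def attention_entropy(weights):
    """Mean entropy of each query's attention distribution.
    Low = sharply focused head; high = diffuse head."""
    return float(-(weights * np.log(weights + 1e-12)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
seq, d_head = 8, 16
Q, K, V = (rng.standard_normal((seq, d_head)) for _ in range(3))
W, out = scaled_dot_product_attention(Q, K, V)
print(f"mean attention entropy: {attention_entropy(W):.3f}")
```

Comparing this entropy per head across early vs. late layers gives a simple numeric companion to the visual analysis.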
Requirements
- Kaggle notebook with 2× Tesla T4 GPUs
- llcuda v2.2.0
- Graphistry + RAPIDS
Quick Start
Open the notebook in Kaggle to run the full workflow.
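For a sense of the Graphistry step, an attention matrix is typically turned into a token-to-token edge list before plotting. The sketch below builds that edge list with pandas; the matrix, tokens, and `threshold` are illustrative, and the final Graphistry call is left as a comment since it requires account registration.

```python
import numpy as np
import pandas as pd

def attention_to_edges(weights, tokens, threshold=0.05):
    """Flatten a (seq, seq) attention matrix into an edge list,
    keeping only attention weights above `threshold`."""
    src, dst = np.nonzero(weights > threshold)
    return pd.DataFrame({
        "src": [tokens[i] for i in src],
        "dst": [tokens[j] for j in dst],
        "weight": weights[src, dst],
    })

tokens = ["The", "cat", "sat", "on", "the", "mat"]
rng = np.random.default_rng(1)
W = rng.random((6, 6))
W /= W.sum(axis=1, keepdims=True)  # normalize rows, as softmax would
edges = attention_to_edges(W, tokens)
print(edges.head())

# After graphistry.register(...), one would plot roughly like:
# import graphistry
# graphistry.edges(edges, "src", "dst").plot()
```

Thresholding keeps the graph readable: a full seq×seq matrix produces a dense clique, while pruning weak edges leaves only the attention structure worth inspecting.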