12: GGUF Attention Mechanism Explorer

Platform: Kaggle (2× Tesla T4)

Overview

This notebook visualizes attention mechanics (Q‑K‑V) across all heads and layers using llcuda + Graphistry. It complements Transformers‑Explainer by providing full‑model attention analysis on real GGUF models.

What You’ll Learn

  • Extract attention matrices from llama.cpp
  • Compare attention patterns across early vs late layers
  • Visualize attention heads with Graphistry
  • Assess the impact of quantization on attention (Q4_K_M vs FP32)

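The attention matrices the notebook extracts and visualizes are the row-stochastic weights produced by scaled dot-product attention. The following is a minimal NumPy sketch of that computation for a single head (not the llcuda API, just the underlying math); decoder-only GGUF models additionally apply a causal mask so each token attends only to earlier positions:

```python
import numpy as np

def attention_weights(Q, K, causal=True):
    """Scaled dot-product attention weights for one head.

    Q, K: (seq_len, head_dim) arrays.
    Returns a (seq_len, seq_len) matrix whose rows sum to 1 --
    the object visualized per head and per layer.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # raw Q.K^T similarities
    if causal:
        # mask out future positions (decoder-only models)
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))
K = rng.standard_normal((5, 8))
A = attention_weights(Q, K)
print(A.shape)         # (5, 5)
print(A.sum(axis=-1))  # each row sums to 1
```

Early-vs-late layer comparisons in the notebook amount to computing such matrices at different layer indices and contrasting their structure (e.g. diagonal/local patterns early vs broader, more semantic patterns late).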
Requirements

  • Kaggle notebook with GPU T4 × 2
  • llcuda v2.2.0
  • Graphistry + RAPIDS

Quick Start

# Install llcuda
!pip install -q --no-cache-dir git+https://github.com/llcuda/llcuda.git@v2.2.0

Open the notebook in Kaggle to run the full workflow.
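The quantization comparison rests on the fact that Q4 formats store weights as 4-bit integers with a per-block scale, which perturbs the Q/K projections and hence the attention maps. A rough illustrative sketch of that round-trip error (symmetric 4-bit per-block quantization; llama.cpp's actual Q4_K_M format uses super-blocks with per-block scales and minimums, so this is a simplification):

```python
import numpy as np

def quantize_4bit(x, block=32):
    """Symmetric 4-bit quantize/dequantize round trip per block of
    `block` values -- a simplified stand-in for llama.cpp Q4 blocks."""
    flat = x.reshape(-1, block)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # map max to +/-7
    q = np.clip(np.round(flat / scale), -8, 7)             # 4-bit integers
    return (q * scale).reshape(x.shape)                    # dequantize

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 64))     # pretend Q/K projection weights
Wq = quantize_4bit(W)
err = np.abs(W - Wq).mean()          # mean absolute round-trip error
print(err)
```

Feeding the quantized and full-precision weights through the same attention computation and diffing the resulting matrices is the kind of comparison the notebook performs on real GGUF checkpoints.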