12: GGUF Attention Mechanism Explorer

Platform: Kaggle (2× Tesla T4)

Overview

This notebook visualizes attention mechanics (Q‑K‑V) across all heads and layers using llcuda + Graphistry. It complements Transformers‑Explainer by providing full‑model attention analysis on real GGUF models.

What You’ll Learn

  • Extract attention matrices from llama.cpp
  • Compare attention patterns across early vs late layers
  • Visualize attention heads with Graphistry
  • Assess the impact of quantization on attention (Q4_K_M vs FP32)

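The attention matrices the notebook extracts and visualizes are the row-stochastic weights produced by scaled dot-product attention. The following is a minimal NumPy sketch of that computation for a single head (not the llcuda API, just the underlying math); decoder-only GGUF models additionally apply a causal mask so each token attends only to earlier positions:

```python
import numpy as np

def attention_weights(Q, K, causal=True):
    """Scaled dot-product attention weights for one head.

    Q, K: (seq_len, head_dim) arrays.
    Returns a (seq_len, seq_len) matrix whose rows sum to 1 --
    the object visualized per head and per layer.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # raw Q.K^T similarities
    if causal:
        # mask out future positions (decoder-only models)
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))
K = rng.standard_normal((5, 8))
A = attention_weights(Q, K)
print(A.shape)         # (5, 5)
print(A.sum(axis=-1))  # each row sums to 1
```

Early-vs-late layer comparisons in the notebook amount to computing such matrices at different layer indices and contrasting their structure (e.g. diagonal/local patterns early vs broader, more semantic patterns late).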
Requirements

  • Kaggle notebook with GPU T4 × 2
  • llcuda v2.2.0
  • Graphistry + RAPIDS

Quick Start

# Install llcuda
!pip install -q --no-cache-dir git+https://github.com/llcuda/llcuda.git@v2.2.0

Open the notebook in Kaggle to run the full workflow.
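The quantization comparison rests on the fact that Q4 formats store weights as 4-bit integers with a per-block scale, which perturbs the Q/K projections and hence the attention maps. A rough illustrative sketch of that round-trip error (symmetric 4-bit per-block quantization; llama.cpp's actual Q4_K_M format uses super-blocks with per-block scales and minimums, so this is a simplification):

```python
import numpy as np

def quantize_4bit(x, block=32):
    """Symmetric 4-bit quantize/dequantize round trip per block of
    `block` values -- a simplified stand-in for llama.cpp Q4 blocks."""
    flat = x.reshape(-1, block)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # map max to +/-7
    q = np.clip(np.round(flat / scale), -8, 7)             # 4-bit integers
    return (q * scale).reshape(x.shape)                    # dequantize

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 64))     # pretend Q/K projection weights
Wq = quantize_4bit(W)
err = np.abs(W - Wq).mean()          # mean absolute round-trip error
print(err)
```

Feeding the quantized and full-precision weights through the same attention computation and diffing the resulting matrices is the kind of comparison the notebook performs on real GGUF checkpoints.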