13: GGUF Token Embedding Visualizer

Platform: Kaggle (2× Tesla T4)

Overview

This notebook explores token embeddings: it extracts vectors from GGUF models, reduces their dimensionality with GPU‑accelerated UMAP, and plots the result with Plotly/Graphistry.
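The extraction step can be sketched as follows. This is a minimal illustration, not the notebook's own code: it assumes a llama.cpp server is already running with embeddings enabled and exposes the OpenAI-compatible `/v1/embeddings` route. The host, port, and the helper names `build_request`, `parse_embeddings`, and `fetch_embeddings` are hypothetical. How llcuda starts or wraps the server is not shown here.

```python
import json
import urllib.request

# Assumption: a llama.cpp server is already running locally with embeddings
# enabled. The host and port are placeholders, not values from the notebook.
BASE_URL = "http://127.0.0.1:8080"


def build_request(texts, base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible /v1/embeddings route."""
    body = json.dumps({"input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(payload):
    """Return one vector per input, ordered by each item's 'index' field."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def fetch_embeddings(texts, base_url=BASE_URL, timeout=60):
    """Send the texts to the server and return their embedding vectors."""
    request = build_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embeddings(json.load(response))
```

With a server running, `fetch_embeddings(["king", "queen"])` would return one list of floats per input string. How each vector is pooled depends on how the server was configured.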

What You’ll Learn

Extract embeddings via the llama.cpp embeddings endpoint
Reduce 3072D → 3D with RAPIDS cuML UMAP
Visualize semantic clustering across categories
Compare quantization effects on embedding geometry
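The 3072D → 3D reduction listed above might look like the sketch below. It assumes RAPIDS cuML and NumPy are installed in the Kaggle session. The helper names `umap_settings` and `reduce_to_3d` are hypothetical, and `n_neighbors=15`, `min_dist=0.1`, and the fixed seed are common starting values rather than the notebook's actual settings.

```python
def umap_settings(n_samples, n_components=3, n_neighbors=15, min_dist=0.1):
    """Return UMAP keyword arguments, clamping n_neighbors for small inputs."""
    if n_samples < 3:
        raise ValueError("need at least 3 embeddings to build a neighbor graph")
    return {
        "n_components": n_components,
        # Conservative clamp: never request more neighbors than other points exist.
        "n_neighbors": min(n_neighbors, n_samples - 1),
        "min_dist": min_dist,
        "random_state": 42,  # illustrative seed so repeated runs are comparable
    }


def reduce_to_3d(embeddings):
    """Project (n_samples, n_features) embeddings to 3D with cuML UMAP."""
    # Imported lazily so the settings helper can be used without a GPU stack.
    import numpy as np
    from cuml.manifold import UMAP

    matrix = np.asarray(embeddings, dtype=np.float32)
    reducer = UMAP(**umap_settings(matrix.shape[0]))
    return reducer.fit_transform(matrix)
```

Calling `reduce_to_3d(vectors)` on an array of shape `(n, 3072)` should give an array of shape `(n, 3)`, which can then be passed to a 3D scatter plot. For the quantization comparison, running the same settings on embeddings from each quantized model keeps the projections as comparable as UMAP allows.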

Requirements

Kaggle notebook with GPU T4 × 2
llcuda v2.2.0
RAPIDS cuML + Plotly

Quick Start

# Install llcuda
!pip install -q --no-cache-dir git+https://github.com/llcuda/llcuda.git@v2.2.0

Open the notebook in Kaggle to run the full workflow.