13: GGUF Token Embedding Visualizer

Platform: Kaggle (2× Tesla T4)

Overview

This notebook extracts token embedding vectors from GGUF models, reduces them to 3D with GPU‑accelerated UMAP, and renders the result as interactive plots with Plotly/Graphistry.
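
The extraction step might look roughly like the following. This is a minimal sketch, not the notebook's actual code: it assumes a llama.cpp server that was started with embeddings enabled and that exposes the OpenAI-compatible `/v1/embeddings` route. The host, port, and the helper names `build_request`, `parse_response`, and `fetch_embeddings` are hypothetical.

```python
import json
import urllib.request

# Assumption: a llama.cpp server is listening locally with embeddings enabled.
BASE_URL = "http://127.0.0.1:8080"  # hypothetical host and port


def build_request(texts, base_url=BASE_URL):
    """Build a POST request for the OpenAI-compatible embeddings route."""
    body = json.dumps({"input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_response(payload):
    """Return embedding vectors ordered by the 'index' field of each item."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def fetch_embeddings(texts, base_url=BASE_URL, timeout=60):
    """Send the texts to the server and return one vector per text."""
    request = build_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_response(json.load(resp))
```

With a server running, `fetch_embeddings(["cat", "dog", "car"])` would return three vectors whose length equals the model's embedding dimension. Sorting by `index` keeps the vectors aligned with the input order even if the server returns items out of order.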

What You’ll Learn

  • Extract embeddings via the llama.cpp embeddings endpoint
  • Reduce 3072D → 3D with RAPIDS cuML UMAP
  • Visualize semantic clustering across categories
  • Compare quantization effects on embedding geometry
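
The dimensionality-reduction step in the list above can be sketched as follows. This is an illustration under assumptions rather than the notebook's own code: it uses `cuml.manifold.UMAP` from RAPIDS cuML, the `n_neighbors` and `min_dist` values are illustrative defaults, and the helper names `umap_params` and `reduce_to_3d` are hypothetical. Clamping `n_neighbors` is a precaution for small token sets, where the default of 15 can exceed the number of available points.

```python
def umap_params(n_samples, n_components=3, n_neighbors=15, min_dist=0.1):
    """Pick UMAP settings, keeping n_neighbors below the sample count."""
    if n_samples < 3:
        raise ValueError("need at least 3 vectors to build a neighbor graph")
    return {
        "n_components": n_components,
        "n_neighbors": min(n_neighbors, n_samples - 1),
        "min_dist": min_dist,
    }


def reduce_to_3d(vectors):
    """Project (n_samples, n_dims) vectors to 3D with cuML UMAP on the GPU."""
    # Imported lazily so umap_params stays usable on machines without RAPIDS.
    import numpy as np
    from cuml.manifold import UMAP

    data = np.asarray(vectors, dtype=np.float32)
    reducer = UMAP(**umap_params(len(data)))
    return reducer.fit_transform(data)  # one 3D point per input vector
```

Calling `reduce_to_3d` on the extracted vectors (for example, an array of shape `(n_tokens, 3072)`) yields one 3D coordinate per token, which can then be passed to a 3D scatter plot and colored by category.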

Requirements

  • Kaggle notebook with GPU T4 × 2
  • llcuda v2.2.0
  • RAPIDS cuML + Plotly

Quick Start

# Install llcuda
!pip install -q --no-cache-dir git+https://github.com/llcuda/llcuda.git@v2.2.0

Open the notebook in Kaggle to run the full workflow.