Google Colab Notebooks¶

A complete collection of ready-to-run Jupyter notebooks for llcuda v2.1.0 on Google Colab with a Tesla T4 GPU.

Overview¶

llcuda includes 8 comprehensive Google Colab notebooks covering installation, inference, fine-tuning workflows, and binary building. All notebooks are optimized for Tesla T4 GPUs and include detailed explanations, code examples, and performance metrics.

Available Notebooks¶

1. Gemma 3-1B Tutorial (Recommended)¶

File: llcuda_v2_1_0_gemma3_1b_unsloth_colab.ipynb

Complete guide to using llcuda v2.1.0 with Unsloth GGUF models on a Tesla T4 GPU.

What it covers:
✅ Install llcuda v2.1.0 from GitHub
✅ Auto-download CUDA binaries from GitHub Releases
✅ Load Gemma 3-1B-IT GGUF from Unsloth
✅ Fast inference with FlashAttention (134 tok/s verified)
✅ Batch processing and performance metrics
✅ Advanced generation parameters
✅ Unsloth fine-tuning → llcuda deployment workflow

Time required: ~10 minutes

Open in Colab:

View Tutorial

2. Gemma 3-1B Executed Example¶

File: llcuda_v2_1_0_gemma3_1b_unsloth_colab_executed.ipynb

Saved execution output from a Tesla T4 GPU showing real performance results.

What it shows:
✅ Complete output from all cells
✅ Verified 134 tok/s performance on Gemma 3-1B Q4_K_M
✅ Real GPU metrics and timings
✅ Proof of working binary download and model loading
✅ Batch inference results (130-142 tok/s range)

Why it's useful:
- See exactly what to expect on Tesla T4
- Verify performance before running
- Understand output format
- Debugging reference

Open in Colab:

View Executed Results

3. Build llcuda Binaries on T4¶

File: build_llcuda_v2_t4_colab.ipynb

Build CUDA 12 binaries from source on a Tesla T4 GPU.

What it covers:
✅ Clone and build llama.cpp with CUDA 12
✅ Enable FlashAttention and Tensor Core optimization
✅ Compile with SM 7.5 targeting (Tesla T4)
✅ Create binary packages for release
✅ Download complete package (~350-400 MB)

Time required: ~15-20 minutes

When to use:
- Building from source
- Creating custom binary packages
- Contributing to llcuda development
- Understanding the build process
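As a rough illustration of what "SM 7.5 targeting" means at configure time, the sketch below assembles a CMake command line for a llama.cpp checkout. It assumes a recent llama.cpp that uses the `GGML_CUDA` option (older checkouts used different option names) together with the standard `CMAKE_CUDA_ARCHITECTURES` variable. The helper name and the directory paths are hypothetical, and the notebook's actual flags may differ.

```python
import shlex


def cmake_configure_args(source_dir, build_dir, cuda_arch="75"):
    """Build a CMake configure command for a CUDA-enabled llama.cpp build.

    Assumption: the checkout understands -DGGML_CUDA=ON. cuda_arch="75"
    corresponds to compute capability 7.5, which is what the Tesla T4 reports.
    """
    return [
        "cmake",
        "-S", source_dir,
        "-B", build_dir,
        "-DGGML_CUDA=ON",
        f"-DCMAKE_CUDA_ARCHITECTURES={cuda_arch}",
        "-DCMAKE_BUILD_TYPE=Release",
    ]


# Hypothetical paths; print the command instead of running it.
print(shlex.join(cmake_configure_args("llama.cpp", "llama.cpp/build")))
```

Restricting the architecture list to a single target keeps compile time and binary size down, at the cost of binaries that only run on GPUs with that compute capability.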

Open in Colab:

View Build Guide

4. Unsloth + llcuda Complete Build¶

File: llcuda_unsloth_t4_complete_build.ipynb

Complete build workflow combining llama.cpp and llcuda for Tesla T4.

What it covers:
✅ Build llama.cpp with FlashAttention
✅ Build llcuda Python package
✅ Create unified tar file with everything
✅ One-package distribution

Output: llcuda-complete-cuda12-t4.tar.gz (~350-400 MB)
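For readers unfamiliar with how a single `.tar.gz` package like this can be produced, here is a minimal sketch using Python's standard `tarfile` module. The function names and the directory layout are illustrative assumptions; the notebook may well create its archive differently (for example with the `tar` command).

```python
import tarfile
from pathlib import Path


def make_package(source_dir, archive_path):
    """Pack source_dir into a gzip-compressed tar, rooted at the directory name."""
    source = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps absolute path prefixes out of the archive
        tar.add(source, arcname=source.name)
    return Path(archive_path)


def list_package(archive_path):
    """Return the sorted member names of a .tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


# Hypothetical usage (the staging directory name is an assumption):
# make_package("llcuda-complete", "llcuda-complete-cuda12-t4.tar.gz")
```

Write the archive outside the directory being packed, otherwise the partially written archive can end up inside itself.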

Open in Colab:

5. Unsloth Tutorial¶

File: llcuda_unsloth_tutorial.ipynb

Usage guide demonstrating llcuda with Unsloth GGUF models.

What it covers:
✅ Install llcuda (auto-downloads binaries)
✅ Load Unsloth GGUF models
✅ Fast inference demonstrations
✅ Batch processing examples
✅ Unsloth → llcuda workflow

Time required: ~5-10 minutes

Open in Colab:

6. llcuda Quickstart Tutorial¶

File: llcuda_quickstart_tutorial.ipynb

Quick introduction to llcuda basics.

What it covers:
✅ Basic installation
✅ Simple inference examples
✅ Model loading methods
✅ Performance metrics

Time required: ~5 minutes

Open in Colab:

7. Advanced Example: p3_llcuda¶

File: p3_llcuda.ipynb

Advanced usage patterns and optimization techniques.

Open in Colab:

8. Advanced Example: p3_1_llcuda¶

File: p3_1_llcuda.ipynb

Extended advanced examples with additional features.

Open in Colab:

How to Use These Notebooks¶

Running on Google Colab¶

1. Click the "Open in Colab" button for any notebook above.
2. Set the runtime to a T4 GPU:
   - Runtime → Change runtime type
   - Hardware accelerator: GPU
   - GPU type: T4 (if available)
   - Click Save
3. Run all cells:
   - Runtime → Run all
   - Or press Shift+Enter on each cell
4. Wait for completion (time varies by notebook).

Saving Your Work¶

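# Save results to Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Create the destination folder if it does not exist, then copy outputs
!mkdir -p /content/drive/MyDrive/llcuda_results
!cp output.txt /content/drive/MyDrive/llcuda_results/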

Downloading Generated Files¶

from google.colab import files

# Download any generated file
files.download('model_output.txt')
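`files.download` takes one file path per call, so when a notebook produces several output files it can be convenient to bundle them first. The sketch below zips a directory with the standard library. The helper name and the `outputs` directory are hypothetical.

```python
import shutil
from pathlib import Path


def bundle_outputs(output_dir, archive_stem):
    """Zip everything under output_dir and return the path of the created .zip.

    archive_stem is the target path without the ".zip" suffix; keep it outside
    output_dir so the archive does not try to include itself.
    """
    created = shutil.make_archive(str(archive_stem), "zip", root_dir=str(output_dir))
    return Path(created)


# Hypothetical usage inside Colab:
# from google.colab import files
# files.download(str(bundle_outputs("outputs", "llcuda_outputs")))
```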

Notebook Categories¶

For Beginners¶

✅ Gemma 3-1B Tutorial - Start here!
✅ Quickstart Tutorial - 5-minute introduction
✅ Unsloth Tutorial - Unsloth integration

For Advanced Users¶

✅ Build Binaries - Compile from source
✅ Complete Build - Full build workflow
✅ p3/p3_1 Examples - Advanced patterns

For Verification¶

✅ Gemma 3-1B Executed - See real T4 results

Common Issues¶

Issue: Runtime Disconnected¶

Solution:
- Keep the Colab tab active
- Use Colab Pro for longer runtimes
- Save checkpoints regularly
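One way to "save checkpoints regularly" is to write results to disk after every unit of work, so a reconnected session can skip what already finished. The following is a minimal sketch of that idea and is not part of llcuda. `run_with_checkpoint` and the `process` callback are hypothetical names, and results must be JSON-serializable.

```python
import json
from pathlib import Path


def run_with_checkpoint(items, process, checkpoint_path):
    """Process items in order, persisting results after each one.

    On a later call with the same checkpoint file, items whose results are
    already stored are skipped instead of being recomputed.
    """
    path = Path(checkpoint_path)
    results = json.loads(path.read_text()) if path.exists() else {}
    for item in items:
        key = str(item)
        if key in results:
            continue  # finished before the disconnect
        results[key] = process(item)
        path.write_text(json.dumps(results))
    return results
```

Pointing `checkpoint_path` at a mounted Google Drive folder lets the file survive a runtime reset, since the local Colab disk is wiped when the runtime is recycled.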

Issue: GPU Not Available¶

Solution:

# Check GPU status
!nvidia-smi

# If no GPU, change runtime:
# Runtime → Change runtime type → GPU (T4)
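The same check can be done programmatically, which is handy as the first cell of a notebook. This sketch calls `nvidia-smi` with its CSV query options and looks for "T4" in the reported GPU name. The helper names are hypothetical, and the code falls back to an empty list when no NVIDIA driver is present.

```python
import shutil
import subprocess


def parse_gpu_rows(csv_text):
    """Parse 'name, memory' CSV rows from nvidia-smi into (name, memory) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and parts[0]:
            rows.append((parts[0], parts[1]))
    return rows


def list_gpus():
    """Return visible GPUs, or an empty list when nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    return parse_gpu_rows(result.stdout) if result.returncode == 0 else []


gpus = list_gpus()
if any("T4" in name for name, _ in gpus):
    print("Tesla T4 detected:", gpus)
else:
    print("No T4 found; use Runtime → Change runtime type. Found:", gpus)
```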

Issue: Out of Memory¶

Solution:
- Use smaller models (Gemma 3-1B instead of an 8B model)
- Clear the runtime: Runtime → Restart runtime
- Use lower-bit quantization (Q4_K_M recommended)
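Before loading a model, a back-of-the-envelope check can indicate whether it is likely to fit. The sketch below compares the model file size, plus an allowance for the KV cache and scratch buffers, against available GPU memory. The 25% overhead figure is an illustrative assumption and not a measured value; real usage depends on context length and batch size.

```python
GIB = 1024 ** 3


def fits_in_vram(model_file_bytes, vram_bytes, overhead_fraction=0.25):
    """Rough heuristic: weights plus a fractional allowance must fit in VRAM.

    overhead_fraction is an assumed margin for KV cache and buffers; tune it
    for your context length rather than treating it as a guarantee.
    """
    if model_file_bytes < 0 or vram_bytes <= 0:
        raise ValueError("sizes must be positive")
    return model_file_bytes * (1 + overhead_fraction) <= vram_bytes


# Illustrative sizes only: a 1 GiB file versus a 14 GiB file on a 15 GiB budget.
print(fits_in_vram(1 * GIB, 15 * GIB))
print(fits_in_vram(14 * GIB, 15 * GIB))
```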

Performance Expectations¶

Notebook	Download Size	Runtime	Expected Speed
Gemma 3-1B Tutorial	~916 MB	~10 min	134 tok/s
Build Binaries	~2 GB	~20 min	Build only
Quickstart	~650 MB	~5 min	Variable
Unsloth Tutorial	~650 MB	~10 min	~45 tok/s

Next Steps¶

After running the notebooks:

API Reference - Detailed API documentation
Performance Optimization - Get better performance
Unsloth Integration - Complete workflow
FAQ - Common questions

All notebooks are maintained at: github.com/waqasm86/llcuda/tree/main/notebooks