Frequently Asked Questions

Common questions about llcuda v2.2.0 (Kaggle dual T4).

General

What is llcuda?

llcuda is a CUDA 12-first inference backend designed for Kaggle dual Tesla T4. It ships a lightweight Python package and auto-downloads the CUDA binaries on first import.

Why Kaggle dual T4 only?

llcuda v2.2.0 is optimized for SM 7.5 (Tesla T4) and the split‑GPU workflow (GPU 0: LLM, GPU 1: Graphistry/RAPIDS). Other environments are not supported for this release.
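
The split can be enforced per process before any CUDA-backed library initializes. A minimal sketch — this relies on the standard CUDA `CUDA_VISIBLE_DEVICES` mechanism, not on any llcuda-specific API:

```python
import os

# Restrict this process to GPU 0 before importing any CUDA-backed library.
# The library then sees a single device, re-indexed as device 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# A second process (e.g. the Graphistry/RAPIDS side of the workflow)
# would set "1" instead, keeping the two workloads on separate T4s.
```

Setting the variable after CUDA has initialized has no effect, which is why it belongs at the very top of the notebook or script.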

How does llcuda compare to other solutions?

llcuda focuses on fast setup and optimized inference for 1B–5B GGUF models on Kaggle dual T4, with built‑in visualization workflows.

Installation

How do I install llcuda?

pip install git+https://github.com/llcuda/llcuda.git@v2.2.0

Can I install from PyPI?

No. v2.2.0 is GitHub‑only (with optional HuggingFace mirror).

Why do binaries download on first import?

To keep the Python package itself small. The ~961 MB CUDA bundle is downloaded once and cached in ~/.cache/llcuda; subsequent imports reuse the cached copy.
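
You can inspect the cache from Python to confirm the bundle is present. A small sketch using only the cache path stated above:

```python
from pathlib import Path

# Cache location used by llcuda for the CUDA bundle (per the FAQ above).
cache_dir = Path.home() / ".cache" / "llcuda"

if cache_dir.exists():
    # Total size of cached files, in MB.
    size_mb = sum(
        f.stat().st_size for f in cache_dir.rglob("*") if f.is_file()
    ) / 1e6
    print(f"Cached bundle: {size_mb:.0f} MB in {cache_dir}")
else:
    print("No cache yet; binaries will download on first import.")
```

Deleting this directory forces a fresh download on the next import.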

Compatibility

Which GPUs are supported?

Tesla T4 (SM 7.5), Kaggle dual‑GPU only.

What Python versions are supported?

Python 3.11+.
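
If you want to fail fast on an unsupported interpreter, a small guard is enough; `meets_requirement` is a hypothetical helper, not part of llcuda:

```python
import sys

def meets_requirement(min_version=(3, 11)):
    """True if the running interpreter satisfies llcuda's Python floor."""
    return sys.version_info >= min_version

# Example: raise a clear error before attempting `import llcuda`.
# if not meets_requirement():
#     raise RuntimeError("llcuda requires Python 3.11+")
```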

What CUDA versions are supported?

CUDA 12.x (Kaggle runtime).

Models

Which models can I use?

Small GGUF models in the 1B–5B parameter range; Q4_K_M quantization is recommended.

How do I load a model from HuggingFace?

# `engine` is assumed to be an already-initialized llcuda inference engine.
engine.load_model(
    "unsloth/gemma-3-1b-it-GGUF:gemma-3-1b-it-Q4_K_M.gguf"
)
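
The model string above combines a HuggingFace repo id and a filename inside that repo, separated by a colon. Splitting it yourself can be done with a small helper — `split_model_spec` is a hypothetical illustration of the format, not an llcuda function:

```python
def split_model_spec(spec: str) -> tuple[str, str]:
    """Split a 'repo_id:filename' string into its two parts."""
    repo_id, _, filename = spec.partition(":")
    if not filename:
        raise ValueError(f"expected 'repo:file.gguf', got {spec!r}")
    return repo_id, filename

repo, fname = split_model_spec(
    "unsloth/gemma-3-1b-it-GGUF:gemma-3-1b-it-Q4_K_M.gguf"
)
# repo  -> "unsloth/gemma-3-1b-it-GGUF"
# fname -> "gemma-3-1b-it-Q4_K_M.gguf"
```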