Frequently Asked Questions
Common questions about llcuda v2.2.0 (Kaggle dual T4).
General
What is llcuda?
llcuda is a CUDA 12-first inference backend designed for Kaggle dual Tesla T4. It ships a lightweight Python package and auto-downloads the CUDA binaries on first import.
Why Kaggle dual T4 only?
llcuda v2.2.0 is optimized for SM 7.5 (Tesla T4) and the split‑GPU workflow (GPU 0: LLM, GPU 1: Graphistry/RAPIDS). Other environments are not supported for this release.
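The split‑GPU workflow above can be sketched with the standard `CUDA_VISIBLE_DEVICES` mechanism. This is a general CUDA pattern, not an llcuda-specific API; treat it as a minimal sketch:

```python
import os

# Pin this process to GPU 0 for LLM inference.
# CUDA_VISIBLE_DEVICES must be set before any CUDA library initializes.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # GPU 0: LLM

# A separate Graphistry/RAPIDS process would instead set:
# os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # GPU 1: visualization

print(os.environ["CUDA_VISIBLE_DEVICES"])  # → 0
```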
How does llcuda compare to other solutions?
llcuda focuses on fast setup + optimized inference for 1B–5B GGUF models on Kaggle dual T4, with built‑in visualization workflows.
Installation
How do I install llcuda?
Can I install from PyPI?
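Since v2.2.0 is distributed through GitHub only, installation is typically a pip install from the repository. The repository URL below is a placeholder, not confirmed by this FAQ; use the URL from the project's README:

```shell
# Install directly from GitHub (repository path is a placeholder —
# substitute the URL given in the project's README).
pip install git+https://github.com/<owner>/llcuda.git
```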
No. v2.2.0 is GitHub‑only (with optional HuggingFace mirror).
Why do binaries download on first import?
To keep the Python package small. The ~961 MB CUDA bundle is downloaded once and cached in ~/.cache/llcuda.
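A quick way to check whether the bundle is already cached. Only the cache path comes from this FAQ; the directory's internal layout is an assumption:

```python
from pathlib import Path

# Cache location stated in the FAQ; contents/layout are an assumption.
cache_dir = Path.home() / ".cache" / "llcuda"

if cache_dir.is_dir():
    # Sum file sizes to see roughly how much of the ~961 MB bundle is present.
    size_mb = sum(f.stat().st_size for f in cache_dir.rglob("*") if f.is_file()) / 1e6
    print(f"cached: {size_mb:.0f} MB in {cache_dir}")
else:
    print("not cached yet — binaries will download on first import")
```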
Compatibility
Which GPUs are supported?
Tesla T4 (SM 7.5), Kaggle dual‑GPU only.
What Python versions are supported?
Python 3.11+.
What CUDA versions are supported?
CUDA 12.x (Kaggle runtime).
Models
Which models can I use?
Small GGUF models (1B–5B), Q4_K_M recommended.
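Fetching a recommended Q4_K_M quantization might look like the following; the repository and file names are placeholders, not models endorsed by this FAQ:

```shell
# Download a small Q4_K_M GGUF from Hugging Face (repo/file names are
# placeholders — substitute a real 1B–5B GGUF repository).
huggingface-cli download <repo-id> <model>-Q4_K_M.gguf --local-dir models/
```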