# First Steps

Your first steps with llcuda v2.2.0 on Kaggle.
## 1. Load a Model

```python
from llcuda.server import ServerManager, ServerConfig

# Basic configuration
config = ServerConfig(
    model_path="/path/to/model.gguf",
    n_gpu_layers=99,  # Offload all layers to the GPU
)

server = ServerManager()
server.start_with_config(config)
```
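llcuda drives a llama.cpp-style server behind the scenes, so after `start_with_config` returns it can help to wait until the HTTP endpoint is actually ready before sending requests. The sketch below is an assumption-laden illustration, not part of the llcuda API: the port `8080` and the `/health` route are llama.cpp server defaults and may differ in your setup.

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url: str = "http://127.0.0.1:8080/health",
                    timeout: float = 60.0) -> bool:
    """Poll a health endpoint until it responds with 200, or give up.

    NOTE: the URL is a guess at the llama.cpp server default;
    adjust it to match your actual server configuration.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not up yet; retry shortly
        time.sleep(1.0)
    return False
```

A loop like this returns `False` cleanly if the server never comes up, which is friendlier in a notebook than a raw connection error on the first request.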
## 2. Make Your First Request

```python
from llcuda.api import LlamaCppClient

client = LlamaCppClient()
response = client.chat.completions.create(
    messages=[
        {"role": "user", "content": "What is machine learning?"}
    ],
    max_tokens=200,
)
print(response.choices[0].message.content)
```
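The `messages` argument follows the familiar OpenAI-style chat schema, so a multi-turn conversation is simply a growing list of role/content dicts that you pass back on each request. A minimal sketch of maintaining that history (pure Python, no llcuda calls; the `append_turn` helper is illustrative, not part of the library):

```python
def append_turn(history: list, role: str, content: str) -> list:
    """Append one chat turn; roles follow the OpenAI-style schema."""
    if role not in ("system", "user", "assistant"):
        raise ValueError(f"unknown role: {role}")
    history.append({"role": role, "content": content})
    return history

# Build up a conversation to pass as `messages=` on the next request
history = []
append_turn(history, "user", "What is machine learning?")
append_turn(history, "assistant", "Machine learning is ...")  # model's reply
append_turn(history, "user", "Give me one example.")
```

After each response, append the assistant's `message.content` to the history before sending the follow-up question, so the model sees the full conversation.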
## 3. Explore Notebooks

Try the tutorial notebooks:

- 01 - Quick Start
- 02 - Server Setup
- 03 - Multi-GPU