# LlamaCppClient API

OpenAI-compatible and native llama.cpp client for `llama-server`.
## Overview

`LlamaCppClient` provides the OpenAI-compatible chat completions interface plus llama-server's native endpoints (`complete`, `embeddings`, `tokenize`).
## Basic Usage

```python
from llcuda.api import LlamaCppClient

# Connect to a running llama-server instance
client = LlamaCppClient(base_url="http://localhost:8080")

# Send an OpenAI-style chat completion request
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```
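This assumes a `llama-server` instance is already listening on port 8080, for example one started with `llama-server -m model.gguf --port 8080`.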
## Class Reference

### LlamaCppClient

```python
class LlamaCppClient:
    def __init__(self, base_url: str = "http://localhost:8080"):
        """Initialize the client.

        Args:
            base_url: Base URL of the running llama-server instance.
        """
```

### Methods
#### chat.completions.create()

Send an OpenAI-compatible chat completion request:

```python
response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=100,   # cap on the number of generated tokens
    temperature=0.7,  # sampling randomness
    top_p=0.9,        # nucleus sampling cutoff
)
```
#### complete()
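A minimal sketch of the native completion call, assuming `complete()` mirrors llama-server's `/completion` route (prompt in, generated text out). The parameter names and response shape below are assumptions, not confirmed signatures:

```python
# Sketch only: the parameter names (prompt, n_predict) and the response
# shape ({"content": ...}) are borrowed from llama-server's native
# /completion endpoint and assumed to pass through this client.
result = client.complete(
    prompt="The capital of France is",
    n_predict=32,     # llama-server's native name for the token limit
    temperature=0.7,
)
print(result["content"])
```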
#### embeddings.create()
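A sketch assuming `embeddings.create()` follows the same OpenAI-style shape as `chat.completions.create()`; the `input` parameter and response layout are assumptions. Note that llama-server only serves embeddings when started with embeddings enabled (e.g. the `--embeddings` flag):

```python
# Assumed OpenAI-style call and response shape; requires llama-server
# to be running with embeddings enabled.
response = client.embeddings.create(input="Hello, world!")
vector = response.data[0].embedding  # assumed: a list of floats
print(len(vector))                   # embedding dimensionality
```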
#### tokenize() / detokenize()
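A sketch assuming these are thin wrappers over llama-server's native `/tokenize` and `/detokenize` routes, exchanging a string for a list of token ids and back; the exact signatures are assumptions:

```python
# Assumed signatures: tokenize(str) -> list[int], detokenize(list[int]) -> str
tokens = client.tokenize("Hello, world!")
print(tokens)  # token ids; values depend on the loaded model
text = client.detokenize(tokens)
print(text)    # round-trips back to (approximately) the original string
```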
## Examples

See API Examples for more usage patterns, or InferenceEngine for a simpler high-level wrapper.