OpenAI API Client¶
Use the OpenAI SDK with llama-server for drop-in compatibility.
Level: Advanced | Time: 15 minutes | VRAM Required: 5-10 GB
Setup¶
from openai import OpenAI

# Point the client at the local llama-server endpoint.
# llama-server does not check the API key unless started with --api-key,
# but the SDK requires a non-empty value.
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",
)
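The client above assumes llama-server is already listening on port 8080. A typical launch looks like this; the model path and flag values are placeholders for your setup (check `llama-server --help` for the options in your build):

```shell
# Serve a local GGUF model with an OpenAI-compatible API on port 8080.
# -c sets the context size; adjust to your model and available VRAM.
llama-server -m ./models/your-model.gguf --port 8080 -c 4096
```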
Chat Completions¶
response = client.chat.completions.create(
    model="local-model",  # llama-server serves whichever model it loaded; this name is not used for routing
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=100,
)
print(response.choices[0].message.content)
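llama-server keeps no conversation state between requests, so multi-turn chat means resending the full message history on every call. A minimal sketch of that bookkeeping (the `extend_history` helper and the sample replies are illustrative, not part of the SDK):

```python
def extend_history(history, user_text, assistant_text):
    """Return a new message list with one more user/assistant exchange appended."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]

# Each call to client.chat.completions.create would receive the full list,
# so the model sees the whole conversation every time:
history = []
history = extend_history(history, "Hello!", "Hi! How can I help?")
history = extend_history(history, "What's 2+2?", "4.")
print(len(history))  # → 4
```

In a real loop you would append the user message, call the API, then append `response.choices[0].message.content` as the assistant turn before the next request.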
Streaming¶
stream = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True,
)
for chunk in stream:
    # The final chunk's delta carries no content, so guard against None.
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # newline once the stream finishes
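When you need the complete reply as well as live output, accumulate the deltas as they arrive and join them when the stream ends. A small helper makes the None check explicit (the helper name and sample delta list are illustrative):

```python
def join_deltas(deltas):
    """Concatenate streamed delta strings, skipping None sentinels."""
    return "".join(d for d in deltas if d is not None)

# In the streaming loop you would append each chunk.choices[0].delta.content
# to a list, then join it once iteration completes:
parts = ["Once", " upon", " a", " time", None]
print(join_deltas(parts))  # → Once upon a time
```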