
OpenAI API Client

Use the OpenAI Python SDK with llama-server for drop-in compatibility: point the client at the local server and the standard chat-completions calls work unchanged.

Level: Advanced | Time: 15 minutes | VRAM Required: 5-10 GB


Setup

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # llama-server's OpenAI-compatible endpoint
    api_key="not-needed"  # placeholder; llama-server does not check the key unless started with --api-key
)
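Before creating the client it can help to confirm the server is actually reachable. A minimal sketch using only the standard library, assuming llama-server is running locally and exposes its `/health` endpoint (the helper names here are illustrative):

```python
import urllib.request
import urllib.error

def health_url(base_url: str) -> str:
    """Derive llama-server's /health endpoint from the OpenAI-style base URL."""
    return base_url.rsplit("/v1", 1)[0] + "/health"

def server_is_up(base_url: str = "http://localhost:8080/v1", timeout: float = 2.0) -> bool:
    """Return True if llama-server answers its health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling `server_is_up()` before issuing requests turns a confusing connection error inside the SDK into a clear "server not running" signal.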

Chat Completions

response = client.chat.completions.create(
    model="local-model",  # llama-server serves whatever model it was launched with; the name is not checked
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)
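Under the hood, "drop-in compatibility" just means llama-server accepts the same JSON that the SDK POSTs to `/v1/chat/completions`. A stdlib-only sketch of the equivalent request body for the call above:

```python
import json

# The JSON body the SDK serializes for chat.completions.create(...)
payload = {
    "model": "local-model",
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 100,
}

body = json.dumps(payload)
# POSTing `body` to http://localhost:8080/v1/chat/completions with
# Content-Type: application/json produces the same completion.
```

This is why any OpenAI-compatible client, not just the official SDK, can talk to llama-server.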

Streaming

stream = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    # delta.content is None on role-only and stop chunks, so guard before printing
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
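Streamed chunks carry incremental deltas: the first chunk typically holds only the role and the last only a finish reason, so their `content` is `None`. Reassembling the full reply is just filtering and joining, as this small sketch shows:

```python
def join_deltas(deltas):
    """Concatenate streamed content deltas, skipping the None entries
    emitted for role-only and stop chunks."""
    return "".join(d for d in deltas if d)

# Example with the kinds of values delta.content takes during a stream:
story = join_deltas([None, "Once", " upon", " a", " time.", None])
```

In practice you would append `chunk.choices[0].delta.content` to a list inside the loop and call a helper like this afterwards to get the complete text.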
