Split-GPU Setup¶
Run the LLM on GPU 0 and RAPIDS + Graphistry on GPU 1, so inference and graph visualization never contend for the same device's memory.
Architecture¶
GPU 0: llama-server (15 GB)
    ↓ extract knowledge graphs
GPU 1: RAPIDS + Graphistry (15 GB)
    → visualize millions of nodes/edges
Setup GPU 0 (LLM)¶
Set CUDA_VISIBLE_DEVICES before importing any CUDA-backed library; the mask is read once, at CUDA initialization. For the same reason, run this setup and the GPU 1 setup in separate processes.

import os

# Expose only GPU 0 to this process; llama-server sees it as device 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from llcuda.server import ServerManager, ServerConfig

# n_gpu_layers=99 is the usual idiom for "offload every layer to the GPU".
config = ServerConfig(model_path="model.gguf", n_gpu_layers=99)
server = ServerManager()
server.start_with_config(config)
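With the server up, extraction is a plain HTTP call. A minimal sketch, assuming llama-server exposes its OpenAI-compatible chat endpoint at the default http://localhost:8080; the prompt format and sample text are illustrative:

import requests

# Assumption: llama-server's OpenAI-compatible endpoint on its default port.
URL = "http://localhost:8080/v1/chat/completions"

prompt = (
    "Extract knowledge-graph triples from the text below. "
    "Output one 'subject|relation|object' per line.\n\n"
    "Ada Lovelace wrote the first program for Babbage's Analytical Engine."
)
resp = requests.post(
    URL,
    json={"messages": [{"role": "user", "content": prompt}], "temperature": 0.0},
    timeout=120,
)
reply = resp.json()["choices"][0]["message"]["content"]

# Keep only well-formed 'subject|relation|object' lines.
triples = [line.split("|") for line in reply.splitlines() if line.count("|") == 2]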
Setup GPU 1 (Graphistry)¶
import os

# Expose only GPU 1; cuDF and Graphistry allocations land there.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import graphistry

# Authenticate against Graphistry Hub; also pass your credentials,
# e.g. username/password or a personal key.
graphistry.register(api=3, protocol="https", server="hub.graphistry.com")
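Once registered, plotting a cuDF edge list is a single chain of calls; PyGraphistry accepts GPU DataFrames directly. A minimal sketch with illustrative data:

import cudf

# The edge list lives in GPU 1's memory (the only device visible here).
edges = cudf.DataFrame({
    "src": ["Ada Lovelace", "Analytical Engine"],
    "dst": ["Analytical Engine", "Charles Babbage"],
    "relation": ["programmed", "designed_by"],
})

# Bind source/destination columns and upload for interactive rendering.
graphistry.edges(edges, "src", "dst").plot()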
Workflow¶
- Run LLM inference on GPU 0 via llama-server
- Parse entities/relations out of the model's output
- Build the edge list on GPU 1 with cuDF
- Visualize with Graphistry (two-process sketch below)
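Because each CUDA_VISIBLE_DEVICES mask applies per process, the two halves hand data off through a file (or any queue). A hedged two-process sketch; the file name and the extract_triples helper are hypothetical, not llcuda APIs, and the endpoint assumption matches the sketch above:

# gpu0_extract.py -- run with CUDA_VISIBLE_DEVICES=0 and the server above running
import json
import requests

def extract_triples(text):  # hypothetical helper, not an llcuda API
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # assumed default endpoint
        json={"messages": [{"role": "user", "content":
            "Output one 'subject|relation|object' triple per line:\n\n" + text}]},
        timeout=120,
    )
    reply = resp.json()["choices"][0]["message"]["content"]
    return [line.split("|") for line in reply.splitlines() if line.count("|") == 2]

with open("triples.json", "w") as f:
    json.dump(extract_triples("Ada Lovelace programmed the Analytical Engine."), f)

# gpu1_viz.py -- run with CUDA_VISIBLE_DEVICES=1, graphistry registered as above
import json
import cudf
import graphistry

with open("triples.json") as f:
    rows = json.load(f)
src, rel, dst = zip(*rows)
edges = cudf.DataFrame({"src": list(src), "relation": list(rel), "dst": list(dst)})
graphistry.edges(edges, "src", "dst").plot()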