Optimization Guide
Optimize llcuda performance on Kaggle.
1. Enable FlashAttention
2. Optimize Batch Size
config = ServerConfig(
    batch_size=2048,  # Larger values favor throughput
    ubatch_size=512,  # Smaller values favor latency
)
3. Tune Context Size
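Context size matters mostly because the KV cache grows linearly with it. The sketch below is a rough back-of-the-envelope estimate, not part of llcuda: the function name `kv_cache_bytes` is hypothetical, and the model shape in the example (32 layers, 8 KV heads, head dimension 128) is an assumed Llama-style configuration with an unquantized f16 cache.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_ctx: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV-cache size in bytes for a standard transformer.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes an f16 cache; a quantized cache would be smaller.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Assumed example shape: 32 layers, 8 KV heads, head_dim 128, 8192-token context.
size = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, n_ctx=8192)
print(f"{size / 1024**3:.2f} GiB")  # prints "1.00 GiB"
```

Halving the context halves this figure, so if a model nearly fits in GPU memory, reducing the context size is often the cheapest adjustment to try first.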
4. Use K-Quants
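- Q4_K_M: Best balance of file size and output quality
- Q5_K_M: Higher quality at a larger file size
- IQ3_XS: An I-quant rather than a K-quant; small enough to make 70B models practical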
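To compare the options above, it can help to estimate the weight footprint before downloading anything. This is a minimal sketch, not an llcuda API: `estimate_weights_gb` is a hypothetical helper, and the bits-per-weight values are approximate figures assumed for illustration. The actual GGUF file size is the authoritative number.

```python
# Approximate bits per weight for each quantization type.
# These are rough, assumed values; real files vary by model architecture.
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "IQ3_XS": 3.3,
}


def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Estimate weight size in decimal GB: parameters * bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


# Compare an assumed 8B model and an assumed 70B model across quant types.
for quant, bpw in APPROX_BITS_PER_WEIGHT.items():
    small = estimate_weights_gb(8, bpw)
    large = estimate_weights_gb(70, bpw)
    print(f"{quant}: 8B ~{small:.1f} GB, 70B ~{large:.1f} GB")
```

The estimate covers weights only. The KV cache and compute buffers need additional memory, so leave headroom when checking a model against the available VRAM.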