# Dual T4 Performance
Detailed benchmarks for the Kaggle dual Tesla T4 setup.
## Configuration
- GPUs: 2× Tesla T4 (~15 GB usable each)
- CUDA: 12.5
- Driver: 535.104.05
- FlashAttention: Enabled
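For concreteness, the sketch below shows how a model could be loaded under this configuration, assuming the llama-cpp-python bindings; the model path is a placeholder, and `flash_attn` requires a build compiled with FlashAttention support.

```python
from llama_cpp import Llama

# Hypothetical GGUF path; substitute the actual Kaggle input path.
MODEL_PATH = "/kaggle/input/models/model-Q4_K_M.gguf"

llm = Llama(
    model_path=MODEL_PATH,
    n_gpu_layers=-1,          # offload every layer to the GPUs
    tensor_split=[0.5, 0.5],  # even split across the two T4s
    flash_attn=True,          # matches the configuration above
    n_ctx=4096,               # context size is an arbitrary choice here
)
```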
## Measured Performance
### Gemma-2-2B (Q4_K_M)
- Tokens/sec: 58-62
- Latency: ~16ms/token
- VRAM: 4.2 GB total
- Strategy: `tensor-split 0.5,0.5`
### Qwen2.5-7B (Q4_K_M)
- Tokens/sec: 33-37
- Latency: ~28ms/token
- VRAM: 10.1 GB total
- Strategy: `tensor-split 0.5,0.5`
### Llama-70B (IQ3_XS)
- Tokens/sec: 10-14
- Latency: ~80ms/token
- VRAM: 26.8 GB total
- Strategy: `tensor-split 0.48,0.48` (llama.cpp normalizes split ratios, so this is effectively an even split)
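A note on methodology: figures like the ones above can be reproduced with a simple timing loop. The sketch below, using the `llm` instance from the configuration example, is one way to do it; the prompt and token budget are arbitrary choices, not the exact harness used for these numbers.

```python
import time

prompt = "Explain the difference between latency and throughput."

start = time.perf_counter()
out = llm.create_completion(prompt, max_tokens=256, temperature=0.0)
elapsed = time.perf_counter() - start

n_generated = out["usage"]["completion_tokens"]
print(f"{n_generated} tokens in {elapsed:.2f} s -> "
      f"{n_generated / elapsed:.1f} tok/s, "
      f"{1000 * elapsed / n_generated:.1f} ms/token")
```

Note that this times prompt processing together with generation, so a short prompt keeps the distortion small. As a sanity check, latency is roughly the reciprocal of throughput: ~60 tok/s corresponds to ~16-17 ms/token, matching the Gemma numbers above.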
## Tuning Tips
- Enable FlashAttention; it reduces attention memory traffic and per-token latency.
- Tune the batch size: larger batches speed up prompt processing but consume more VRAM.
- Adjust tensor-split ratios if one GPU fills up before the other.
- Monitor VRAM usage on both GPUs during generation (see the sketch after this list).
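For the last tip, VRAM can be polled from Python as well as with `nvidia-smi`; here is a minimal sketch using the pynvml bindings:

```python
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    # used/total are reported in bytes; convert to GiB for readability.
    print(f"GPU {i}: {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB used")
pynvml.nvmlShutdown()
```

Headroom matters most for the 70B configuration: 26.8 GB total works out to ~13.4 GB per card, leaving only about 1.6 GB free on each T4.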