# Tensor Split vs NCCL
An overview of two multi-GPU approaches: splitting a model across GPUs for inference (llama.cpp) versus coordinating GPUs for distributed training (PyTorch with NCCL).
## Tensor Split (llama.cpp)
- What: native CUDA distribution of model layers across GPUs
- Used by: the llama.cpp server
- Purpose: inference parallelism
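A minimal sketch of what this looks like in practice, assuming the llama-cpp-python bindings are installed; the model path and split ratios are illustrative, and the same split can be requested on the llama.cpp server command line with `--tensor-split`.

```python
from llama_cpp import Llama

# Load a GGUF model and split its tensors evenly across two GPUs.
# model_path is a hypothetical example; adjust to your local file.
llm = Llama(
    model_path="models/llama-70b.Q4_K_M.gguf",
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.5, 0.5],  # fraction of the model assigned to each GPU
)

out = llm("Explain tensor splitting in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```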
## NCCL (PyTorch)
- What: multi-GPU communication primitives (NVIDIA Collective Communications Library)
- Used by: PyTorch distributed training (e.g. DistributedDataParallel)
- Purpose: training parallelism
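A minimal DDP sketch, assuming the script is launched with torchrun (e.g. `torchrun --nproc_per_node=2 train.py`); the model and training loop are placeholders.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model, wrapped so gradients are synchronized across GPUs
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):
        x = torch.randn(32, 1024, device=local_rank)
        loss = model(x).sum()
        optimizer.zero_grad()
        loss.backward()   # gradients are all-reduced via NCCL here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```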
## Key Differences
| Feature | Tensor Split | NCCL |
|---|---|---|
| Purpose | Inference | Training |
| Backend | llama.cpp | PyTorch |
| Communication | Direct CUDA | Collectives |
| Use Case | Model too large for a single GPU | Distributed training across GPUs/nodes |
## When to Use Each
- Tensor Split: run a 70B model on dual T4 GPUs when it won't fit on one (a quick memory check is sketched below)
- NCCL: train a model with DistributedDataParallel (DDP)
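As a rough illustration of the "model too large for a single GPU" case, here is a hedged sketch of a memory check; the function name and headroom factor are illustrative and not part of either library.

```python
import torch

def fits_on_one_gpu(model_size_gb: float, headroom: float = 0.9) -> bool:
    """Rough check: do the model weights fit in a single GPU's memory?"""
    if not torch.cuda.is_available():
        return False
    per_gpu_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    return model_size_gb <= per_gpu_gb * headroom

# e.g. a ~40 GB quantized 70B model on a 16 GB T4 -> False, so consider tensor split
print(fits_on_one_gpu(40.0))
```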
See: Tutorial 08 - NCCL