Skip to content

GGUF Tools API¶

GGUF file parsing and utilities.

Overview¶

Tools for working with GGUF model files.

Basic Usage¶

from llcuda.utils import GGUFParser

parser = GGUFParser(model_path="model.gguf")
print(f"Parameters: {parser.get_parameter_count() / 1e9:.1f}B")
print(f"Quantization: {parser.get_quantization()}")

Class Reference¶

`GGUFParser`¶

class GGUFParser:
    def __init__(self, model_path: str):
        """Initialize parser.

        Args:
            model_path: Path to GGUF file
        """

    def get_parameter_count(self) -> int:
        """Get total parameter count."""

    def get_quantization(self) -> str:
        """Get quantization type."""

    def get_context_length(self) -> int:
        """Get max context length."""

    def get_metadata(self) -> Dict:
        """Get all metadata."""

Functions¶

`estimate_vram()`¶

def estimate_vram(model_size_b: float, quant_type: str) -> float:
    """Estimate VRAM usage.

    Args:
        model_size_b: Model size in billions
        quant_type: Quantization type (Q4_K_M, IQ3_XS, etc.)

    Returns:
        Estimated VRAM in GB
    """

Quantization Types¶

K-Quants: - Q4_K_M, Q5_K_M, Q6_K, Q8_0

I-Quants: - IQ3_XS, IQ2_XXS

Examples¶

See GGUF Tutorial