Problem: You see the error “Backend library is not loaded. Please run install_localLLM() first.”
Solution: Run the installation function after loading the package:
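```r
library(localLLM)
install_localLLM()
```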
This downloads the platform-specific backend library. You only need to do this once.
Problem: install_localLLM() fails to download or install.
Solution: Check that your platform is supported:

- Windows (x86-64)
- macOS (ARM64 / Apple Silicon)
- Linux (x86-64)
If you’re on an unsupported platform, you may need to compile llama.cpp manually.
Problem: A previous download was interrupted and left a lock file.
Solution: Clear the cache directory:
```r
cache_root <- tools::R_user_dir("localLLM", which = "cache")
models_dir <- file.path(cache_root, "models")
unlink(models_dir, recursive = TRUE, force = TRUE)
```

Then try downloading again.
Problem: Large model downloads fail partway through.
Solution:

1. Check your internet connection.
2. Try a smaller model first.
3. Download manually and load from a local path, as sketched below.
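A minimal sketch of the manual route; the file path below is an illustrative placeholder for wherever you saved the .gguf file:

```r
# Load a model file that was downloaded outside of localLLM
model <- model_load("~/models/my-model.gguf")
```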
Problem: You’re trying to load a model by name but it’s not found.
Solution: Check what’s actually cached:
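```r
# See which model files are currently in the local cache
list_cached_models()
```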
Use the exact filename or a unique substring that matches only one model.
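For example, assuming model_load() accepts a cached filename or a unique substring of one (the name below is illustrative; use one reported by list_cached_models()):

```r
model <- model_load("llama-3.2-1b-instruct-q4_k_m.gguf")
```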
Problem: Downloading a gated/private model fails with authentication error.
Solution: Set your Hugging Face token:
```r
# Get a token from https://huggingface.co/settings/tokens
set_hf_token("hf_your_token_here")

# Now the download should work
model <- model_load("https://huggingface.co/private/model.gguf")
```

Problem: R crashes or freezes when calling model_load().
Solution: The model is likely too large for your available RAM. Try a smaller model or a reduced context size, for example:
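A sketch of the usual first steps; the model path is an illustrative placeholder:

```r
# See how much memory the machine reports before loading anything
hardware_profile()

# Prefer a smaller quantization of the model (Q4 files are smaller than Q8)
model <- model_load("path/or/url/to/smaller-model-q4.gguf")
```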
Problem: localLLM warns about insufficient memory.
Solution: The safety check detected potential issues. Options:
- Use a smaller model
- Reduce context size (see the sketch after this list)
- If you’re sure you have enough memory, proceed when prompted
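A minimal sketch of lowering the context size; context_create() and its n_ctx argument are assumptions here, so verify the exact names against the package documentation:

```r
model <- model_load("path/to/model.gguf")   # illustrative path

# Assumed API: create a generation context with a smaller window (2048 tokens)
ctx <- context_create(model, n_ctx = 2048)
```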
Problem: Generation is slow even with n_gpu_layers = 999.
Solution: Check if GPU is detected:
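```r
# hardware_profile() summarizes the detected hardware, including any GPU
hardware_profile()
```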
If no GPU is listed, the backend may not support your GPU. Currently supported:

- NVIDIA GPUs (via CUDA)
- Apple Silicon (Metal)
Problem: The model produces meaningless text.
Solution: Ensure you’re using a chat template:
```r
messages <- list(
  list(role = "user", content = "Your question")
)
prompt <- apply_chat_template(model, messages)
result <- generate(ctx, prompt)
```

Problem: Output includes control tokens such as <|eot_id|>.
Solution: Use the clean = TRUE parameter:
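For example, assuming clean is an argument to generate():

```r
# Strip control tokens such as <|eot_id|> from the returned text
result <- generate(ctx, prompt, clean = TRUE)
```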
Problem: Output is cut off before completion.
Solution: Increase max_tokens:
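For example, assuming max_tokens is an argument to generate():

```r
# Allow up to 1024 new tokens instead of the shorter default
result <- generate(ctx, prompt, max_tokens = 1024)
```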
Problem: Text generation takes much longer than expected.
Solutions:
- Use GPU acceleration (see the sketch after this list)
- Use a smaller model: Q4 quantization is faster than Q8
- Reduce context size
- Use parallel processing for multiple prompts
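A sketch of the first and third options; n_gpu_layers comes from the problem above, but treating it as a model_load() argument, and the context_create() / n_ctx names, are assumptions to verify against the package documentation:

```r
# Offload as many layers as possible to the GPU (assumed model_load() argument)
model <- model_load("path/to/model-q4.gguf", n_gpu_layers = 999)

# Smaller context windows are cheaper to fill (assumed context_create() API)
ctx <- context_create(model, n_ctx = 2048)
```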
Problem: Trying to load a non-GGUF model.
Solution: localLLM only supports GGUF format. Convert your model or find a GGUF version on Hugging Face (search for “model-name gguf”).
| Error | Cause | Solution |
|---|---|---|
| “Backend library is not loaded” | Backend not installed | Run install_localLLM() |
| “Invalid model handle” | Model was freed/invalid | Reload the model |
| “Invalid context handle” | Context was freed/invalid | Recreate the context |
| “Failed to open library” | Backend installation issue | Reinstall with install_localLLM(force = TRUE) |
| “Download timeout” | Network issue or lock file | Clear cache and retry |
If you encounter issues not covered here:
- Check the help page for the relevant function with ?function_name
- Include the output of sessionInfo() and hardware_profile() when reporting an issue

The following commands cover the most common diagnostics and resets:

```r
# Check installation status
lib_is_installed()

# Check hardware
hardware_profile()

# List cached models
list_cached_models()

# List Ollama models
list_ollama_models()

# Clear model cache
cache_dir <- file.path(tools::R_user_dir("localLLM", "cache"), "models")
unlink(cache_dir, recursive = TRUE)

# Force reinstall backend
install_localLLM(force = TRUE)
```