Get Started with localLLM

localLLM provides an easy-to-use interface for running large language models (LLMs) directly in R. It uses the high-performance llama.cpp library as its backend, letting you generate text and analyze data with LLMs. Everything runs locally on your own machine, completely free, with reproducible results by default.

Installation

Getting started requires two simple steps: installing the R package and downloading the backend C++ library.

Step 1: Install the R package

# Install from CRAN
install.packages("localLLM")

Step 2: Install the backend library

The install_localLLM() function automatically detects your operating system (Windows, macOS, Linux) and processor architecture to download the appropriate pre-compiled library.

library(localLLM)
install_localLLM()

Your First LLM Query

The simplest way to get started is with quick_llama():

library(localLLM)

response <- quick_llama("What is the capital of France?")
cat(response)
#> The capital of France is Paris.

quick_llama() is a high-level wrapper designed for convenience. On first run, it automatically downloads and caches the default model (Llama-3.2-3B-Instruct-Q5_K_M.gguf).
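
To confirm the download, you can inspect the model cache afterwards (list_cached_models() is covered in more detail below; the output shown here is illustrative):

cached <- list_cached_models()
print(cached$name)
#> [1] "Llama-3.2-3B-Instruct-Q5_K_M.gguf"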

Text Classification Example

A common use case is classifying text. Here’s a sentiment analysis example:

response <- quick_llama(
  'Classify the sentiment of the following tweet into one of two
   categories: Positive or Negative.

   Tweet: "This paper is amazing! I really like it."'
)

cat(response)
#> The sentiment of this tweet is Positive.
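
For programmatic use you often want only the label rather than a full sentence. A minimal sketch (the prompt wording and the max_tokens value are illustrative choices, and the exact output will vary by model):

label <- quick_llama(
  'Classify the sentiment of the following tweet.
   Answer with a single word: Positive or Negative.

   Tweet: "This paper is amazing! I really like it."',
  max_tokens = 5    # keep the answer short
)
cat(label)
#> Positive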

Processing Multiple Prompts

quick_llama() also accepts a character vector of prompts and returns one response per element:

# Process multiple prompts at once
prompts <- c(
  "What is 2 + 2?",
  "Name one planet in our solar system.",
  "What color is the sky?"
)

responses <- quick_llama(prompts)
print(responses)
#> [1] "2 + 2 equals 4."
#> [2] "One planet in our solar system is Mars."
#> [3] "The sky is typically blue during the day."
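
Because quick_llama() accepts a character vector, you can build prompts programmatically, for example to label every row of a data frame. A sketch using a hypothetical tweets data frame:

# Hypothetical data frame of tweets to classify
tweets <- data.frame(
  text = c(
    "This paper is amazing! I really like it.",
    "The conference wifi was unusable all day."
  )
)

# Build one prompt per row and send the whole vector in a single call
prompts <- paste0(
  "Classify the sentiment of the following tweet as Positive or Negative.\n",
  "Tweet: \"", tweets$text, "\""
)

tweets$sentiment <- quick_llama(prompts)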

Finding and Using Models

GGUF Format

The localLLM backend supports only models in the GGUF format, the file format used by llama.cpp. You can find thousands of GGUF models on Hugging Face:

  1. Search for “gguf” on Hugging Face
  2. Filter by model family (e.g., “gemma gguf”, “llama gguf”)
  3. Copy the direct URL to the .gguf file

Loading Different Models

# From Hugging Face URL
response <- quick_llama(
  "Explain quantum physics simply",
  model = "https://huggingface.co/unsloth/gemma-3-4b-it-qat-GGUF/resolve/main/gemma-3-4b-it-qat-Q5_K_M.gguf"
)

# From local file
response <- quick_llama(
  "Explain quantum physics simply",
  model = "/path/to/your/model.gguf"
)

# From cache (name fragment)
response <- quick_llama(
  "Explain quantum physics simply",
  model = "Llama-3.2"
)

Managing Cached Models

# List all cached models
cached <- list_cached_models()
print(cached)
#>                                name   size
#> 1 Llama-3.2-3B-Instruct-Q5_K_M.gguf 2.1 GB
#> 2     gemma-3-4b-it-qat-Q5_K_M.gguf 2.8 GB
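
The returned data frame can be filtered like any other R data frame, for example to locate a cached model and reuse it via a name fragment (column names follow the output above):

# Find cached Llama models and reuse one by name fragment
llama_models <- subset(cached, grepl("Llama", name))
print(llama_models$name)

response <- quick_llama(
  "Summarise the GGUF format in one sentence.",
  model = "Llama-3.2"   # fragment matched against the cache
)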

Customizing Generation

Control the output with various parameters:

response <- quick_llama(
  prompt = "Write a haiku about programming",
  temperature = 0.8,      # Higher = more creative (default: 0)
  max_tokens = 100,       # Maximum response length
  seed = 42,              # For reproducibility
  n_gpu_layers = 999      # Use GPU if available
)
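
Since quick_llama() defaults to temperature 0 and accepts a seed, repeated calls with identical settings should return identical text. A quick check (the generated haiku itself depends on the model you use):

r1 <- quick_llama("Write a haiku about programming", seed = 42, temperature = 0)
r2 <- quick_llama("Write a haiku about programming", seed = 42, temperature = 0)
identical(r1, r2)
#> [1] TRUE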

Next Steps