localLLM provides an easy-to-use interface to run
large language models (LLMs) directly in R. It uses the performant
llama.cpp library as the backend and allows you to generate
text and analyze data with LLMs. Everything runs locally on your own
machine, completely free, with reproducibility by default.
Getting started requires two simple steps: installing the R package and downloading the backend C++ library.
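A minimal sketch of those two steps, assuming the package is available on CRAN and ships a one-off helper (here called install_localLLM()) that downloads the pre-compiled llama.cpp backend; check the package documentation for the exact function name:

# Step 1: install the R package (assumes a CRAN release)
install.packages("localLLM")

# Step 2: download the pre-compiled llama.cpp backend library
# (helper name assumed; see the package docs for the exact call)
library(localLLM)
install_localLLM()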
The simplest way to get started is with
quick_llama():
quick_llama("What is the capital of France?")
#> The capital of France is Paris.
quick_llama() is a high-level wrapper designed for
convenience. On first run, it automatically downloads and caches the
default model (Llama-3.2-3B-Instruct-Q5_K_M.gguf).
A common use case is classifying text. Here’s a sentiment analysis example:
response <- quick_llama(
  'Classify the sentiment of the following tweet into one of two
categories: Positive or Negative.
Tweet: "This paper is amazing! I really like it."'
)
cat(response)
#> The sentiment of this tweet is Positive.
quick_llama() also accepts a character vector of prompts and processes them in one call:
# Process multiple prompts at once
prompts <- c(
  "What is 2 + 2?",
  "Name one planet in our solar system.",
  "What color is the sky?"
)
responses <- quick_llama(prompts)
print(responses)
#> [1] "2 + 2 equals 4."
#> [2] "One planet in our solar system is Mars."
#> [3] "The sky is typically blue during the day."
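The two ideas combine naturally: build one prompt per document with base R string functions and pass the whole vector to quick_llama(). A sketch, using the default model and a plain character vector of tweets:

tweets <- c(
  "This paper is amazing! I really like it.",
  "The method is confusing and the results are weak."
)

# Prepend the same instruction to every tweet, then classify in one call
prompts <- paste0(
  "Classify the sentiment of the following tweet into one of two ",
  "categories: Positive or Negative.\nTweet: \"", tweets, "\""
)
labels <- quick_llama(prompts)

Because quick_llama() returns one response per prompt, labels lines up with tweets and can be attached to a data frame for downstream analysis.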
The localLLM backend only supports models in the GGUF format. You can
find thousands of GGUF models on Hugging Face. The model argument of
quick_llama() accepts a direct download URL to a .gguf file, a path to
a local .gguf file, or a fragment of the name of an already-cached model:
# From Hugging Face URL
response <- quick_llama(
  "Explain quantum physics simply",
  model = "https://huggingface.co/unsloth/gemma-3-4b-it-qat-GGUF/resolve/main/gemma-3-4b-it-qat-Q5_K_M.gguf"
)

# From local file
response <- quick_llama(
  "Explain quantum physics simply",
  model = "/path/to/your/model.gguf"
)

# From cache (name fragment)
response <- quick_llama(
  "Explain quantum physics simply",
  model = "Llama-3.2"
)

Control the output with various parameters:
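For example, a short, deterministic completion might look like the sketch below; the argument names (temperature, max_tokens, seed) are assumptions here, so see ?quick_llama for the parameters your installed version actually accepts:

# Argument names below are assumptions; check ?quick_llama for the real ones
response <- quick_llama(
  "Summarize the idea of quantization in one sentence.",
  temperature = 0,   # greedy decoding for deterministic output
  max_tokens  = 60,  # cap the length of the reply
  seed        = 42   # fix the seed for reproducibility
)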