Reproducibility is a cornerstone of scientific research. localLLM is designed with reproducibility as a first-class feature, ensuring that your LLM-based analyses can be reliably replicated.
All generation functions in localLLM (quick_llama(), generate(), and generate_parallel()) use deterministic greedy decoding by default (temperature = 0), so running the same prompt twice produces identical results.
library(localLLM)
# Run the same query twice
response1 <- quick_llama("What is the capital of France?")
response2 <- quick_llama("What is the capital of France?")
# Results are identical
identical(response1, response2)
#> [1] TRUE
Reproducibility is preserved even when temperature > 0, as long as you supply a seed:
# Stochastic generation with seed control
response1 <- quick_llama(
  "Write a haiku about data science",
  temperature = 0.9,
  seed = 92092
)

response2 <- quick_llama(
  "Write a haiku about data science",
  temperature = 0.9,
  seed = 92092
)

# Still reproducible with matching seeds
identical(response1, response2)
#> [1] TRUE
# Different seeds produce different outputs
response3 <- quick_llama(
  "Write a haiku about data science",
  temperature = 0.9,
  seed = 12345
)

identical(response1, response3)
#> [1] FALSE
All generation functions compute SHA-256 hashes for both inputs and outputs. These hashes enable verification that collaborators used identical configurations and obtained matching results.
result <- quick_llama("What is machine learning?")
# Access the hashes
hashes <- attr(result, "hashes")
print(hashes)
#> $input
#> [1] "a3f2b8c9d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1"
#>
#> $output
#> [1] "b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5"
The input hash includes:

- Model identifier
- Prompt text
- Generation parameters (temperature, seed, max_tokens, etc.)
The output hash covers the generated text, allowing collaborators to verify they obtained matching results.
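For example, a collaborator who reruns the same call in a fresh session can confirm agreement by comparing the stored hashes. A minimal sketch, assuming both runs use the same model and default parameters:

# Rerun the identical call in a second session
result_replicated <- quick_llama("What is machine learning?")

# Compare output hashes from the two runs
identical(
  attr(result, "hashes")$output,
  attr(result_replicated, "hashes")$output
)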
For multi-model comparisons, explore() computes hashes per model:
res <- explore(
  models = models,
  prompts = template_builder,
  hash = TRUE
)
# View hashes for each model
hash_df <- attr(res, "hashes")
print(hash_df)
#> model_id input_hash output_hash
#> 1 gemma4b a3f2b8c9d4e5f6a7b8c9d0e1f2a3b4c5... b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9...
#> 2 llama3b c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0... d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1...
Set hash = FALSE to disable hash computation if not needed.
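For instance, a quick sketch disabling hashing for a large exploratory run:

res <- explore(
  models = models,
  prompts = template_builder,
  hash = FALSE
)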
Use document_start() and document_end() to capture everything that happens during your analysis. The log records session metadata (R and localLLM versions, platform), every model call with its parameters and input/output hashes, and a final session hash.
# Start documentation
document_start(path = "analysis-log.txt")
# Run your analysis
result1 <- quick_llama("Classify this text: 'Great product!'")
result2 <- explore(models = models, prompts = prompts)
# End documentation
document_end()

The log file contains a complete audit trail:
================================================================================
localLLM Analysis Log
================================================================================
Start Time: 2025-01-15 14:30:22 UTC
R Version: 4.4.0
localLLM Version: 1.1.0
Platform: aarch64-apple-darwin22.6.0
--------------------------------------------------------------------------------
Event: quick_llama call
Time: 2025-01-15 14:30:25 UTC
Model: Llama-3.2-3B-Instruct-Q5_K_M.gguf
Parameters: temperature=0, max_tokens=256, seed=1234
Input Hash: a3f2b8c9...
Output Hash: b4c5d6e7...
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Event: explore call
Time: 2025-01-15 14:31:45 UTC
Models: gemma4b, llama3b
Prompts: 100 samples
Engine: parallel
--------------------------------------------------------------------------------
================================================================================
End Time: 2025-01-15 14:35:12 UTC
Session Hash: e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2...
================================================================================
Even with temperature = 0, explicitly setting a seed documents your intent. A minimal sketch (the seed value is arbitrary):
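result <- quick_llama(
  "Classify this text: 'Great product!'",
  temperature = 0,
  seed = 1234
)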
Record your setup at the start of the analysis with hardware_profile(). A sketch of the call, assuming it is invoked without arguments and returns a named list, as the output below suggests:
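# Capture the execution environment
hardware_profile()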
#> $os
#> [1] "macOS 14.0"
#>
#> $cpu_cores
#> [1] 10
#>
#> $ram_gb
#> [1] 32
#>
#> $gpu
#> [1] "Apple M2 Pro"
Wrap your entire analysis in documentation calls so that every model invocation is logged. A minimal sketch combining the functions covered above (the analysis step is a placeholder):
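document_start(path = "analysis-log.txt")

hardware_profile()                                        # record the environment
results <- explore(models = models, prompts = prompts)   # run the analysis

document_end()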
| Feature | Function/Parameter | Purpose |
|---|---|---|
| Deterministic output | temperature = 0 (default) | Same input = same output |
| Seed control | seed = 42 | Reproducible stochastic generation |
| Hash verification | attr(result, "hashes") | Verify identical configurations |
| Audit trails | document_start()/document_end() | Complete session logging |
| Hardware info | hardware_profile() | Record execution environment |
With these tools, your LLM-based analyses become fully reproducible and verifiable.