ragcat builds retrieval-augmented generation (RAG) stores from U.S. Fish and Wildlife Service ServCat references, downloaded ServCat files, already-downloaded local source files, and optional webpage URLs.
The package wraps a ServCat-first evidence workflow:
- Search ServCat with user-supplied Quick Search terms or explicit reference IDs.
- Download files attached to matching ServCat references.
- Convert downloaded ServCat files, local files, and webpage URLs to Markdown with `ragnar::read_as_markdown()`.
- Screen extracted file text with user-supplied screening terms.
- Build a DuckDB-backed `ragnar` store from verified sources.
- Retrieve context or ask questions through `ellmer`, with structured answer exports.
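The screening step works as a filter over extracted text. The base-R sketch below is hypothetical: `screen_text()` is not a ragcat function. It assumes a document passes when its text contains at least one screening term, matched case-insensitively as a literal string. ragcat's actual matching rules may differ.

```r
# Hypothetical sketch, not ragcat's implementation: flag documents whose
# extracted text contains at least one screening term (case-insensitive,
# literal matching).
screen_text <- function(texts, screening_terms) {
  # Assumption: with no screening terms, every document passes
  if (length(screening_terms) == 0) {
    return(rep(TRUE, length(texts)))
  }
  texts_lower <- tolower(texts)
  # One logical vector per term: does each document contain that term?
  hits <- lapply(tolower(screening_terms), function(term) {
    grepl(term, texts_lower, fixed = TRUE)
  })
  # A document passes if any term matched
  Reduce(`|`, hits)
}

docs <- c(
  a = "Stream flow and water quality were measured monthly.",
  b = "Budget tables for fiscal year planning.",
  c = "A HABITAT SURVEY of riparian corridors."
)
keep <- screen_text(docs, c("habitat", "survey", "stream flow", "water quality"))
names(docs)[keep]
```

Literal matching keeps multi-word terms such as `"stream flow"` intact and avoids surprises from regular-expression metacharacters in user-supplied terms.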
Installation
You can install the development version of ragcat from GitHub with:
```r
# install.packages("pak")
pak::pak("USFWS/ragcat")
```

Then load the package:

```r
library(ragcat)
```

Minimal RAG store build
```r
result <- build_rag_store(
  topic = "example_project",
  search_terms = c("watershed assessment", "habitat survey"),
  screening_terms = c("habitat", "survey", "stream flow", "water quality"),
  local_file_path = "data/local_sources",
  urls = c(
    "https://example.org/example-source-page"
  ),
  store_dir = "data/rag_store",
  store_location = "data/rag_store/rag_store.duckdb",
  embedding = "none",
  secure = FALSE,
  overwrite_store = TRUE,
  verbose = TRUE
)
```

Use `embedding = "none"` for a minimal example that needs no credentials. For vector retrieval, use one of the supported embedding backends, such as `embedding = "azure-openai-small"`, `embedding = "openai-small"`, or `embedding = "ollama-default"`.
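You can script the backend choice so the same build script runs with or without credentials. The helper below, `choose_embedding()`, is hypothetical and not part of ragcat. It assumes that the `"openai-small"` backend reads its key from the `OPENAI_API_KEY` environment variable, the variable OpenAI clients such as `ellmer` read by default. Check ragcat's documentation for the credentials each backend actually expects.

```r
# Hypothetical helper (not part of ragcat): use an OpenAI embedding backend
# when an API key is set, otherwise fall back to the no-credentials option.
choose_embedding <- function(key_var = "OPENAI_API_KEY",
                             backend = "openai-small") {
  # Sys.getenv() returns "" for unset variables, so nzchar() is FALSE then
  if (nzchar(Sys.getenv(key_var))) backend else "none"
}

# Report which backend this session would use
message("Embedding backend: ", choose_embedding())
```

You could then pass `embedding = choose_embedding()` to `build_rag_store()`.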
Build from local files and webpage URLs only
If you do not want to search ServCat, omit `search_terms` or set it to `character()`. You can still build a store from local files and webpage URLs.

```r
result <- build_rag_store(
  topic = "local_sources_example",
  local_file_path = "data/local_sources",
  urls = c("https://example.org/example-source-page"),
  screening_terms = c("habitat", "survey"),
  store_dir = "data/rag_store",
  store_location = "data/rag_store/rag_store.duckdb",
  embedding = "none"
)
```

Build from explicit ServCat reference IDs
If you already know which ServCat references you want to use, pass their IDs with `reference_ids`.

```r
result <- build_rag_store(
  topic = "reference_id_example",
  reference_ids = c(12345, 67890),
  screening_terms = c("habitat", "survey"),
  local_file_path = "data/local_sources",
  store_dir = "data/rag_store",
  store_location = "data/rag_store/rag_store.duckdb",
  embedding = "none",
  secure = FALSE,
  overwrite_store = TRUE
)
```

Ask a question against a built store
```r
response <- ask_rag_store(
  query = "Create an annotated bibliography by source for the available evidence.",
  topic = "example_project",
  store_location = "data/rag_store/rag_store.duckdb",
  prompt_file = "prompts/system_prompt.md",
  output_instructions_file = "prompts/output_instructions.md",
  top_k = 12,
  save_outputs = TRUE,
  output_dir = "results"
)

# Inspect the components of the response
cat(response$summary)
response$structured
response$saved
```

Retrieve context without asking an LLM
```r
chunks <- retrieve_rag_context(
  query = "What evidence is available for the project question?",
  store_location = "data/rag_store/rag_store.duckdb",
  top_k = 12
)
format_retrieved_context(chunks)
```

Prompt and output-instruction files
`ask_rag_store()` can read the system prompt and the answer-formatting instructions from Markdown or plain-text files.

```r
response <- ask_rag_store(
  query = "Summarize the strongest and weakest evidence.",
  store_location = "data/rag_store/rag_store.duckdb",
  prompt_file = "prompts/system_prompt.md",
  output_instructions_file = "prompts/output_instructions.md"
)
```

If no prompt file is supplied, ragcat uses a saved `system_prompt.txt` beside the store when one is available, and otherwise falls back to the package default system prompt.
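The prompt fallback order can be sketched in base R. `resolve_system_prompt()` is hypothetical and is not ragcat's implementation. It makes three assumptions: "beside the store" means the same directory as `store_location`, an explicit `prompt_file` takes priority over a saved `system_prompt.txt`, and a placeholder string can stand in for the package default prompt.

```r
# Hypothetical sketch of the prompt fallback order; not ragcat's code.
# 1. an explicitly supplied prompt_file
# 2. a saved system_prompt.txt in the same directory as the store
# 3. the package default prompt (a placeholder string here)
resolve_system_prompt <- function(prompt_file = NULL,
                                  store_location,
                                  default_prompt = "DEFAULT PROMPT") {
  if (!is.null(prompt_file)) {
    return(paste(readLines(prompt_file, warn = FALSE), collapse = "\n"))
  }
  saved <- file.path(dirname(store_location), "system_prompt.txt")
  if (file.exists(saved)) {
    return(paste(readLines(saved, warn = FALSE), collapse = "\n"))
  }
  default_prompt
}

# Try it in a fresh temporary directory
store_dir <- tempfile("rag_store")
dir.create(store_dir, recursive = TRUE)
store_location <- file.path(store_dir, "rag_store.duckdb")

# No prompt file and nothing saved: the default is used
before_saved <- resolve_system_prompt(store_location = store_location)

# A saved system_prompt.txt beside the store is picked up
writeLines("Saved prompt beside the store.", file.path(store_dir, "system_prompt.txt"))
after_saved <- resolve_system_prompt(store_location = store_location)

# An explicit prompt file wins over the saved one
custom <- file.path(store_dir, "system_prompt.md")
writeLines(c("# System prompt", "Cite every source."), custom)
explicit <- resolve_system_prompt(custom, store_location)
```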
Getting help
Contact a project maintainer for help with this repository.
Contribute
Contact the project maintainer for information about contributing to this repository.
Submit a GitHub Issue to report a bug or request a feature or enhancement.
This work is licensed under a Creative Commons Zero v1.0 Universal (CC0 1.0) license.
