Build RAG Stores from ServCat, Local Source Files, and Websites • ragcat

ragcat builds retrieval-augmented generation (RAG) stores from U.S. Fish and Wildlife Service ServCat references, downloaded ServCat files, already-downloaded local source files, and optional webpage URLs.

The package wraps a ServCat-first evidence workflow:

1. Search ServCat with user-supplied Quick Search terms or explicit reference IDs.
2. Download files attached to matching ServCat references.
3. Convert downloaded ServCat files, local files, and webpage URLs to Markdown with ragnar::read_as_markdown().
4. Screen extracted file text with user-supplied screening terms.
5. Build a DuckDB-backed ragnar store from verified sources.
6. Retrieve context or ask questions through ellmer, with structured answer exports.
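
The screening step can be pictured as a simple term match over the extracted Markdown. The sketch below is a minimal illustration in base R, not ragcat's actual implementation: screen_text() is a hypothetical helper, and literal, case-insensitive matching is an assumption about how screening terms are applied.

```r
# Hypothetical helper, not part of ragcat: return the screening terms that
# appear in a block of extracted text (case-insensitive, literal match).
screen_text <- function(text, screening_terms) {
  # Collapse the lines into one lowercase string so terms are matched
  # regardless of case or of which line they sit on.
  text_lower <- tolower(paste(text, collapse = "\n"))
  hits <- vapply(
    tolower(screening_terms),
    function(term) grepl(term, text_lower, fixed = TRUE),
    logical(1)
  )
  screening_terms[hits]
}

# Made-up example text standing in for converted Markdown.
extracted <- c("# Stream Flow Report", "Habitat survey results for 2019.")
matched <- screen_text(extracted, c("habitat", "stream flow", "water quality"))
print(matched)
```

A source with no matching terms returns an empty character vector, which a caller could treat as "did not pass screening".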

Installation

You can install the development version of ragcat from GitHub with:

# install.packages("pak")
pak::pak("USFWS/ragcat")

Then load the package:

library(ragcat)

Minimal RAG store build

library(ragcat)

result <- build_rag_store(
  topic = "example_project",
  search_terms = c("watershed assessment", "habitat survey"),
  screening_terms = c("habitat", "survey", "stream flow", "water quality"),
  local_file_path = "data/local_sources",
  urls = c(
    "https://example.org/example-source-page"
  ),
  store_dir = "data/rag_store",
  store_location = "data/rag_store/rag_store.duckdb",
  embedding = "none",
  secure = FALSE,
  overwrite_store = TRUE,
  verbose = TRUE
)

Use embedding = "none" for a minimal no-credentials example. For vector retrieval, use one of the supported embedding backends, such as embedding = "azure-openai-small", embedding = "openai-small", or embedding = "ollama-default".
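
One way to keep a script runnable with or without credentials is to fall back to embedding = "none" when no API key is present. The sketch below is an illustration only: choose_embedding() is a hypothetical helper, and the assumption that the "openai-small" backend reads its key from the OPENAI_API_KEY environment variable should be checked against the package documentation.

```r
# Hypothetical helper, not part of ragcat: pick an embedding backend name
# based on whether a credential environment variable is set.
# ASSUMPTION: the chosen backend finds its key in `env_var`.
choose_embedding <- function(env_var = "OPENAI_API_KEY",
                             backend = "openai-small") {
  if (nzchar(Sys.getenv(env_var))) backend else "none"
}

embedding <- choose_embedding()
print(embedding)
```

The result can then be passed as the embedding argument of build_rag_store().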

Build from local files and webpage URLs only

If you do not want to search ServCat, omit search_terms or set it to character(). You can still build a store from local files and webpage URLs.

result <- build_rag_store(
  topic = "local_sources_example",
  local_file_path = "data/local_sources",
  urls = c("https://example.org/example-source-page"),
  screening_terms = c("habitat", "survey"),
  store_dir = "data/rag_store",
  store_location = "data/rag_store/rag_store.duckdb",
  embedding = "none"
)

Build from explicit ServCat reference IDs

If you already know the ServCat references you want to use, provide them with reference_ids.

result <- build_rag_store(
  topic = "reference_id_example",
  reference_ids = c(12345, 67890),
  screening_terms = c("habitat", "survey"),
  local_file_path = "data/local_sources",
  store_dir = "data/rag_store",
  store_location = "data/rag_store/rag_store.duckdb",
  embedding = "none",
  secure = FALSE,
  overwrite_store = TRUE
)

Ask a question against a built store

response <- ask_rag_store(
  query = "Create an annotated bibliography by source for the available evidence.",
  topic = "example_project",
  store_location = "data/rag_store/rag_store.duckdb",
  prompt_file = "prompts/system_prompt.md",
  output_instructions_file = "prompts/output_instructions.md",
  top_k = 12,
  save_outputs = TRUE,
  output_dir = "results"
)

cat(response$summary)
response$structured
response$saved

Retrieve context without asking an LLM

chunks <- retrieve_rag_context(
  query = "What evidence is available for the project question?",
  store_location = "data/rag_store/rag_store.duckdb",
  top_k = 12
)

format_retrieved_context(chunks)

Prompt and output-instruction files

ask_rag_store() can use Markdown or text files for the system prompt and answer-formatting instructions.

response <- ask_rag_store(
  query = "Summarize the strongest and weakest evidence.",
  store_location = "data/rag_store/rag_store.duckdb",
  prompt_file = "prompts/system_prompt.md",
  output_instructions_file = "prompts/output_instructions.md"
)

If no prompt file is supplied, ragcat uses the package default system prompt, or a saved system_prompt.txt beside the store when available.
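
The fallback described above can be sketched as a small lookup. This is a hedged illustration, not ragcat's internal code: resolve_prompt_file() is a hypothetical helper, and the precedence shown (explicit file first, then a saved system_prompt.txt beside the store, then the package default) is an assumption about how the pieces are ordered.

```r
# Hypothetical helper, not part of ragcat: decide which system prompt file
# would be used. Returns a file path, or NULL to signal "use the package
# default prompt".
resolve_prompt_file <- function(store_location, prompt_file = NULL) {
  # An explicitly supplied prompt file is assumed to take priority.
  if (!is.null(prompt_file)) {
    return(prompt_file)
  }
  # Otherwise look for a saved prompt in the same directory as the store.
  saved <- file.path(dirname(store_location), "system_prompt.txt")
  if (file.exists(saved)) saved else NULL
}

# Demonstration in a temporary directory so no real store is touched.
store_dir <- tempfile("rag_store_")
dir.create(store_dir)
store_location <- file.path(store_dir, "rag_store.duckdb")

before <- resolve_prompt_file(store_location)
writeLines("You are a careful assistant.", file.path(store_dir, "system_prompt.txt"))
after <- resolve_prompt_file(store_location)
```

Here before is NULL because nothing has been saved beside the store yet, while after points at the saved system_prompt.txt.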

Getting help

Contact a project maintainer for help with this repository.

Contribute

Contact a project maintainer for information about contributing to this repository.

Submit a GitHub Issue to report a bug or request a feature or enhancement.

This work is licensed under a Creative Commons Zero v1.0 Universal License.