Applies the text-screening logic used for ServCat files to local files discovered outside ServCat.

Usage

screen_local_files(
  local_files,
  screening_terms = DEFAULT_SCREENING_TERMS,
  min_screening_term_hits = if (length(screening_terms) > 0) 1L else 0L,
  target_size = 2200L,
  target_overlap = 0.15,
  segment_by_heading_levels = c(1L, 2L),
  screening_cache_dir = NULL,
  use_screening_cache = !is.null(screening_cache_dir),
  refresh_screening_cache = FALSE,
  verbose = FALSE
)

Arguments

local_files

Tibble of local file candidates.

screening_terms

Character vector of required file-screening terms.

min_screening_term_hits

Minimum number of screening-term hits required. Defaults to 1L when screening_terms is non-empty and to 0L otherwise.
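
The default shown in Usage depends on whether any screening terms are supplied. A minimal sketch of that default expression in isolation follows; the helper name default_min_hits is hypothetical and not part of the package.

```r
# Hypothetical helper that mirrors the default expression in the Usage signature.
default_min_hits <- function(screening_terms) {
  if (length(screening_terms) > 0) 1L else 0L
}

default_min_hits(c("habitat", "survey"))  # 1L: terms supplied, one hit required
default_min_hits(character(0))            # 0L: no terms, so no hits required
```

Passing an empty screening_terms vector therefore drops the hit requirement to zero unless min_screening_term_hits is set explicitly.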

target_size, target_overlap, segment_by_heading_levels

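Chunking controls: target chunk size (default 2200L), target overlap between chunks (default 0.15), and the heading levels used for segmentation (default c(1L, 2L)).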
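
This page does not say how the chunking controls are applied. As a rough illustration only, the sketch below assumes that target_size counts characters and that target_overlap is the fraction of each chunk shared with the next one. The helper chunk_starts is hypothetical, and the package's actual chunker (including its use of segment_by_heading_levels) may work differently.

```r
# Illustrative only: assumes target_size is measured in characters and
# target_overlap is the fraction of a chunk repeated at the start of the next.
chunk_starts <- function(n_chars, target_size = 2200L, target_overlap = 0.15) {
  # Distance between consecutive chunk starts, never less than one character.
  step <- max(1L, as.integer(round(target_size * (1 - target_overlap))))
  seq.int(1L, max(1L, n_chars), by = step)
}

# Start positions for a 5000-character text with the default settings.
chunk_starts(5000L)  # 1 1871 3741
```

Under these assumptions, consecutive chunks of 2200 characters share 330 characters (15 percent of the target size).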

screening_cache_dir

Optional directory for cached screening text.

use_screening_cache

Logical; use cached screening text when available. Defaults to TRUE when screening_cache_dir is supplied and to FALSE otherwise.

refresh_screening_cache

Logical; ignore existing cache entries and rewrite them.
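
One way the three cache arguments could interact is sketched below. The helper cache_action, its entry_exists argument, and its return labels are hypothetical, and the package's real cache logic may differ. Only the default for use_screening_cache is taken from the Usage signature.

```r
# Hypothetical decision helper; not part of the package.
cache_action <- function(screening_cache_dir = NULL,
                         use_screening_cache = !is.null(screening_cache_dir),
                         refresh_screening_cache = FALSE,
                         entry_exists = FALSE) {
  # Cache disabled: extract screening text without reading or writing a cache.
  if (!use_screening_cache) return("extract")
  # Refresh requested or no entry yet: extract again and (re)write the entry.
  if (refresh_screening_cache || !entry_exists) return("extract_and_write")
  # Otherwise reuse the cached screening text.
  "read_cache"
}

cache_action()                                       # "extract"
cache_action("cache", entry_exists = TRUE)           # "read_cache"
cache_action("cache", refresh_screening_cache = TRUE,
             entry_exists = TRUE)                    # "extract_and_write"
```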

verbose

Logical; emit progress messages.

Value

A tibble screening log.

Examples

if (FALSE) { # \dontrun{
local_files <- list_local_source_files(file.path("data", "local_sources"))
screen_local_files(
  local_files,
  screening_terms = c("habitat", "survey"),
  verbose = TRUE
)
} # }