Applies the same text-screening logic to local files discovered outside ServCat.
Usage
screen_local_files(
local_files,
screening_terms = DEFAULT_SCREENING_TERMS,
min_screening_term_hits = if (length(screening_terms) > 0) 1L else 0L,
target_size = 2200L,
target_overlap = 0.15,
segment_by_heading_levels = c(1L, 2L),
screening_cache_dir = NULL,
use_screening_cache = !is.null(screening_cache_dir),
refresh_screening_cache = FALSE,
verbose = FALSE
)
Arguments
- local_files
Tibble of local file candidates, for example the result of list_local_source_files().
- screening_terms
Character vector of required file-screening terms. Defaults to DEFAULT_SCREENING_TERMS.
- min_screening_term_hits
Minimum number of screening-term hits required. Defaults to 1L when screening_terms is non-empty and to 0L otherwise.
- target_size, target_overlap, segment_by_heading_levels
Chunking controls. Defaults: target_size = 2200L, target_overlap = 0.15, segment_by_heading_levels = c(1L, 2L).
- screening_cache_dir
Optional directory for cached screening text.
- use_screening_cache
Logical; use cached screening text when available. Defaults to TRUE whenever screening_cache_dir is supplied and to FALSE otherwise.
- refresh_screening_cache
Logical; ignore existing cache entries and rewrite them.
- verbose
Logical; emit progress messages.
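Two of the defaults in the Usage block depend on other arguments. The sketch below shows how R resolves such defaults lazily. It uses a hypothetical helper, resolve_screening_defaults(), that is not part of the package and only copies the two default expressions from the signature above.

```r
# Hypothetical helper (an assumption for illustration, not a package function).
# It mirrors the default expressions of screen_local_files() so their
# behaviour can be inspected in isolation.
resolve_screening_defaults <- function(
    screening_terms = character(),
    min_screening_term_hits = if (length(screening_terms) > 0) 1L else 0L,
    screening_cache_dir = NULL,
    use_screening_cache = !is.null(screening_cache_dir)) {
  # Defaults are evaluated lazily, so they see the values actually supplied
  # for screening_terms and screening_cache_dir.
  list(
    min_screening_term_hits = min_screening_term_hits,
    use_screening_cache = use_screening_cache
  )
}

with_terms <- resolve_screening_defaults(screening_terms = c("habitat", "survey"))
no_terms <- resolve_screening_defaults(screening_terms = character())
cached <- resolve_screening_defaults(screening_cache_dir = tempdir())
```

Here with_terms requires one hit and does not use a cache, no_terms requires zero hits, and cached turns caching on because a directory was supplied. Passing either argument explicitly overrides the derived default.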
Examples
if (FALSE) { # \dontrun{
local_files <- list_local_source_files(file.path("data", "local_sources"))
screen_local_files(
local_files,
screening_terms = c("habitat", "survey"),
verbose = TRUE
)
} # }
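As a rough illustration of how screening_terms and min_screening_term_hits could interact, the base-R sketch below applies a hit threshold to plain text. The helpers count_term_hits() and passes_screening() are hypothetical, and the matching rule (distinct terms, case-insensitive, literal match) is an assumption; the package may count hits differently.

```r
# Hypothetical helper (not part of the package): number of distinct terms
# that occur at least once in `text`, using case-insensitive literal matching.
count_term_hits <- function(text, terms) {
  if (length(terms) == 0) {
    return(0L)
  }
  hits <- vapply(
    terms,
    function(term) grepl(tolower(term), tolower(text), fixed = TRUE),
    logical(1)
  )
  sum(hits)
}

# Hypothetical helper: the default threshold copies the expression used for
# min_screening_term_hits in the Usage block.
passes_screening <- function(text, terms,
                             min_hits = if (length(terms) > 0) 1L else 0L) {
  count_term_hits(text, terms) >= min_hits
}

keep_report <- passes_screening("Annual habitat survey results", c("habitat", "survey"))
keep_memo <- passes_screening("Budget memo", c("habitat", "survey"))
keep_unscreened <- passes_screening("Budget memo", character())
```

Under these assumptions keep_report is TRUE, keep_memo is FALSE, and keep_unscreened is TRUE, because an empty term vector lowers the threshold to zero.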
