Converts a local file to Markdown with ragnar and collapses chunk text for term screening.

Usage

extract_file_text_for_screening(
  local_path,
  origin,
  target_size = 2200L,
  target_overlap = 0.15,
  segment_by_heading_levels = c(1L, 2L),
  screening_cache_dir = NULL,
  use_screening_cache = !is.null(screening_cache_dir),
  refresh_screening_cache = FALSE
)

Arguments

local_path

Path to a local source file.

origin

Origin label passed to ragnar.

target_size, target_overlap, segment_by_heading_levels

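Chunking controls: the target chunk size, the overlap between adjacent chunks as a fraction of target_size, and the heading levels at which the Markdown is segmented before chunking.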
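
To make the defaults concrete, here is a rough base R sketch of what a 2200-character target with 0.15 overlap implies for fixed-width windows. It only illustrates the arithmetic: `window_starts()` is a hypothetical helper, not part of this package or ragnar, and the real chunker is assumed to adjust boundaries to the document structure rather than cut at exact offsets.

```r
# Hypothetical helper (illustration only): start offsets of fixed-width
# windows over n_chars characters, where neighbouring windows share roughly
# target_overlap * target_size characters.
window_starts <- function(n_chars, target_size = 2200L, target_overlap = 0.15) {
  stride <- as.integer(round(target_size * (1 - target_overlap)))
  stopifnot(stride >= 1L)
  last_start <- max(1L, n_chars - target_size + 1L)
  starts <- seq.int(1L, last_start, by = stride)
  # Add a final window so the tail of the text is always covered.
  if (starts[length(starts)] < last_start) {
    starts <- c(starts, last_start)
  }
  starts
}

starts <- window_starts(10000L)
stride <- starts[2] - starts[1]    # distance between consecutive window starts
overlap_chars <- 2200L - stride    # characters shared by neighbouring windows
```

With the defaults, consecutive windows start 1870 characters apart and share 330 characters.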

screening_cache_dir

Optional directory for cached screening text.

use_screening_cache

Logical; use cached screening text when available. Defaults to TRUE whenever screening_cache_dir is supplied.

refresh_screening_cache

Logical; ignore existing cache entries and rewrite them.

Value

A character scalar containing the extracted text. Cache status is attached as attributes.
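
Because the cache status travels as attributes, the return value still behaves as an ordinary string. The attribute names are not listed on this page, so the sketch below uses a stand-in value with a made-up `cache_status` attribute, purely to show how base R exposes whatever attributes are present.

```r
# Stand-in for a returned value. `cache_status` is a hypothetical attribute
# name chosen for illustration, not one documented by this function.
text <- structure("Extracted screening text", cache_status = "miss")

attr_names <- names(attributes(text))  # list whichever attributes are attached
plain <- as.vector(text)               # drop attributes, keep the string
is_scalar_chr <- is.character(text) && length(text) == 1L
```

On a real result, `names(attributes(text))` is the quickest way to see which status fields are actually attached.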

Examples

if (FALSE) { # \dontrun{
text <- extract_file_text_for_screening(
  local_path = file.path("data", "local_sources", "report.pdf"),
  origin = "local-file://report.pdf"
)
} # }
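
A further sketch with caching enabled, using only the arguments listed under Usage. The cache location under `tempdir()` and the file path are arbitrary choices for illustration. The calls are guarded in the same way as the example above because they need the package and a real local file.

```r
# Arbitrary cache location for illustration; any writable directory would do.
cache_dir <- file.path(tempdir(), "screening-cache")

if (FALSE) {
  # Supplying screening_cache_dir turns caching on by default, since
  # use_screening_cache defaults to !is.null(screening_cache_dir).
  text <- extract_file_text_for_screening(
    local_path = file.path("data", "local_sources", "report.pdf"),
    origin = "local-file://report.pdf",
    screening_cache_dir = cache_dir
  )

  # Ignore any existing cache entry for this file and rewrite it.
  text <- extract_file_text_for_screening(
    local_path = file.path("data", "local_sources", "report.pdf"),
    origin = "local-file://report.pdf",
    screening_cache_dir = cache_dir,
    refresh_screening_cache = TRUE
  )
}
```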