read_pdf_text
Extract text from PDF files using the pdftotext utility. Options let you extract a single page, preserve the original layout, or set the output encoding. Works with single-page and multi-page documents.
Instructions
Extract text content from a PDF file using pdftotext from poppler-utils
Input Schema
Name | Required | Description | Default |
---|---|---|---|
encoding | No | Text encoding for output | UTF-8 |
layout | No | Preserve original text layout formatting | false |
page | No | Specific page number to extract (1-based indexing). If not specified, extracts all pages. | |
path | Yes | Path to the PDF file (relative to current working directory or absolute path) | |
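
As an illustration of how these parameters might map onto a pdftotext invocation, here is a minimal Python sketch. It is not the tool's actual implementation: the function names `build_pdftotext_command` and `run_pdftotext` are hypothetical, and the mapping assumes the standard poppler-utils flags `-enc`, `-layout`, `-f`, and `-l`, with `-` as the output file so text goes to standard output.

```python
import subprocess
from typing import List, Optional


def build_pdftotext_command(path: str,
                            page: Optional[int] = None,
                            layout: bool = False,
                            encoding: str = "UTF-8") -> List[str]:
    """Translate the input schema into a pdftotext argument list (hypothetical helper)."""
    if page is not None and page < 1:
        raise ValueError("page uses 1-based indexing and must be >= 1")

    cmd = ["pdftotext", "-enc", encoding]
    if layout:
        # -layout keeps the physical layout of the page as far as possible
        cmd.append("-layout")
    if page is not None:
        # Extract one page by setting first (-f) and last (-l) page to the same value
        cmd += ["-f", str(page), "-l", str(page)]
    # "-" as the output file sends the extracted text to stdout
    cmd += [path, "-"]
    return cmd


def run_pdftotext(path: str,
                  page: Optional[int] = None,
                  layout: bool = False,
                  encoding: str = "UTF-8") -> bytes:
    """Run pdftotext and return the raw output bytes (requires poppler-utils on PATH)."""
    result = subprocess.run(
        build_pdftotext_command(path, page, layout, encoding),
        capture_output=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `build_pdftotext_command("report.pdf", page=2, layout=True)` yields the arguments for extracting only page 2 with layout preserved. The sketch returns bytes rather than a decoded string because pdftotext encoding names do not always match Python codec names.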