Overview
The VLM Document Extraction Node uses vision language models to extract content from PDF documents. This node enables you to:
- Extract text, tables, and structured data from PDFs
- Process multiple PDF pages using configurable chunking (pages per chunk and page overlap)
- Customize extraction prompts for specific data formats
- Handle large documents by tuning chunk size and overlap
- Choose from several vision-capable models
This node is best for deterministic data extraction (invoices, forms, structured fields). For more consistent results, use Temperature 0 in Advanced Settings.
Configuration Parameters
Node Configuration
- Upload PDF: Upload one or more PDF documents to extract content from
- Extraction Prompt: Describe what to extract and the desired output format. Example prompts:
  - "Extract all text content from this document. Format the output as markdown with proper headings and sections."
  - "Extract the table data from this invoice and return it in CSV format."
  - "Find all dates, amounts, and vendor names in this receipt."
Advanced Settings
- Vision Model: Select the vision language model to use (e.g., GPT-4o, Gemini Pro Vision)
- Temperature: Controls randomness in extraction (0.0-2.0). Recommended: 0 for deterministic, repeatable extraction (invoices, forms, structured data).
- Max Tokens: Maximum output tokens per chunk (1000-100000)
- Pages per Chunk: Number of pages to process together in each chunk (1-20). Chunking is fixed by this value—there is no automatic or semantic chunking.
- Overlap: Number of pages that overlap between consecutive chunks (0-10). Use to preserve context across chunk boundaries (e.g. tables or paragraphs that span pages).
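The way Pages per Chunk and Overlap group pages can be pictured as a sliding window over the document. The Python sketch below is an illustration under stated assumptions, not the node's implementation. It assumes each chunk starts (Pages per Chunk minus Overlap) pages after the previous one, and that the last chunk is cut short at the end of the PDF. The function name chunk_page_ranges is hypothetical. The sketch rejects an Overlap equal to or larger than Pages per Chunk, because the window would then never advance. How the node itself handles that combination is not covered here.

```python
def chunk_page_ranges(num_pages: int, pages_per_chunk: int, overlap: int) -> list[tuple[int, int]]:
    """Hypothetical helper: 1-based, inclusive (first_page, last_page) per chunk.

    Assumes a fixed sliding window: each chunk starts (pages_per_chunk - overlap)
    pages after the previous one, and the last chunk stops at the final page.
    """
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    if not 0 <= overlap < pages_per_chunk:
        raise ValueError("overlap must be >= 0 and smaller than pages_per_chunk")
    step = pages_per_chunk - overlap
    ranges = []
    start = 1
    while start <= num_pages:
        end = min(start + pages_per_chunk - 1, num_pages)
        ranges.append((start, end))
        if end == num_pages:
            break  # the document is fully covered
        start += step
    return ranges


if __name__ == "__main__":
    # A 10-page PDF with Pages per Chunk = 4 and Overlap = 1.
    print(chunk_page_ranges(10, 4, 1))
```

Under these assumptions, a 10-page PDF with Pages per Chunk = 4 and Overlap = 1 is split into pages 1-4, 4-7, and 7-10. Pages 4 and 7 are each seen by two chunks, which is what lets a table or paragraph crossing a boundary appear whole in at least one chunk.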
Expected Inputs and Outputs
- Inputs:
  - The node accepts PDF file uploads from the file upload input.
- Outputs:
  - content: Extracted content in the format requested by the Extraction Prompt
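The content output is text in whatever format the prompt asked for, so downstream steps may need to parse it. Suppose a prompt requests CSV with a header row, like the invoice-table example prompt. The hypothetical helper parse_csv_content below then reads the rows with Python's standard csv module. It also assumes the model may wrap the CSV in a markdown code fence, so it drops any fence lines first. The sample data is made up for illustration.

```python
import csv
import io

FENCE = "`" * 3  # markdown code-fence marker, built this way to keep the example readable


def parse_csv_content(content: str) -> list[dict[str, str]]:
    """Hypothetical helper: parse CSV text from the node's `content` output.

    Assumes the first CSV line is a header row.
    """
    # Drop markdown fence lines (for example a fence tagged "csv") if the model added them.
    lines = [
        line for line in content.strip().splitlines()
        if not line.strip().startswith(FENCE)
    ]
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    return [dict(row) for row in reader]


if __name__ == "__main__":
    # Made-up sample of what a CSV-formatted `content` value might look like.
    sample = "Description,Quantity,Unit Price\nWidget,2,9.50\nGadget,1,24.00\n"
    for row in parse_csv_content(sample):
        print(row["Description"], row["Quantity"], row["Unit Price"])
```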
Use Case Examples
- Invoice Processing: Extract structured data from invoice PDFs including vendor names, amounts, dates, and line items.
- Document Digitization: Convert scanned documents or images in PDFs to searchable, structured markdown or JSON.
- Form Data Extraction: Extract filled form data from PDF forms for database entry or further processing.
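For invoice processing, one option is to ask for JSON with fixed field names, then validate the content output before it reaches a database. The sketch below assumes a prompt along the lines of "Return only JSON with the keys vendor_name, invoice_date, total_amount, and line_items". Those key names, the helper name parse_invoice_json, and the validation rules are all hypothetical; they are not a schema the node defines. The sketch also assumes the model may wrap its JSON in a markdown code fence, and strips the fence if present.

```python
import json

FENCE = "`" * 3  # markdown code-fence marker
# Hypothetical field names; use whatever keys your Extraction Prompt asks for.
REQUIRED_KEYS = {"vendor_name", "invoice_date", "total_amount", "line_items"}


def parse_invoice_json(content: str) -> dict:
    """Hypothetical helper: parse and sanity-check invoice JSON from `content`."""
    text = content.strip()
    # If the model wrapped the JSON in a markdown fence, drop the opening fence
    # line (which may carry a language tag) and the closing fence.
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rstrip()
        if text.endswith(FENCE):
            text = text[: -len(FENCE)]
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(data["line_items"], list):
        raise ValueError("line_items must be a list")
    return data


if __name__ == "__main__":
    # Made-up sample of a JSON-formatted `content` value.
    sample = (
        '{"vendor_name": "Example Supplies", "invoice_date": "2024-01-15", '
        '"total_amount": 42.5, "line_items": [{"description": "Paper", "amount": 42.5}]}'
    )
    invoice = parse_invoice_json(sample)
    print(invoice["vendor_name"], invoice["total_amount"], len(invoice["line_items"]))
```

Rejecting incomplete output early makes it easier to spot pages that need a different prompt, model, or chunk setting.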
Error Handling and Troubleshooting
- Large Documents: For very large PDFs, adjust Pages per Chunk and Overlap to balance context (e.g. overlap 1–2 for continuity) with processing speed and token usage.
- Poor Image Quality: Low-quality scans may produce poor extraction results. Ensure PDF pages are readable before processing.
- Model Selection: Different vision models have varying capabilities. If extraction quality is unsatisfactory, try a different model.
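Before running a large PDF, you can weigh Pages per Chunk and Overlap against processing speed and token usage. It helps to estimate how many chunks (model calls) a setting produces and how many pages are sent in total, since every overlapping page is processed by two chunks. The sketch below assumes a fixed sliding window: each chunk starts Pages per Chunk minus Overlap pages after the previous one, and the last chunk is truncated at the final page. The node's exact boundary handling is not documented here, so treat the results as estimates. The function name estimate_chunks is hypothetical.

```python
import math


def estimate_chunks(num_pages: int, pages_per_chunk: int, overlap: int) -> tuple[int, int]:
    """Hypothetical helper: return (number_of_chunks, total_pages_sent).

    Assumption: chunk k starts at page 1 + k * (pages_per_chunk - overlap), and the
    final chunk is truncated at the last page.
    """
    if not 0 <= overlap < pages_per_chunk:
        raise ValueError("overlap must be >= 0 and smaller than pages_per_chunk")
    if num_pages <= 0:
        return 0, 0
    if num_pages <= pages_per_chunk:
        return 1, num_pages  # the whole document fits in one chunk
    step = pages_per_chunk - overlap
    chunks = 1 + math.ceil((num_pages - pages_per_chunk) / step)
    # Each boundary between consecutive chunks re-sends `overlap` pages.
    total_sent = num_pages + (chunks - 1) * overlap
    return chunks, total_sent


if __name__ == "__main__":
    # Compare a few settings for a hypothetical 100-page PDF.
    for pages_per_chunk, overlap in [(5, 0), (5, 1), (10, 2)]:
        chunks, sent = estimate_chunks(100, pages_per_chunk, overlap)
        print(f"{pages_per_chunk} pages/chunk, overlap {overlap}: {chunks} chunks, {sent} pages sent")
```

Take a 100-page PDF under these assumptions. With 5 pages per chunk and no overlap, you get 20 chunks and 100 pages sent. Adding an overlap of 1 raises that to 25 chunks and 124 pages sent. That extra cost buys continuity across chunk boundaries.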