Document Reader Node

Overview

The Document Reader Node extracts text content from uploaded documents. This node enables you to:

- Read text from PDFs, Word documents, PowerPoint presentations, Excel spreadsheets, and other formats
- Extract text from images using OCR
- Control the reading position with a start index and a maximum length
- Process large documents incrementally

Reading files with an LLM: For most workflows that read document content and pass it to an LLM (to summarize, answer questions, or extract data), we recommend the Agent node with the Document Reader tool instead of this node. Connect the node that produces the file directly upstream of that agent, or use an Agent Swarm if the producer is also an agent. See the "Reading files: Agent with Document Reader" section of Files in Workflows for details.

Configuration Parameters

Node Configuration

Upload File: Upload a document to read. Supported formats include:
- PDF (.pdf)
- Microsoft Office (.docx, .xlsx, .pptx)
- Text files (.txt, .md, .csv)
- Images with text (.jpg, .png)
- And many more
Start Index (start_index): Character position to start reading from (default: 0). Negative values count back from the end of the document.
Max Length (max_length): Maximum number of characters to read from the file. Combine it with Start Index to process large documents in chunks.
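
Start Index and Max Length together describe a window over the extracted text. The sketch below models that window with ordinary Python string slicing. `read_window` is a hypothetical helper written for illustration and is not part of the node. How the node behaves at the boundaries (for example, a negative index larger than the document) is not documented here, so the clamping shown is an assumption.

```python
from typing import Optional


def read_window(text: str, start_index: int = 0, max_length: Optional[int] = None) -> str:
    """Return up to max_length characters of text, beginning at start_index.

    Hypothetical model of the Start Index / Max Length parameters.
    A negative start_index counts back from the end of the text.
    """
    if max_length is not None and max_length < 0:
        raise ValueError("max_length must be non-negative")

    # Translate a negative index into an offset from the start, clamping so
    # it never falls before the first character (assumed behavior).
    if start_index < 0:
        start = max(len(text) + start_index, 0)
    else:
        start = min(start_index, len(text))

    # With no max_length, read through to the end of the text.
    end = len(text) if max_length is None else start + max_length
    return text[start:end]


sample = "The quick brown fox jumps over the lazy dog."
first_nine = read_window(sample, start_index=0, max_length=9)  # "The quick"
last_nine = read_window(sample, start_index=-9)                # "lazy dog."
```

In this model, a start index past the end of the text yields an empty string rather than an error. Check how the node itself responds before relying on that.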

Expected Inputs and Outputs

Inputs:
- The node accepts file uploads via the file selection input
Outputs:
- content: Extracted text content from the document

Use Case Examples

Document Processing: Extract text from uploaded documents for analysis, summarization, or data extraction workflows.
Incremental Reading: Process very large documents by reading in chunks using start_index and max_length parameters.
OCR Text Extraction: Extract text from scanned documents or images for digitization and searchability.
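
The incremental reading pattern can be sketched as a loop that advances start_index by the size of each chunk received. In the example below, `iter_chunks` and `fake_reader` are hypothetical names: `fake_reader` stands in for whatever invokes the Document Reader with a given start_index and max_length, and the loop assumes that reading past the end of the document returns empty content.

```python
from typing import Callable, Iterator


def iter_chunks(read: Callable[[int, int], str], max_length: int) -> Iterator[str]:
    """Yield consecutive chunks until the reader returns no more text."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")

    start_index = 0
    while True:
        chunk = read(start_index, max_length)
        if not chunk:
            break  # assumed end-of-document signal: empty content
        yield chunk
        # Advance by what was actually returned, so a short final chunk
        # does not cause skipped or repeated characters.
        start_index += len(chunk)


# Stand-in reader over an in-memory string, used only to exercise the loop.
document_text = "abcdefghij" * 3 + "xyz"  # 33 characters


def fake_reader(start_index: int, max_length: int) -> str:
    return document_text[start_index:start_index + max_length]


chunks = list(iter_chunks(fake_reader, max_length=10))
```

Joining the chunks in order reproduces the original text, so each chunk can be processed independently (for example, summarized) and the results combined afterwards.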

Error Handling and Troubleshooting

Unsupported File Format: Verify the uploaded file format is in the list of supported document types.
Empty Content: If no content is extracted, the file may be corrupted, password-protected, or contain only images without OCR.
Large File Performance: For very large files, consider using start_index and max_length to process the document in smaller chunks.

If you encounter any issues not covered in this documentation, please reach out to our support team for assistance.

Relevant Nodes

VLM Document Extraction: Extract structured data using vision models
Document Data Node: Process PDFs and images with OCR or VLM
LLM Node: Process extracted content with AI
