Overview
The Spider Scrape Node provides a powerful web scraping tool for extracting content from websites. This node enables you to:
- Extract text content from single web pages
- Crawl multiple subpages automatically
- Parse structured data from HTML
- Include metadata for citations
- Handle dynamic and JavaScript-rendered content
Deprecated. Please use the Web Search Node instead.
Configuration Parameters
Node Configuration
- Target Site URL: The URL of the webpage you want to scrape (e.g., https://www.example.com)
- Crawl Subpages: Enable crawling to automatically read multiple webpages linked from the target URL
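To make the Crawl Subpages option concrete, here is a minimal sketch of what "webpages linked from the target URL" can mean: collecting the links on a page that stay on the same host. This is an illustration only and not the node's actual implementation; `LinkCollector`, `same_site_links`, and the sample HTML are hypothetical names and data invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(base_url, html):
    """Return absolute, de-duplicated links that share base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urljoin(base_url, href).split("#")[0]
        same_host = urlparse(absolute).netloc == base_host
        if same_host and absolute != base_url and absolute not in links:
            links.append(absolute)
    return links


# Hypothetical page content, used only for illustration.
sample_html = """
<a href="/pricing">Pricing</a>
<a href="https://www.example.com/blog#top">Blog</a>
<a href="https://other.example.org/">Elsewhere</a>
<a href="/pricing">Pricing again</a>
"""
print(same_site_links("https://www.example.com", sample_html))
```

Restricting the crawl to the target's own host is an assumption made here to keep the example bounded; the node's real crawl scope may differ.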
Advanced Settings
- Include document metadata for citations: When enabled, includes XML-formatted metadata with source URLs for proper attribution
Expected Inputs and Outputs
- Inputs:
  - The node accepts text input that can be used to format the target URL dynamically
- Outputs:
  - content: Extracted text content from the webpage(s)
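As a rough illustration of formatting the target URL from text input, the sketch below substitutes an upstream text value into a URL template. The `{input}` placeholder syntax and the `build_target_url` helper are assumptions made for this example; the node's actual template syntax is not specified here.

```python
from urllib.parse import quote


def build_target_url(template, text_input):
    """Fill a hypothetical {input} placeholder with URL-encoded text."""
    # Encode everything, including "/" and spaces, so the text stays
    # inside the single query value it is substituted into.
    encoded = quote(text_input.strip(), safe="")
    return template.replace("{input}", encoded)


url = build_target_url("https://www.example.com/search?q={input}", " web scraping ")
print(url)  # https://www.example.com/search?q=web%20scraping
```

Encoding the input before substitution keeps spaces and reserved characters from producing a malformed URL.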
Use Case Examples
- Content Aggregation: Extract articles, blog posts, or documentation from websites for analysis or archiving.
- Competitive Intelligence: Monitor competitor websites for changes in pricing, features, or content.
- Data Collection: Gather structured data from multiple pages for market research or database population.
Error Handling and Troubleshooting
- Website Access Blocked: Some websites block scraping attempts. Respect robots.txt files and website terms of service.
- JavaScript Rendering Issues: If content isn’t loading properly, the website may rely on JavaScript to render it. Spider handles JavaScript execution automatically.
- Rate Limiting: Avoid making too many requests too quickly to prevent being blocked by the target website.
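The first and last points above can be checked before a workflow runs. The following is a minimal sketch, using only the Python standard library, of filtering a URL list against robots.txt rules and spacing out requests. The helper names `allowed_urls` and `throttled`, the `example-bot` user agent, and the sample rules are hypothetical; nothing here reflects how the node itself schedules requests.

```python
import time
from urllib.robotparser import RobotFileParser


def allowed_urls(robots_txt, user_agent, urls):
    """Keep only the URLs that the given robots.txt text permits."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def throttled(items, min_interval_seconds):
    """Yield items no faster than one per min_interval_seconds."""
    last = None
    for item in items:
        if last is not None:
            wait = min_interval_seconds - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        last = time.monotonic()
        yield item


# Hypothetical robots.txt content and URLs, for illustration only.
robots_txt = "User-agent: *\nDisallow: /private/\n"
candidates = [
    "https://www.example.com/blog",
    "https://www.example.com/private/report",
]
permitted = allowed_urls(robots_txt, "example-bot", candidates)

for target in throttled(permitted, min_interval_seconds=0.05):
    print("would scrape:", target)
```

In practice the robots.txt text would be fetched from the target site, and a suitable interval depends on the site; the 0.05 second value is only there to keep the example fast.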
Relevant Nodes
- Web Extract Node: Extract content from websites
- Browser Agent: Navigate and interact with websites using AI
- Web Search Node: Search the web for information