Overview
The Spider Scrape Node provides a powerful web scraping tool for extracting content from websites. This node enables you to:
- Extract text content from single web pages
- Crawl multiple subpages automatically
- Parse structured data from HTML
- Include metadata for citations
- Handle dynamic and JavaScript-rendered content
Note: This node is deprecated. Please use the Web Search Node instead.
Configuration Parameters
Node Configuration
- Target Site URL: The URL of the webpage you want to scrape (e.g., https://www.example.com)
- Crawl Subpages: Enable crawling to automatically read multiple webpages linked from the target URL
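As a rough sketch, the two settings above could be represented as a simple mapping. The key names here are illustrative assumptions, not the node's actual schema:

```python
# Hypothetical configuration for a Spider Scrape Node.
# Key names are assumptions for illustration, not the node's real schema.
spider_config = {
    "target_site_url": "https://www.example.com",  # the page to scrape
    "crawl_subpages": True,  # follow links from the target URL
}

def validate_config(config: dict) -> bool:
    """Basic sanity check: the target URL must be present and use http(s)."""
    url = config.get("target_site_url", "")
    return url.startswith(("http://", "https://"))
```

A check like `validate_config` is a reasonable guard before running the node, since a missing or non-HTTP URL will cause the scrape to fail.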
Advanced Settings
- Include document metadata for citations: When enabled, includes XML-formatted metadata with source URLs for proper attribution
Expected Inputs and Outputs
- Inputs:
  - The node accepts text input that can be used to format the target URL dynamically
- Outputs:
  - content: Extracted text content from the webpage(s)
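The dynamic URL formatting described above can be sketched as follows. The `{query}` placeholder and the helper function are assumptions for illustration; the node's actual templating syntax may differ:

```python
from urllib.parse import quote

def format_target_url(template: str, text_input: str) -> str:
    """Substitute upstream text input into a URL template.

    The {query} placeholder convention is a hypothetical example;
    check the node's documentation for its real templating syntax."""
    # quote() percent-encodes the input so it is safe inside a URL
    return template.format(query=quote(text_input, safe=""))

# Example: build a search URL from an upstream text input.
url = format_target_url("https://www.example.com/search?q={query}", "web scraping")
```

Percent-encoding the input before substitution matters: raw spaces, ampersands, or slashes in upstream text would otherwise produce a malformed or unintended URL.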
Use Case Examples
- Content Aggregation: Extract articles, blog posts, or documentation from websites for analysis or archiving.
- Competitive Intelligence: Monitor competitor websites for changes in pricing, features, or content.
- Data Collection: Gather structured data from multiple pages for market research or database population.
Error Handling and Troubleshooting
- Website Access Blocked: Some websites block scraping attempts. Respect robots.txt files and website terms of service.
- JavaScript Rendering Issues: If content isn’t loading properly, the website may require JavaScript execution which Spider handles automatically.
- Rate Limiting: Avoid making too many requests too quickly to prevent being blocked by the target website.
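Spider applies its own handling for these cases, but if you want to pre-check a target yourself, the robots.txt and rate-limiting advice above can be sketched with the Python standard library. The helper names here are illustrative, not part of the node:

```python
import time
from urllib.robotparser import RobotFileParser

def can_fetch(base_url: str, path: str, user_agent: str = "*") -> bool:
    """Check a site's robots.txt before scraping a path on it."""
    parser = RobotFileParser()
    parser.set_url(f"{base_url}/robots.txt")
    parser.read()  # fetches and parses robots.txt over the network
    return parser.can_fetch(user_agent, f"{base_url}{path}")

def polite_fetch(urls, fetch, delay_seconds: float = 1.0):
    """Fetch URLs with a fixed delay between requests to avoid rate limiting."""
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(delay_seconds)  # simple fixed-interval throttle
    return results
```

A fixed delay is the simplest throttle; for larger crawls, exponential backoff on HTTP 429 responses is a common refinement.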