Web Scraping Commands¶

The web scraping commands help you extract content from websites, particularly focusing on Markdown content.

Available Commands¶

Markdown Scraping¶

# Scrape Markdown content from a URL
craftsmanship scrape url https://example.com/docs

# Scrape entire documentation site
craftsmanship scrape site https://example.com/docs

# Save scraped content to local files
craftsmanship scrape url https://example.com/docs --output ./local-docs

Scraping Options¶

The scrape commands support various options for controlling the scraping process:

Option	Description
`--depth`	Maximum recursion depth for following links
`--rate-limit`	Rate limiting to avoid overwhelming sites
`--concurrency`	Number of concurrent requests
`--format`	Output format (markdown, html, text)
`--selector`	CSS selector for targeting specific content
`--timeout`	Request timeout value

Content Processing¶

Processing scraped content:

# Extract specific sections
craftsmanship scrape url https://example.com/docs --selector ".main-content"

# Process content into specific format
craftsmanship scrape url https://example.com/docs --process format

# Extract and organize content
craftsmanship scrape site https://example.com/docs --organize by-section

Authentication Support¶

Scraping content that requires authentication:

# Scrape with basic authentication
craftsmanship scrape url https://example.com/docs --auth basic --username user --password pass

# Use cookie-based authentication
craftsmanship scrape url https://example.com/docs --auth cookie --cookie-file ./cookies.json

Monitoring and Reporting¶

Monitoring scraping progress:

# Scrape with detailed progress reporting
craftsmanship scrape site https://example.com/docs --verbose

# Generate scraping report
craftsmanship scrape site https://example.com/docs --report ./scrape-report.json

Usage Examples¶

# Scrape documentation with depth limit
craftsmanship scrape site https://example.com/docs --depth 3 --output ./docs

# Extract specific content with rate limiting
craftsmanship scrape url https://example.com/api-docs --selector ".endpoint" --rate-limit 1