GTM Vault

Content Scraping

API endpoint: Content Scraping. Part of the Web Data Engine.

Tags: api, Enrichment & Research, AI-Powered, Powers Xlrate, Data Ops, Sales Intelligence
Original
by Roheel Jain
Nodes: 15
Triggers: 1
Endpoints: 1
Platform: n8n

Tech Stack
Web Scraping
DeepSeek AI
HTML Parser
n8n
HTTP API
Part of Suite

Web Data Engine

7 API endpoints for website scraping, parsing, and AI-powered data extraction. Tech stack detection, logo extraction, content scraping, and case study discovery.

API Endpoint
api/v1/web/Content-Scraping/
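
As a rough illustration of calling this endpoint, the sketch below builds (but does not send) an HTTP request for the path listed above. The base URL, the POST method, and the "url" body field are assumptions made for the example; the page does not document the request format, so check the Webhook node in the workflow before relying on any of them.

```python
import json
import urllib.request

ENDPOINT_PATH = "api/v1/web/Content-Scraping/"  # path as listed on this page


def build_request(base_url: str, website: str) -> urllib.request.Request:
    """Build, without sending, a JSON request to the webhook.

    Assumptions: the webhook accepts POST and reads the site from a
    "url" field. Neither is confirmed by the listing.
    """
    url = base_url.rstrip("/") + "/" + ENDPOINT_PATH
    body = json.dumps({"url": website}).encode("utf-8")  # hypothetical field name
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical n8n host, used only to show the resulting URL.
req = build_request("https://n8n.example.com/webhook", "example.com")
print(req.full_url)
```

Sending it would be a matter of passing req to urllib.request.urlopen once the real host and payload shape are known.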
Workflow
1 trigger, 10 code, 3 api, 1 ai
Webhook
Set Response
Linked Pages1
Merge
Create chat completion1
Return Blog base url
Check For RSS
If RSS?
Format RSS Feed
Fetch Blogs
Format Blogs HTML
If Blogs Not Found AI
Format Blogs Output
Final Response Format
normalise URL to HTTPS
How It Works

Automatically discovers and extracts blog content from any company website, even when blogs aren't easy to find. Perfect for competitive research, content analysis, or building prospect databases with fresh insights.

1

Submit website URL

You provide any company website URL and the system automatically prepares it for analysis, ensuring it's in the right format to begin searching.
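
The workflow includes a node named "normalise URL to HTTPS". Its exact rules are not shown on this page, so the following is only a minimal sketch of what such a step might do, assuming it forces the https scheme, lowercases the host, and drops a trailing slash. The function name normalise_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalise_url(raw: str) -> str:
    """Return an https:// URL with a lowercase host and no trailing slash."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "https://" + raw  # bare domains need a scheme before parsing
    parts = urlsplit(raw)
    path = parts.path.rstrip("/")
    # Force https and drop any fragment; keep the query string as given.
    return urlunsplit(("https", parts.netloc.lower(), path, parts.query, ""))


print(normalise_url("http://Example.com/"))
print(normalise_url("example.com/blog"))
```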

2

Search for blog sections

The system intelligently explores the website to find blog pages, resource sections, or news areas where content is published.

3

Check for RSS feeds

The system looks for RSS feeds first, since a feed exposes post titles, links, and publication dates in a consistent, machine-readable structure.
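
One common way to check a page for feeds is to look for link tags with rel="alternate" and an RSS or Atom media type. The sketch below does that with Python's standard html.parser. It is an assumption that the workflow's "Check For RSS" node works along these lines; the names FeedLinkFinder and find_feeds are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate" ...> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {k: (v or "") for k, v in attrs}
        rel = a.get("rel", "").lower().split()
        is_feed = a.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel and is_feed and a.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))


def find_feeds(html: str, base_url: str) -> list:
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds
```

An empty result would correspond to the "no RSS" branch, where the blog pages are read directly instead.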

4

Extract blog posts

Retrieves actual blog content from the discovered pages, whether through RSS feeds or by directly reading the web pages.
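
For the RSS path, extraction can be as simple as reading each item element of the feed. The sketch below handles plain RSS 2.0 only (Atom feeds use different, namespaced element names and are not covered). The output keys title, url, and published are illustrative and not taken from the workflow.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text: str) -> list:
    """Return one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": (item.findtext("title") or "").strip(),
            "url": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
    return posts


sample = """<rss version="2.0"><channel><title>Blog</title>
<item><title>Hello</title><link>https://example.com/hello</link>
<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item>
</channel></rss>"""
print(parse_rss(sample))
```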

Attached Files: apiv1webcontent-scraping.json

5

Clean and format content

Removes unnecessary HTML code, ads, and navigation elements to give you clean, readable blog content and metadata.
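
A minimal sketch of this kind of cleanup, using only the standard library: text inside script, style, nav, header, footer, and aside elements is skipped, and the remaining text is joined into one string. The list of skipped tags is an assumption; the workflow's own "Format Blogs HTML" node may use different rules, and ad removal in particular usually needs site-specific selectors that are not attempted here.

```python
from html.parser import HTMLParser

# Assumed set of non-content containers; adjust per site.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Keep visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```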

6

AI enhancement when needed

When neither the RSS check nor direct page parsing finds blog content, the DeepSeek AI step is used to locate and extract posts that are harder to detect.
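
The fallback order can be sketched as a small function: use RSS results if there are any, otherwise the parsed HTML results, and call the AI step only when both are empty. This mirrors the "If RSS?" and "If Blogs Not Found AI" branches by name only; collect_posts and its parameters are hypothetical, and the AI call is passed in as a plain callable so no real API is assumed.

```python
from typing import Callable, Dict, List

Post = Dict[str, str]


def collect_posts(
    rss_posts: List[Post],
    html_posts: List[Post],
    ai_extract: Callable[[], List[Post]],
) -> List[Post]:
    """Prefer RSS results, then parsed HTML, and call the AI step last."""
    if rss_posts:
        return rss_posts
    if html_posts:
        return html_posts
    # Only reached when both cheaper paths came back empty.
    return ai_extract()
```

Keeping the AI call behind the two cheaper checks is consistent with the cost note on this page, which says the price of a run depends on whether AI enhancement is needed.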

7

Deliver structured results

Returns organized blog data including titles, content, publication dates, and URLs in a format ready for your marketing tools or analysis.
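
To make the shape of such a result concrete, here is one possible structure carrying the four fields named above (title, content, publication date, URL). The field names, the blog_url and count keys, and the JSON layout are all assumptions for illustration; the actual response produced by the "Final Response Format" node is not shown on this page.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class BlogPost:
    title: str
    url: str
    published: str  # date kept as a string; the real format is not specified
    content: str


def to_response(blog_url: str, posts: List[BlogPost]) -> str:
    """Serialize posts into a JSON document (hypothetical layout)."""
    payload = {
        "blog_url": blog_url,
        "count": len(posts),
        "posts": [asdict(p) for p in posts],
    }
    return json.dumps(payload, indent=2)


example = to_response(
    "https://example.com/blog",
    [BlogPost("Hello", "https://example.com/blog/hello", "2024-01-01", "First post.")],
)
print(example)
```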

What You'll Need

DeepSeek AI, Web Scraping Engine, Content Parser
  • DeepSeek AI account and API key for content enhancement
  • Target website URLs that you want to analyze
  • Basic understanding of how you'll use the extracted blog data
  • System or tool ready to receive the structured content data

Estimated Cost per Run

USD 0.01 – 0.05 (Cost varies based on website size and whether AI enhancement is needed. Most runs cost under 2 cents.)

This automation has 1 configurable setting you'll customize during setup.