Overview
This n8n workflow demonstrates how to integrate Skop’s document extraction API into your automation pipelines. The workflow scrapes documents from websites, processes the results, and automatically uploads them to Google Drive.
What This Workflow Does
- Input Collection: Accepts user inputs for website URL, search prompt, API key, and scraping parameters
- Job Creation: Creates a scraping job using the Skop API
- Status Monitoring: Polls the job status until completion with automatic retry logic
- Document Processing: Extracts individual documents from the results
- File Download: Downloads each document from the source URLs
- Cloud Storage: Uploads documents to Google Drive with organized naming
Getting Started with n8n
n8n is an open-source workflow automation tool that lets you connect different services and APIs.
Prerequisites
- n8n instance (cloud or self-hosted)
- Skop API key
- Google Drive credentials (for document storage)
Installation
- Copy the workflow JSON below and import it into your n8n instance
- Configure your credentials and API keys
- Customize the workflow for your specific needs
n8n Workflow JSON
Copy and paste this JSON into n8n to import the complete workflow:
Workflow Configuration
Required Changes
Before running the workflow, you’ll need to update these key components:
1. API Key Configuration
2. Google Drive Setup
In the “Upload to Google Drive” node, update:
- Folder ID: Replace YOUR_FOLDER_ID_HERE with your target Google Drive folder ID
- Credentials: Configure your Google Drive OAuth2 credentials in n8n
3. Timing Adjustments
Modify wait times based on your typical job duration:
- Initial Wait: Currently set to 2 minutes (node: “Wait for Processing”)
- Retry Wait: Currently set to 10 seconds (node: “Wait and Retry”)
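To see how the two wait values interact, the polling loop can be sketched in plain JavaScript. This is an illustration under stated assumptions, not the workflow's actual node configuration: `checkStatus` is a hypothetical callback that returns the job's status string, and `maxChecks` is a cap added here so the loop cannot run forever.

```javascript
// Sketch of the polling pattern: wait once up front, then re-check on a short interval.
// `checkStatus` is a hypothetical callback returning the job's current status string.
async function pollUntilCompleted(checkStatus, options = {}) {
  const {
    initialWaitMs = 2 * 60 * 1000, // "Wait for Processing": 2 minutes
    retryWaitMs = 10 * 1000,       // "Wait and Retry": 10 seconds
    maxChecks = 60,                // assumption: upper bound on status checks
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  await sleep(initialWaitMs);
  for (let check = 1; check <= maxChecks; check++) {
    const status = await checkStatus();
    if (status === "completed") {
      return { status, checks: check };
    }
    // Only pause if another check is still going to happen.
    if (check < maxChecks) {
      await sleep(retryWaitMs);
    }
  }
  throw new Error(`Job not completed after ${maxChecks} status checks`);
}
```

With these defaults the sketch gives up after just under 12 minutes (2 minutes plus 59 retry waits of 10 seconds). If your jobs typically take longer, raise the initial wait rather than the number of checks to avoid unnecessary API calls.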
Node Breakdown
| Node | Purpose | Configuration Needed |
|---|---|---|
| Manual Inputs | Collect user parameters | None - ready to use |
| Create Scrape Job | Submit job to Skop API | Ensure API endpoint is correct |
| Check Job Status | Monitor job progress | None - uses dynamic job ID |
| Check if Completed | Conditional logic for job status | None - checks for “completed” status |
| Get Job Results | Retrieve extracted documents | None - uses dynamic job ID |
| Split Documents | Process document array | None - JavaScript code included |
| Download Document | Fetch document files | May need User-Agent adjustment |
| Upload to Google Drive | Save to cloud storage | Requires folder ID and credentials |
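As a rough illustration of what the Split Documents node does, the function below turns one job result into one n8n item per document. It is a sketch, not the code shipped with the workflow: the `documents` field name is an assumption about the shape of the job result, and `splitDocuments` is a hypothetical helper name.

```javascript
// Sketch of a "Split Documents" step: one result object in, one n8n item per document out.
// Assumption: the job result exposes its documents as an array under `documents`.
function splitDocuments(result) {
  const documents = Array.isArray(result && result.documents) ? result.documents : [];
  // n8n expects every item to be an object with a `json` property.
  return documents.map((doc) => ({ json: doc }));
}

// Inside an n8n Code node ("Run Once for All Items"), this could be used as:
//   return splitDocuments($input.first().json);
```

Returning an empty array when no documents are present means the downstream download and upload nodes simply do not run for that job.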
How the Code Works
1. Job Creation Flow
The workflow starts by collecting inputs and making a POST request to the Skop API.
2. Status Polling Logic
The workflow implements a polling pattern to check job completion.
3. Document Processing
A custom JavaScript function splits the document array for individual processing.
4. File Download and Storage
Each document is downloaded and uploaded to Google Drive with proper naming.
Customization Options
Error Handling
Add error handling nodes to manage API failures:
- HTTP status code checks
- Retry limits for failed downloads
- Email notifications for workflow failures
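The first two items above can be combined in a single helper. The sketch below checks the HTTP status of each download and stops after a fixed number of attempts. The retry limit, the delay, and the rule for which status codes are worth retrying are illustrative assumptions, and `downloadWithRetry` is a hypothetical name rather than part of the workflow.

```javascript
// Sketch: retry a download a limited number of times, treating non-2xx responses as failures.
// `fetchFn` defaults to the global fetch in Node.js 18+ and is injectable for testing.
async function downloadWithRetry(url, options = {}) {
  const {
    maxAttempts = 3, // assumption: illustrative retry limit
    delayMs = 1000,  // assumption: illustrative pause between attempts
    fetchFn = globalThis.fetch,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError = new Error(`No download attempt was made for ${url}`);
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await fetchFn(url);
      if (response.ok) {
        return response; // 2xx: hand the response to the next step
      }
      lastError = new Error(`HTTP ${response.status} for ${url}`);
      // Client errors other than 429 are unlikely to succeed on retry, so stop early.
      if (response.status >= 400 && response.status < 500 && response.status !== 429) {
        break;
      }
    } catch (error) {
      lastError = error; // network-level failure: worth another attempt
    }
    if (attempt < maxAttempts) {
      await sleep(delayMs);
    }
  }
  throw lastError;
}
```

In n8n itself, a similar effect is usually achieved with the node's retry settings and an error branch; the code form is mainly useful if the download logic lives in a Code node.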
Advanced Features
Extend the workflow with additional functionality:
- Document Analysis: Add OCR or text extraction nodes
- Content Filtering: Filter documents by size, date, or content
- Multi-Destination Upload: Save to multiple cloud storage services
- Slack Notifications: Send completion alerts to team channels
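As one example of content filtering, a Code node placed after the document split could drop documents by file type and size before they are downloaded. This is a sketch under assumptions: the `url` and `sizeBytes` fields are hypothetical names for what each document object might carry, and `filterDocuments` is not part of the workflow.

```javascript
// Sketch of a content filter: keep documents with an allowed extension and acceptable size.
// Assumption: each document has a `url` and, optionally, a numeric `sizeBytes` field.
function filterDocuments(documents, options = {}) {
  const { allowedExtensions = [".pdf"], maxSizeBytes = Infinity } = options;
  return documents.filter((doc) => {
    let pathname;
    try {
      // Use only the path so query strings do not hide the extension.
      pathname = new URL(doc.url).pathname.toLowerCase();
    } catch {
      return false; // not a valid absolute URL: skip it
    }
    const extensionOk = allowedExtensions.some((ext) => pathname.endsWith(ext));
    // Documents without a known size are kept rather than guessed at.
    const sizeOk = typeof doc.sizeBytes !== "number" || doc.sizeBytes <= maxSizeBytes;
    return extensionOk && sizeOk;
  });
}
```

Filtering before the download step saves both bandwidth and Google Drive storage, since rejected documents are never fetched.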
Scheduling
Set up automatic execution:
- Cron Triggers: Run daily/weekly document collection
- Webhook Triggers: Trigger from external systems
- Manual Execution: Run on-demand from n8n interface
Best Practices
- API Key Security: Store API keys in n8n’s credential system, never hardcode them
- Rate Limiting: Add delays between API calls to respect rate limits
- Error Recovery: Implement proper error handling and retry logic
- Monitoring: Set up notifications for workflow failures
- Testing: Test with smaller document sets before production use
Troubleshooting
Common Issues
Job Never Completes
- Increase timeout values in wait nodes
- Check if the website is accessible
- Verify your prompt is specific enough
Google Drive Upload Fails
- Confirm Google Drive credentials are properly configured
- Check folder permissions and folder ID validity
- Ensure sufficient storage space
Document Downloads Fail
- Some documents may require specific headers or authentication
- Add retry logic for failed downloads
- Check document URL accessibility