n8n Integration

Overview

This n8n workflow demonstrates how to integrate Skop’s document extraction API into your automation pipelines. The workflow scrapes documents from websites, processes the results, and automatically uploads them to Google Drive. n8n Workflow Diagram

What This Workflow Does

Input Collection: Accepts user inputs for website URL, search prompt, API key, and scraping parameters
Job Creation: Creates a scraping job using the Skop API
Status Monitoring: Polls the job status until completion with automatic retry logic
Document Processing: Extracts individual documents from the results
File Download: Downloads each document from the source URLs
Cloud Storage: Uploads documents to Google Drive with organized naming

Getting Started with n8n

n8n is an open-source workflow automation tool that lets you connect different services and APIs.

Prerequisites

n8n instance (cloud or self-hosted)
Skop API key
Google Drive credentials (for document storage)

Installation

Copy the workflow JSON below and import it into your n8n instance
Configure your credentials and API keys
Customize the workflow for your specific needs

n8n Workflow JSON

Copy and paste this JSON into n8n to import the complete workflow:

{
    "name": "My workflow",
    "nodes": [
      {
        "parameters": {},
        "id": "aeb4f37b-fd11-46bc-93e3-c2fbc57dea3d",
        "name": "Start",
        "type": "n8n-nodes-base.start",
        "typeVersion": 1,
        "position": [
          -1408,
          304
        ]
      },
      {
        "parameters": {
          "fields": {
            "values": [
              {
                "name": "Prompt"
              },
              {
                "name": "Website URL"
              },
              {
                "name": "API Key"
              },
              {
                "name": "Single-page",
                "type": "booleanValue",
                "booleanValue": "false"
              }
            ]
          },
          "options": {}
        },
        "id": "e5e03541-7475-4da9-acd6-54bed0ae6846",
        "name": "Manual Inputs",
        "type": "n8n-nodes-base.set",
        "typeVersion": 3.2,
        "position": [
          -1264,
          304
        ]
      },
      {
        "parameters": {
          "method": "POST",
          "url": "https://api.skop.dev/scrape/",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Authorization",
                "value": "=Bearer {{ $json['API Key'] }}"
              },
              {
                "name": "Content-Type",
                "value": "application/json"
              }
            ]
          },
          "sendBody": true,
          "bodyParameters": {
            "parameters": [
              {
                "name": "website",
                "value": "={{ $json['Website URL'] }}"
              },
              {
                "name": "prompt",
                "value": "={{ $json.Prompt }}"
              },
              {
                "name": "parameters",
                "value": "={{ { \"single_page\": $json[\"Single-page\"] } }}"
              }
            ]
          },
          "options": {}
        },
        "id": "46a590b4-f96d-4073-9c55-9d3f6896fe69",
        "name": "Create Scrape Job",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -1088,
          320
        ]
      },
      {
        "parameters": {
          "amount": 2,
          "unit": "minutes"
        },
        "id": "67443437-0f60-488f-be38-b2ddd7cac960",
        "name": "Wait for Processing",
        "type": "n8n-nodes-base.wait",
        "typeVersion": 1,
        "position": [
          -928,
          320
        ]
      },
      {
        "parameters": {
          "url": "=https://api.skop.dev/scrape/status/{{ $json.job_id }}",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Authorization",
                "value": "=Bearer {{ $('Manual Inputs').item.json['API Key'] }}"
              }
            ]
          },
          "options": {}
        },
        "id": "b411c7e4-2777-43e6-82ca-6b37f81dd623",
        "name": "Check Job Status",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -768,
          320
        ]
      },
      {
        "parameters": {
          "conditions": {
            "string": [
              {
                "value1": "={{ $json.status }}",
                "value2": "completed"
              }
            ]
          }
        },
        "id": "bcdcedb3-dbaa-4640-b3e1-d0c1ab579b0a",
        "name": "Check if Completed",
        "type": "n8n-nodes-base.if",
        "typeVersion": 1,
        "position": [
          -608,
          320
        ]
      },
      {
        "parameters": {
          "url": "=https://api.skop.dev/scrape/results/{{ $json.job_id }}",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Authorization",
                "value": "=Bearer {{ $('Manual Inputs').item.json['API Key'] }}"
              }
            ]
          },
          "options": {}
        },
        "id": "6e7ec0dd-e66e-4373-adbf-3730ccde215a",
        "name": "Get Job Results",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -448,
          304
        ]
      },
      {
        "parameters": {
          "name": "={{ $json.name }}",
          "driveId": {
            "__rl": true,
            "mode": "list",
            "value": "My Drive"
          },
          "folderId": {
            "__rl": true,
            "value": "YOUR_FOLDER_ID_HERE",
            "mode": "list",
            "cachedResultName": "Your Target Folder",
            "cachedResultUrl": "https://drive.google.com/drive/folders/YOUR_FOLDER_ID_HERE"
          },
          "options": {}
        },
        "id": "a6f83cba-dd55-4e92-8aee-0b08d869c087",
        "name": "Upload to Google Drive",
        "type": "n8n-nodes-base.googleDrive",
        "typeVersion": 3,
        "position": [
          -768,
          816
        ],
        "credentials": {
          "googleDriveOAuth2Api": {
            "id": "YOUR_GOOGLE_DRIVE_CREDENTIALS",
            "name": "Google Drive account"
          }
        }
      },
      {
        "parameters": {
          "amount": 10,
          "unit": "seconds"
        },
        "id": "7f31305d-9f00-4ccb-b037-fdc5b0de9ca0",
        "name": "Wait and Retry",
        "type": "n8n-nodes-base.wait",
        "typeVersion": 1,
        "position": [
          -608,
          480
        ]
      },
      {
        "parameters": {
          "content": "## Extract documents from multiple pages using skop.dev",
          "height": 480,
          "width": 832,
          "color": 4
        },
        "type": "n8n-nodes-base.stickyNote",
        "position": [
          -1136,
          208
        ],
        "typeVersion": 1,
        "id": "3d5d121b-5643-4140-a880-e3b2018f0ae5",
        "name": "Sticky Note"
      },
      {
        "parameters": {
          "jsCode": "// Extract documents array from job results\nconst jobResults = $input.first().json;\n\nif (!jobResults.documents || !Array.isArray(jobResults.documents)) {\n  return [{\n    json: {\n      error: 'No documents found in results',\n      totalDocuments: 0,\n      documents: []\n    }\n  }];\n}\n\n// Return each document as a separate item for processing\nconst outputItems = jobResults.documents.map((doc, index) => ({\n  json: {\n    ...doc,\n    documentIndex: index + 1,\n    totalDocuments: jobResults.documents.length,\n    jobId: jobResults.job_id\n  }\n}));\n\nreturn outputItems;"
        },
        "id": "57c9bc5f-b650-42d3-9340-77a2307be6f9",
        "name": "Split Documents",
        "type": "n8n-nodes-base.code",
        "typeVersion": 2,
        "position": [
          -1072,
          816
        ]
      },
      {
        "parameters": {
          "url": "={{ $json.url }}",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Accept",
                "value": "application/pdf,application/octet-stream,*/*"
              },
              {
                "name": "Accept-Language",
                "value": "en-US,en;q=0.9"
              },
              {
                "name": "Cache-Control",
                "value": "no-cache"
              },
              {
                "name": "Referer",
                "value": "https://www.google.com/"
              },
              {
                "name": "User-Agent",
                "value": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
              }
            ]
          },
          "options": {
            "response": {
              "response": {
                "neverError": true,
                "responseFormat": "file"
              }
            }
          }
        },
        "id": "dfde3a4f-017e-4167-b81f-dd086384b299",
        "name": "Download Document",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -912,
          816
        ]
      },
      {
        "parameters": {
          "content": "## Save Documents to Drive\n",
          "height": 288,
          "width": 576
        },
        "type": "n8n-nodes-base.stickyNote",
        "position": [
          -1136,
          720
        ],
        "typeVersion": 1,
        "id": "344c5132-0f82-4039-8c0d-de5b02769419",
        "name": "Sticky Note"
      }
    ],
    "pinData": {},
    "connections": {
      "Start": {
        "main": [
          [
            {
              "node": "Manual Inputs",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Manual Inputs": {
        "main": [
          [
            {
              "node": "Create Scrape Job",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Create Scrape Job": {
        "main": [
          [
            {
              "node": "Wait for Processing",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Wait for Processing": {
        "main": [
          [
            {
              "node": "Check Job Status",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Check Job Status": {
        "main": [
          [
            {
              "node": "Check if Completed",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Check if Completed": {
        "main": [
          [
            {
              "node": "Get Job Results",
              "type": "main",
              "index": 0
            }
          ],
          [
            {
              "node": "Wait and Retry",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Get Job Results": {
        "main": [
          [
            {
              "node": "Split Documents",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Upload to Google Drive": {
        "main": [
          []
        ]
      },
      "Wait and Retry": {
        "main": [
          [
            {
              "node": "Check Job Status",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Split Documents": {
        "main": [
          [
            {
              "node": "Download Document",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Download Document": {
        "main": [
          [
            {
              "node": "Upload to Google Drive",
              "type": "main",
              "index": 0
            }
          ]
        ]
      }
    },
    "active": false,
    "settings": {
      "executionOrder": "v1"
    },
    "meta": {
      "templateCredsSetupCompleted": true
    },
    "tags": []
  }

Workflow Configuration

Required Changes

Before running the workflow, you’ll need to update these key components:

1. API Key Configuration

// In the "Manual Inputs" node, users will provide:
{
  "API Key": "sk-your-api-key-here",
  "Website URL": "https://example.com", 
  "Prompt": "Find board meeting minutes from 2024",
  "Single-page": false
}

2. Google Drive Setup

In the “Upload to Google Drive” node, update:

Folder ID: Replace YOUR_FOLDER_ID_HERE with your target Google Drive folder ID
Credentials: Configure your Google Drive OAuth2 credentials in n8n

{
  "folderId": {
    "value": "1ABC123_your_actual_folder_id_here"
  },
  "credentials": {
    "googleDriveOAuth2Api": {
      "id": "your-configured-credential-id"
    }
  }
}

3. Timing Adjustments

Modify wait times based on your typical job duration:

Initial Wait: Currently set to 2 minutes (node: “Wait for Processing”)
Retry Wait: Currently set to 10 seconds (node: “Wait and Retry”)

Node Breakdown

Node	Purpose	Configuration Needed
Manual Inputs	Collect user parameters	None - ready to use
Create Scrape Job	Submit job to Skop API	Ensure API endpoint is correct
Check Job Status	Monitor job progress	None - uses dynamic job ID
Check if Completed	Conditional logic for job status	None - checks for “completed” status
Get Job Results	Retrieve extracted documents	None - uses dynamic job ID
Split Documents	Process document array	None - JavaScript code included
Download Document	Fetch document files	May need User-Agent adjustment
Upload to Google Drive	Save to cloud storage	Requires folder ID and credentials

How the Code Works

1. Job Creation Flow

The workflow starts by collecting inputs and making a POST request to the Skop API:

// HTTP Request configuration
{
  "method": "POST",
  "url": "https://api.skop.dev/scrape/",
  "headers": {
    "Authorization": "Bearer {{ $json['API Key'] }}",
    "Content-Type": "application/json"
  },
  "body": {
    "website": "{{ $json['Website URL'] }}",
    "prompt": "{{ $json.Prompt }}",
    "parameters": {
      "single_page": "{{ $json['Single-page'] }}"
    }
  }
}

2. Status Polling Logic

The workflow implements a polling pattern to check job completion:

// Conditional check for job completion
if (status === "completed") {
  // Proceed to get results
} else {
  // Wait and retry
}

3. Document Processing

A custom JavaScript function splits the document array for individual processing:

// Extract documents array from job results
const jobResults = $input.first().json;

if (!jobResults.documents || !Array.isArray(jobResults.documents)) {
  return [{
    json: {
      error: 'No documents found in results',
      totalDocuments: 0,
      documents: []
    }
  }];
}

// Return each document as a separate item
const outputItems = jobResults.documents.map((doc, index) => ({
  json: {
    ...doc,
    documentIndex: index + 1,
    totalDocuments: jobResults.documents.length,
    jobId: jobResults.job_id
  }
}));

return outputItems;

4. File Download and Storage

Each document is downloaded and uploaded to Google Drive with proper naming:

// Download request with proper headers
{
  "url": "{{ $json.url }}",
  "headers": {
    "Accept": "application/pdf,application/octet-stream,*/*",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36..."
  },
  "options": {
    "response": {
      "responseFormat": "file"
    }
  }
}

Customization Options

Error Handling

Add error handling nodes to manage API failures:

HTTP status code checks
Retry limits for failed downloads
Email notifications for workflow failures

Advanced Features

Extend the workflow with additional functionality:

Document Analysis: Add OCR or text extraction nodes
Content Filtering: Filter documents by size, date, or content
Multi-Destination Upload: Save to multiple cloud storage services
Slack Notifications: Send completion alerts to team channels

Scheduling

Set up automatic execution:

Cron Triggers: Run daily/weekly document collection
Webhook Triggers: Trigger from external systems
Manual Execution: Run on-demand from n8n interface

Best Practices

API Key Security: Store API keys in n8n’s credential system, never hardcode them
Rate Limiting: Add delays between API calls to respect rate limits
Error Recovery: Implement proper error handling and retry logic
Monitoring: Set up notifications for workflow failures
Testing: Test with smaller document sets before production use

Troubleshooting

Common Issues

Job Never Completes

Increase timeout values in wait nodes
Check if the website is accessible
Verify your prompt is specific enough

Google Drive Upload Fails

Confirm Google Drive credentials are properly configured
Check folder permissions and folder ID validity
Ensure sufficient storage space

Download Errors

Some documents may require specific headers or authentication
Add retry logic for failed downloads
Check document URL accessibility

Getting Started

API Endpoints

Integrations

Overview

What This Workflow Does

Getting Started with n8n

Prerequisites

Installation

n8n Workflow JSON

Workflow Configuration

Required Changes

1. API Key Configuration

2. Google Drive Setup

3. Timing Adjustments

Node Breakdown

How the Code Works

1. Job Creation Flow

2. Status Polling Logic

3. Document Processing

4. File Download and Storage

Customization Options

Error Handling

Advanced Features

Scheduling

Best Practices

Troubleshooting

Common Issues

Support Resources

Getting Started

API Endpoints

Integrations

​Overview

​What This Workflow Does

​Getting Started with n8n

​Prerequisites

​Installation

​n8n Workflow JSON

​Workflow Configuration

​Required Changes

​1. API Key Configuration

​2. Google Drive Setup

​3. Timing Adjustments

​Node Breakdown

​How the Code Works

​1. Job Creation Flow

​2. Status Polling Logic

​3. Document Processing

​4. File Download and Storage

​Customization Options

​Error Handling

​Advanced Features

​Scheduling

​Best Practices

​Troubleshooting

​Common Issues

​Support Resources

​Related Links

Overview

What This Workflow Does

Getting Started with n8n

Prerequisites

Installation

n8n Workflow JSON

Workflow Configuration

Required Changes

1. API Key Configuration

2. Google Drive Setup

3. Timing Adjustments

Node Breakdown

How the Code Works

1. Job Creation Flow

2. Status Polling Logic

3. Document Processing

4. File Download and Storage

Customization Options

Error Handling

Advanced Features

Scheduling

Best Practices

Troubleshooting

Common Issues

Support Resources

Related Links