Overview

This n8n workflow demonstrates how to integrate Skop’s document extraction API into your automation pipelines. The workflow scrapes documents from websites, processes the results, and automatically uploads them to Google Drive.

What This Workflow Does

  1. Input Collection: Accepts user inputs for website URL, search prompt, API key, and scraping parameters
  2. Job Creation: Creates a scraping job using the Skop API
  3. Status Monitoring: Polls the job status until completion with automatic retry logic
  4. Document Processing: Extracts individual documents from the results
  5. File Download: Downloads each document from the source URLs
  6. Cloud Storage: Uploads documents to Google Drive with organized naming

Getting Started with n8n

n8n is a source-available (fair-code licensed) workflow automation tool that lets you connect different services and APIs.

Prerequisites

Installation

  1. Copy the workflow JSON below and import it into your n8n instance
  2. Configure your credentials and API keys
  3. Customize the workflow for your specific needs

n8n Workflow JSON

Copy and paste this JSON into n8n to import the complete workflow:
{
    "name": "My workflow",
    "nodes": [
      {
        "parameters": {},
        "id": "aeb4f37b-fd11-46bc-93e3-c2fbc57dea3d",
        "name": "Start",
        "type": "n8n-nodes-base.start",
        "typeVersion": 1,
        "position": [
          -1408,
          304
        ]
      },
      {
        "parameters": {
          "fields": {
            "values": [
              {
                "name": "Prompt"
              },
              {
                "name": "Website URL"
              },
              {
                "name": "API Key"
              },
              {
                "name": "Single-page",
                "type": "booleanValue",
                "booleanValue": "false"
              }
            ]
          },
          "options": {}
        },
        "id": "e5e03541-7475-4da9-acd6-54bed0ae6846",
        "name": "Manual Inputs",
        "type": "n8n-nodes-base.set",
        "typeVersion": 3.2,
        "position": [
          -1264,
          304
        ]
      },
      {
        "parameters": {
          "method": "POST",
          "url": "https://api.skop.dev/scrape/",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Authorization",
                "value": "=Bearer {{ $json['API Key'] }}"
              },
              {
                "name": "Content-Type",
                "value": "application/json"
              }
            ]
          },
          "sendBody": true,
          "bodyParameters": {
            "parameters": [
              {
                "name": "website",
                "value": "={{ $json['Website URL'] }}"
              },
              {
                "name": "prompt",
                "value": "={{ $json.Prompt }}"
              },
              {
                "name": "parameters",
                "value": "={{ { \"single_page\": $json[\"Single-page\"] } }}"
              }
            ]
          },
          "options": {}
        },
        "id": "46a590b4-f96d-4073-9c55-9d3f6896fe69",
        "name": "Create Scrape Job",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -1088,
          320
        ]
      },
      {
        "parameters": {
          "amount": 2,
          "unit": "minutes"
        },
        "id": "67443437-0f60-488f-be38-b2ddd7cac960",
        "name": "Wait for Processing",
        "type": "n8n-nodes-base.wait",
        "typeVersion": 1,
        "position": [
          -928,
          320
        ]
      },
      {
        "parameters": {
          "url": "=https://api.skop.dev/scrape/status/{{ $json.job_id }}",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Authorization",
                "value": "=Bearer {{ $('Manual Inputs').item.json['API Key'] }}"
              }
            ]
          },
          "options": {}
        },
        "id": "b411c7e4-2777-43e6-82ca-6b37f81dd623",
        "name": "Check Job Status",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -768,
          320
        ]
      },
      {
        "parameters": {
          "conditions": {
            "string": [
              {
                "value1": "={{ $json.status }}",
                "value2": "completed"
              }
            ]
          }
        },
        "id": "bcdcedb3-dbaa-4640-b3e1-d0c1ab579b0a",
        "name": "Check if Completed",
        "type": "n8n-nodes-base.if",
        "typeVersion": 1,
        "position": [
          -608,
          320
        ]
      },
      {
        "parameters": {
          "url": "=https://api.skop.dev/scrape/results/{{ $json.job_id }}",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Authorization",
                "value": "=Bearer {{ $('Manual Inputs').item.json['API Key'] }}"
              }
            ]
          },
          "options": {}
        },
        "id": "6e7ec0dd-e66e-4373-adbf-3730ccde215a",
        "name": "Get Job Results",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -448,
          304
        ]
      },
      {
        "parameters": {
          "name": "={{ $json.name }}",
          "driveId": {
            "__rl": true,
            "mode": "list",
            "value": "My Drive"
          },
          "folderId": {
            "__rl": true,
            "value": "YOUR_FOLDER_ID_HERE",
            "mode": "list",
            "cachedResultName": "Your Target Folder",
            "cachedResultUrl": "https://drive.google.com/drive/folders/YOUR_FOLDER_ID_HERE"
          },
          "options": {}
        },
        "id": "a6f83cba-dd55-4e92-8aee-0b08d869c087",
        "name": "Upload to Google Drive",
        "type": "n8n-nodes-base.googleDrive",
        "typeVersion": 3,
        "position": [
          -768,
          816
        ],
        "credentials": {
          "googleDriveOAuth2Api": {
            "id": "YOUR_GOOGLE_DRIVE_CREDENTIALS",
            "name": "Google Drive account"
          }
        }
      },
      {
        "parameters": {
          "amount": 10,
          "unit": "seconds"
        },
        "id": "7f31305d-9f00-4ccb-b037-fdc5b0de9ca0",
        "name": "Wait and Retry",
        "type": "n8n-nodes-base.wait",
        "typeVersion": 1,
        "position": [
          -608,
          480
        ]
      },
      {
        "parameters": {
          "content": "## Extract documents from multiple pages using skop.dev",
          "height": 480,
          "width": 832,
          "color": 4
        },
        "type": "n8n-nodes-base.stickyNote",
        "position": [
          -1136,
          208
        ],
        "typeVersion": 1,
        "id": "3d5d121b-5643-4140-a880-e3b2018f0ae5",
        "name": "Sticky Note"
      },
      {
        "parameters": {
          "jsCode": "// Extract documents array from job results\nconst jobResults = $input.first().json;\n\nif (!jobResults.documents || !Array.isArray(jobResults.documents)) {\n  return [{\n    json: {\n      error: 'No documents found in results',\n      totalDocuments: 0,\n      documents: []\n    }\n  }];\n}\n\n// Return each document as a separate item for processing\nconst outputItems = jobResults.documents.map((doc, index) => ({\n  json: {\n    ...doc,\n    documentIndex: index + 1,\n    totalDocuments: jobResults.documents.length,\n    jobId: jobResults.job_id\n  }\n}));\n\nreturn outputItems;"
        },
        "id": "57c9bc5f-b650-42d3-9340-77a2307be6f9",
        "name": "Split Documents",
        "type": "n8n-nodes-base.code",
        "typeVersion": 2,
        "position": [
          -1072,
          816
        ]
      },
      {
        "parameters": {
          "url": "={{ $json.url }}",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Accept",
                "value": "application/pdf,application/octet-stream,*/*"
              },
              {
                "name": "Accept-Language",
                "value": "en-US,en;q=0.9"
              },
              {
                "name": "Cache-Control",
                "value": "no-cache"
              },
              {
                "name": "Referer",
                "value": "https://www.google.com/"
              },
              {
                "name": "User-Agent",
                "value": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
              }
            ]
          },
          "options": {
            "response": {
              "response": {
                "neverError": true,
                "responseFormat": "file"
              }
            }
          }
        },
        "id": "dfde3a4f-017e-4167-b81f-dd086384b299",
        "name": "Download Document",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -912,
          816
        ]
      },
      {
        "parameters": {
          "content": "## Save Documents to Drive\n",
          "height": 288,
          "width": 576
        },
        "type": "n8n-nodes-base.stickyNote",
        "position": [
          -1136,
          720
        ],
        "typeVersion": 1,
        "id": "344c5132-0f82-4039-8c0d-de5b02769419",
        "name": "Sticky Note"
      }
    ],
    "pinData": {},
    "connections": {
      "Start": {
        "main": [
          [
            {
              "node": "Manual Inputs",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Manual Inputs": {
        "main": [
          [
            {
              "node": "Create Scrape Job",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Create Scrape Job": {
        "main": [
          [
            {
              "node": "Wait for Processing",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Wait for Processing": {
        "main": [
          [
            {
              "node": "Check Job Status",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Check Job Status": {
        "main": [
          [
            {
              "node": "Check if Completed",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Check if Completed": {
        "main": [
          [
            {
              "node": "Get Job Results",
              "type": "main",
              "index": 0
            }
          ],
          [
            {
              "node": "Wait and Retry",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Get Job Results": {
        "main": [
          [
            {
              "node": "Split Documents",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Upload to Google Drive": {
        "main": [
          []
        ]
      },
      "Wait and Retry": {
        "main": [
          [
            {
              "node": "Check Job Status",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Split Documents": {
        "main": [
          [
            {
              "node": "Download Document",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Download Document": {
        "main": [
          [
            {
              "node": "Upload to Google Drive",
              "type": "main",
              "index": 0
            }
          ]
        ]
      }
    },
    "active": false,
    "settings": {
      "executionOrder": "v1"
    },
    "meta": {
      "templateCredsSetupCompleted": true
    },
    "tags": []
}

Workflow Configuration

Required Changes

Before running the workflow, you’ll need to update these key components:

1. API Key Configuration

// Example values to enter in the "Manual Inputs" node (replace with your own):
{
  "API Key": "sk-your-api-key-here",
  "Website URL": "https://example.com", 
  "Prompt": "Find board meeting minutes from 2024",
  "Single-page": false
}

2. Google Drive Setup

In the “Upload to Google Drive” node, update:
  • Folder ID: Replace YOUR_FOLDER_ID_HERE with your target Google Drive folder ID
  • Credentials: Configure your Google Drive OAuth2 credentials in n8n
{
  "folderId": {
    "value": "1ABC123_your_actual_folder_id_here"
  },
  "credentials": {
    "googleDriveOAuth2Api": {
      "id": "your-configured-credential-id"
    }
  }
}

3. Timing Adjustments

Modify wait times based on your typical job duration:
  • Initial Wait: Currently set to 2 minutes (node: “Wait for Processing”)
  • Retry Wait: Currently set to 10 seconds (node: “Wait and Retry”)

Node Breakdown

| Node | Purpose | Configuration Needed |
|---|---|---|
| Manual Inputs | Collect user parameters | Enter the prompt, website URL, API key, and single-page flag |
| Create Scrape Job | Submit job to Skop API | Ensure API endpoint is correct |
| Check Job Status | Monitor job progress | None - uses dynamic job ID |
| Check if Completed | Conditional logic for job status | None - checks for “completed” status |
| Get Job Results | Retrieve extracted documents | None - uses dynamic job ID |
| Split Documents | Process document array | None - JavaScript code included |
| Download Document | Fetch document files | May need User-Agent adjustment |
| Upload to Google Drive | Save to cloud storage | Requires folder ID and credentials |

How the Code Works

1. Job Creation Flow

The workflow starts by collecting inputs and making a POST request to the Skop API:
// HTTP Request configuration
{
  "method": "POST",
  "url": "https://api.skop.dev/scrape/",
  "headers": {
    "Authorization": "Bearer {{ $json['API Key'] }}",
    "Content-Type": "application/json"
  },
  "body": {
    "website": "{{ $json['Website URL'] }}",
    "prompt": "{{ $json.Prompt }}",
    "parameters": {
      "single_page": "{{ $json['Single-page'] }}"
    }
  }
}
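
Outside n8n, the same request can be sketched in plain JavaScript. The endpoint, header names, and body fields are taken from the node configuration above. The helper name buildScrapeRequest is hypothetical, and the use of Node 18+'s built-in fetch in the commented usage lines is an assumption, not something the workflow prescribes.

```javascript
// Hypothetical helper: mirrors the "Create Scrape Job" node outside n8n.
// Endpoint and field names come from the workflow JSON; the rest is illustrative.
function buildScrapeRequest({ apiKey, websiteUrl, prompt, singlePage = false }) {
  if (!apiKey) throw new Error("apiKey is required");
  // new URL() throws a TypeError on malformed input, which doubles as validation.
  new URL(websiteUrl);

  return {
    url: "https://api.skop.dev/scrape/",
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        website: websiteUrl,
        prompt,
        parameters: { single_page: Boolean(singlePage) },
      }),
    },
  };
}

// Usage (needs network access and a real key, so it is left commented out):
// const { url, options } = buildScrapeRequest({ apiKey, websiteUrl, prompt });
// const job = await (await fetch(url, options)).json();
```

Building the request separately from sending it keeps the payload easy to inspect before any API call is made.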

2. Status Polling Logic

The workflow implements a polling pattern to check job completion:
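// Logic of the "Check if Completed" node, written out as JavaScript
const status = $json.status;
if (status === "completed") {
  // True branch: continue to "Get Job Results"
} else {
  // False branch: "Wait and Retry" (10 seconds), then "Check Job Status" again
}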
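
As wired, the retry branch has no upper bound: a job that never reports "completed" is polled indefinitely. The sketch below shows one way to cap the loop. The function name pollUntilCompleted, the checkStatus callback, and the default limits are all assumptions for illustration; only the "completed" status value comes from the workflow.

```javascript
// Sketch of a bounded polling loop. `checkStatus` is a caller-supplied async
// function returning an object with a `status` field, as the response of
// "Check Job Status" is assumed to.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function pollUntilCompleted(checkStatus, { intervalMs = 10_000, maxAttempts = 30 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await checkStatus();
    if (result.status === "completed") {
      return { ...result, attempts: attempt };
    }
    // Skip the sleep after the final attempt and fail straight away instead.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Job not completed after ${maxAttempts} status checks`);
}
```

With the defaults shown, the loop gives up after roughly five minutes of 10-second checks; both values should be tuned to typical job durations.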

3. Document Processing

A custom JavaScript function splits the document array for individual processing:
// Extract documents array from job results
const jobResults = $input.first().json;

if (!jobResults.documents || !Array.isArray(jobResults.documents)) {
  return [{
    json: {
      error: 'No documents found in results',
      totalDocuments: 0,
      documents: []
    }
  }];
}

// Return each document as a separate item
const outputItems = jobResults.documents.map((doc, index) => ({
  json: {
    ...doc,
    documentIndex: index + 1,
    totalDocuments: jobResults.documents.length,
    jobId: jobResults.job_id
  }
}));

return outputItems;
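
To check the splitting logic outside n8n, it can be wrapped in a plain function that takes the results object as an argument. The name splitDocuments and the sample data are made up for illustration; the name and url fields are the ones the later nodes reference.

```javascript
// Plain-function version of the "Split Documents" logic, for testing outside n8n.
// Input: the parsed results object. Output: an array of n8n-style items.
function splitDocuments(jobResults) {
  const documents = jobResults?.documents;
  if (!Array.isArray(documents)) {
    return [{ json: { error: "No documents found in results", totalDocuments: 0, documents: [] } }];
  }
  return documents.map((doc, index) => ({
    json: {
      ...doc,
      documentIndex: index + 1, // 1-based position, as in the workflow
      totalDocuments: documents.length,
      jobId: jobResults.job_id,
    },
  }));
}

// Sample data is invented for this example.
const items = splitDocuments({
  job_id: "job-123",
  documents: [
    { name: "minutes-jan.pdf", url: "https://example.com/minutes-jan.pdf" },
    { name: "minutes-feb.pdf", url: "https://example.com/minutes-feb.pdf" },
  ],
});
console.log(items.map((item) => `${item.json.documentIndex}/${item.json.totalDocuments} ${item.json.name}`));
```

Note that the fallback item carries no url field, so in the workflow it would still reach the download node with an empty URL; checking for the error field before downloading is one way to avoid that.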

4. File Download and Storage

Each document is downloaded as a file and uploaded to Google Drive under the name taken from its name field ({{ $json.name }}):
// Download request (abbreviated from the "Download Document" node)
{
  "url": "{{ $json.url }}",
  "headers": {
    "Accept": "application/pdf,application/octet-stream,*/*",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36..."
  },
  "options": {
    "response": {
      "responseFormat": "file"
    }
  }
}
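
The upload node uses the document's name field as the file name. If names are missing or contain characters that are awkward in file names, a small helper in a Code node could normalize them first. The sketch below is hypothetical: buildFileName is not part of the workflow, and the fallback and prefix rules are assumptions chosen for illustration.

```javascript
// Hypothetical helper: derive a predictable file name for an item produced by
// "Split Documents". Only `name`, `url`, and `documentIndex` are read.
function buildFileName(doc) {
  // Fall back to the last path segment of the URL when no name is present.
  let base = doc.name;
  if (!base && doc.url) {
    const segments = new URL(doc.url).pathname.split("/").filter(Boolean);
    base = segments.length ? segments[segments.length - 1] : "";
  }
  if (!base) base = "document";

  // Replace characters that commonly cause trouble in file names.
  const safe = base.replace(/[\\/:*?"<>|]+/g, "_").trim();

  // A zero-padded index prefix keeps files in extraction order when sorted by name.
  const prefix = String(doc.documentIndex ?? 0).padStart(3, "0");
  return `${prefix}_${safe}`;
}
```

The returned string could then be supplied as the upload name in place of the raw name field.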

Customization Options

Error Handling

Add error handling nodes to manage API failures:
  • HTTP status code checks
  • Retry limits for failed downloads
  • Email notifications for workflow failures
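
The first two points can be sketched as a generic retry wrapper plus a status check. Both function names are hypothetical, and the retry count and backoff values are assumptions; n8n nodes also offer a Retry On Fail setting that may cover simple cases without custom code.

```javascript
// Generic retry wrapper: a sketch, not an n8n built-in.
// Runs `task` up to retries + 1 times and rethrows the last error if all fail.
async function withRetry(task, { retries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === retries) break;
      // Exponential backoff: baseDelayMs, then 2x, 4x, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Treat any non-2xx HTTP status as a failure so it counts toward the retry limit.
function assertOk(statusCode) {
  if (statusCode < 200 || statusCode >= 300) {
    throw new Error(`Unexpected HTTP status ${statusCode}`);
  }
  return statusCode;
}
```

A download step would call assertOk on the response status inside the task passed to withRetry, so that failed downloads are retried a bounded number of times and then surfaced as an error.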

Advanced Features

Extend the workflow with additional functionality:
  • Document Analysis: Add OCR or text extraction nodes
  • Content Filtering: Filter documents by size, date, or content
  • Multi-Destination Upload: Save to multiple cloud storage services
  • Slack Notifications: Send completion alerts to team channels

Scheduling

Set up automatic execution:
  • Cron Triggers: Run daily/weekly document collection
  • Webhook Triggers: Trigger from external systems
  • Manual Execution: Run on-demand from n8n interface

Best Practices

  1. API Key Security: Store API keys in n8n’s credential system; never hardcode them
  2. Rate Limiting: Add delays between API calls to respect rate limits
  3. Error Recovery: Implement proper error handling and retry logic
  4. Monitoring: Set up notifications for workflow failures
  5. Testing: Test with smaller document sets before production use
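
For the rate-limiting point, one simple approach is to process items one at a time with a fixed pause between calls. The sketch below assumes a caller-supplied async worker; the function name processSequentially and the 500 ms default are illustrative, not values recommended by the Skop API.

```javascript
// Sketch: run `worker` over items strictly one after another, pausing between calls.
async function processSequentially(items, worker, delayMs = 500) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    results.push(await worker(items[i], i));
    // No pause is needed after the last item.
    if (i < items.length - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return results;
}
```

Sequential processing trades throughput for predictability: at most one request is in flight, and the request rate never exceeds one per delayMs.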

Troubleshooting

Common Issues

Job Never Completes
  • Increase the wait durations in the “Wait for Processing” and “Wait and Retry” nodes if jobs regularly take longer
  • Inspect the response of “Check Job Status”: the loop only exits when the status equals “completed”, so any other final status keeps it polling
  • Check if the website is accessible
  • Verify your prompt is specific enough

Google Drive Upload Fails
  • Confirm Google Drive credentials are properly configured
  • Check folder permissions and folder ID validity
  • Ensure sufficient storage space

Download Errors
  • Some documents may require specific headers or authentication
  • Add retry logic for failed downloads
  • Check document URL accessibility

Support Resources