> ## Documentation Index
> Fetch the complete documentation index at: https://docs.skop.dev/llms.txt
> Use this file to discover all available pages before exploring further.

# n8n Integration

> Automate document extraction with n8n workflow automation

## Overview

This n8n workflow demonstrates how to integrate Skop's document extraction API into your automation pipelines. The workflow scrapes documents from websites, processes the results, and automatically uploads them to Google Drive.

<img src="https://mintcdn.com/skop/Gn2udsdpY7EjqFGr/n8n.png?fit=max&auto=format&n=Gn2udsdpY7EjqFGr&q=85&s=52172705bb3b3f79d3d436e56121c3a1" alt="n8n Workflow Diagram" width="1562" height="1626" data-path="n8n.png" />

## What This Workflow Does

1. **Input Collection**: Accepts user inputs for website URL, search prompt, API key, and scraping parameters
2. **Job Creation**: Creates a scraping job using the Skop API
3. **Status Monitoring**: Polls the job status until completion with automatic retry logic
4. **Document Processing**: Extracts individual documents from the results
5. **File Download**: Downloads each document from the source URLs
6. **Cloud Storage**: Uploads documents to Google Drive with organized naming

## Getting Started with n8n

[n8n](https://n8n.io/) is an open-source workflow automation tool that lets you connect different services and APIs.

### Prerequisites

* [n8n instance](https://n8n.io/cloud/) (cloud or self-hosted)
* [Skop API key](https://www.skop.dev/dashboard)
* Google Drive credentials (for document storage)

### Installation

1. Copy the workflow JSON below and import it into your n8n instance
2. Configure your credentials and API keys
3. Customize the workflow for your specific needs

### n8n Workflow JSON

Copy and paste this JSON into n8n to import the complete workflow:

```json theme={null}
{
    "name": "My workflow",
    "nodes": [
      {
        "parameters": {},
        "id": "aeb4f37b-fd11-46bc-93e3-c2fbc57dea3d",
        "name": "Start",
        "type": "n8n-nodes-base.start",
        "typeVersion": 1,
        "position": [
          -1408,
          304
        ]
      },
      {
        "parameters": {
          "fields": {
            "values": [
              {
                "name": "Prompt"
              },
              {
                "name": "Website URL"
              },
              {
                "name": "API Key"
              },
              {
                "name": "Single-page",
                "type": "booleanValue",
                "booleanValue": "false"
              }
            ]
          },
          "options": {}
        },
        "id": "e5e03541-7475-4da9-acd6-54bed0ae6846",
        "name": "Manual Inputs",
        "type": "n8n-nodes-base.set",
        "typeVersion": 3.2,
        "position": [
          -1264,
          304
        ]
      },
      {
        "parameters": {
          "method": "POST",
          "url": "https://api.skop.dev/scrape/",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Authorization",
                "value": "=Bearer {{ $json['API Key'] }}"
              },
              {
                "name": "Content-Type",
                "value": "application/json"
              }
            ]
          },
          "sendBody": true,
          "bodyParameters": {
            "parameters": [
              {
                "name": "website",
                "value": "={{ $json['Website URL'] }}"
              },
              {
                "name": "prompt",
                "value": "={{ $json.Prompt }}"
              },
              {
                "name": "parameters",
                "value": "={{ { \"single_page\": $json[\"Single-page\"] } }}"
              }
            ]
          },
          "options": {}
        },
        "id": "46a590b4-f96d-4073-9c55-9d3f6896fe69",
        "name": "Create Scrape Job",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -1088,
          320
        ]
      },
      {
        "parameters": {
          "amount": 2,
          "unit": "minutes"
        },
        "id": "67443437-0f60-488f-be38-b2ddd7cac960",
        "name": "Wait for Processing",
        "type": "n8n-nodes-base.wait",
        "typeVersion": 1,
        "position": [
          -928,
          320
        ]
      },
      {
        "parameters": {
          "url": "=https://api.skop.dev/scrape/status/{{ $json.job_id }}",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Authorization",
                "value": "=Bearer {{ $('Manual Inputs').item.json['API Key'] }}"
              }
            ]
          },
          "options": {}
        },
        "id": "b411c7e4-2777-43e6-82ca-6b37f81dd623",
        "name": "Check Job Status",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -768,
          320
        ]
      },
      {
        "parameters": {
          "conditions": {
            "string": [
              {
                "value1": "={{ $json.status }}",
                "value2": "completed"
              }
            ]
          }
        },
        "id": "bcdcedb3-dbaa-4640-b3e1-d0c1ab579b0a",
        "name": "Check if Completed",
        "type": "n8n-nodes-base.if",
        "typeVersion": 1,
        "position": [
          -608,
          320
        ]
      },
      {
        "parameters": {
          "url": "=https://api.skop.dev/scrape/results/{{ $json.job_id }}",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Authorization",
                "value": "=Bearer {{ $('Manual Inputs').item.json['API Key'] }}"
              }
            ]
          },
          "options": {}
        },
        "id": "6e7ec0dd-e66e-4373-adbf-3730ccde215a",
        "name": "Get Job Results",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -448,
          304
        ]
      },
      {
        "parameters": {
          "name": "={{ $json.name }}",
          "driveId": {
            "__rl": true,
            "mode": "list",
            "value": "My Drive"
          },
          "folderId": {
            "__rl": true,
            "value": "YOUR_FOLDER_ID_HERE",
            "mode": "list",
            "cachedResultName": "Your Target Folder",
            "cachedResultUrl": "https://drive.google.com/drive/folders/YOUR_FOLDER_ID_HERE"
          },
          "options": {}
        },
        "id": "a6f83cba-dd55-4e92-8aee-0b08d869c087",
        "name": "Upload to Google Drive",
        "type": "n8n-nodes-base.googleDrive",
        "typeVersion": 3,
        "position": [
          -768,
          816
        ],
        "credentials": {
          "googleDriveOAuth2Api": {
            "id": "YOUR_GOOGLE_DRIVE_CREDENTIALS",
            "name": "Google Drive account"
          }
        }
      },
      {
        "parameters": {
          "amount": 10,
          "unit": "seconds"
        },
        "id": "7f31305d-9f00-4ccb-b037-fdc5b0de9ca0",
        "name": "Wait and Retry",
        "type": "n8n-nodes-base.wait",
        "typeVersion": 1,
        "position": [
          -608,
          480
        ]
      },
      {
        "parameters": {
          "content": "## Extract documents from multiple pages using skop.dev",
          "height": 480,
          "width": 832,
          "color": 4
        },
        "type": "n8n-nodes-base.stickyNote",
        "position": [
          -1136,
          208
        ],
        "typeVersion": 1,
        "id": "3d5d121b-5643-4140-a880-e3b2018f0ae5",
        "name": "Sticky Note"
      },
      {
        "parameters": {
          "jsCode": "// Extract documents array from job results\nconst jobResults = $input.first().json;\n\nif (!jobResults.documents || !Array.isArray(jobResults.documents)) {\n  return [{\n    json: {\n      error: 'No documents found in results',\n      totalDocuments: 0,\n      documents: []\n    }\n  }];\n}\n\n// Return each document as a separate item for processing\nconst outputItems = jobResults.documents.map((doc, index) => ({\n  json: {\n    ...doc,\n    documentIndex: index + 1,\n    totalDocuments: jobResults.documents.length,\n    jobId: jobResults.job_id\n  }\n}));\n\nreturn outputItems;"
        },
        "id": "57c9bc5f-b650-42d3-9340-77a2307be6f9",
        "name": "Split Documents",
        "type": "n8n-nodes-base.code",
        "typeVersion": 2,
        "position": [
          -1072,
          816
        ]
      },
      {
        "parameters": {
          "url": "={{ $json.url }}",
          "sendHeaders": true,
          "headerParameters": {
            "parameters": [
              {
                "name": "Accept",
                "value": "application/pdf,application/octet-stream,*/*"
              },
              {
                "name": "Accept-Language",
                "value": "en-US,en;q=0.9"
              },
              {
                "name": "Cache-Control",
                "value": "no-cache"
              },
              {
                "name": "Referer",
                "value": "https://www.google.com/"
              },
              {
                "name": "User-Agent",
                "value": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
              }
            ]
          },
          "options": {
            "response": {
              "response": {
                "neverError": true,
                "responseFormat": "file"
              }
            }
          }
        },
        "id": "dfde3a4f-017e-4167-b81f-dd086384b299",
        "name": "Download Document",
        "type": "n8n-nodes-base.httpRequest",
        "typeVersion": 4.1,
        "position": [
          -912,
          816
        ]
      },
      {
        "parameters": {
          "content": "## Save Documents to Drive\n",
          "height": 288,
          "width": 576
        },
        "type": "n8n-nodes-base.stickyNote",
        "position": [
          -1136,
          720
        ],
        "typeVersion": 1,
        "id": "344c5132-0f82-4039-8c0d-de5b02769419",
        "name": "Sticky Note"
      }
    ],
    "pinData": {},
    "connections": {
      "Start": {
        "main": [
          [
            {
              "node": "Manual Inputs",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Manual Inputs": {
        "main": [
          [
            {
              "node": "Create Scrape Job",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Create Scrape Job": {
        "main": [
          [
            {
              "node": "Wait for Processing",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Wait for Processing": {
        "main": [
          [
            {
              "node": "Check Job Status",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Check Job Status": {
        "main": [
          [
            {
              "node": "Check if Completed",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Check if Completed": {
        "main": [
          [
            {
              "node": "Get Job Results",
              "type": "main",
              "index": 0
            }
          ],
          [
            {
              "node": "Wait and Retry",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Get Job Results": {
        "main": [
          [
            {
              "node": "Split Documents",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Upload to Google Drive": {
        "main": [
          []
        ]
      },
      "Wait and Retry": {
        "main": [
          [
            {
              "node": "Check Job Status",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Split Documents": {
        "main": [
          [
            {
              "node": "Download Document",
              "type": "main",
              "index": 0
            }
          ]
        ]
      },
      "Download Document": {
        "main": [
          [
            {
              "node": "Upload to Google Drive",
              "type": "main",
              "index": 0
            }
          ]
        ]
      }
    },
    "active": false,
    "settings": {
      "executionOrder": "v1"
    },
    "meta": {
      "templateCredsSetupCompleted": true
    },
    "tags": []
  }
```

## Workflow Configuration

### Required Changes

Before running the workflow, you'll need to update these key components:

#### 1. API Key Configuration

```javascript theme={null}
// In the "Manual Inputs" node, users will provide:
{
  "API Key": "sk-your-api-key-here",
  "Website URL": "https://example.com", 
  "Prompt": "Find board meeting minutes from 2024",
  "Single-page": false
}
```

#### 2. Google Drive Setup

In the "Upload to Google Drive" node, update:

* **Folder ID**: Replace `YOUR_FOLDER_ID_HERE` with your target Google Drive folder ID
* **Credentials**: Configure your Google Drive OAuth2 credentials in n8n

```json theme={null}
{
  "folderId": {
    "value": "1ABC123_your_actual_folder_id_here"
  },
  "credentials": {
    "googleDriveOAuth2Api": {
      "id": "your-configured-credential-id"
    }
  }
}
```

#### 3. Timing Adjustments

Modify wait times based on your typical job duration:

* **Initial Wait**: Currently set to 2 minutes (node: "Wait for Processing")
* **Retry Wait**: Currently set to 10 seconds (node: "Wait and Retry")

### Node Breakdown

| Node                       | Purpose                          | Configuration Needed                   |
| -------------------------- | -------------------------------- | -------------------------------------- |
| **Manual Inputs**          | Collect user parameters          | None - ready to use                    |
| **Create Scrape Job**      | Submit job to Skop API           | Ensure API endpoint is correct         |
| **Check Job Status**       | Monitor job progress             | None - uses dynamic job ID             |
| **Check if Completed**     | Conditional logic for job status | None - checks for "completed" status   |
| **Get Job Results**        | Retrieve extracted documents     | None - uses dynamic job ID             |
| **Split Documents**        | Process document array           | None - JavaScript code included        |
| **Download Document**      | Fetch document files             | May need User-Agent adjustment         |
| **Upload to Google Drive** | Save to cloud storage            | **Requires folder ID and credentials** |

## How the Code Works

### 1. Job Creation Flow

The workflow starts by collecting inputs and making a POST request to the Skop API:

```javascript theme={null}
// HTTP Request configuration
{
  "method": "POST",
  "url": "https://api.skop.dev/scrape/",
  "headers": {
    "Authorization": "Bearer {{ $json['API Key'] }}",
    "Content-Type": "application/json"
  },
  "body": {
    "website": "{{ $json['Website URL'] }}",
    "prompt": "{{ $json.Prompt }}",
    "parameters": {
      "single_page": "{{ $json['Single-page'] }}"
    }
  }
}
```

### 2. Status Polling Logic

The workflow implements a polling pattern to check job completion:

```javascript theme={null}
// Conditional check for job completion
if (status === "completed") {
  // Proceed to get results
} else {
  // Wait and retry
}
```

### 3. Document Processing

A custom JavaScript function splits the document array for individual processing:

```javascript theme={null}
// Extract documents array from job results
const jobResults = $input.first().json;

if (!jobResults.documents || !Array.isArray(jobResults.documents)) {
  return [{
    json: {
      error: 'No documents found in results',
      totalDocuments: 0,
      documents: []
    }
  }];
}

// Return each document as a separate item
const outputItems = jobResults.documents.map((doc, index) => ({
  json: {
    ...doc,
    documentIndex: index + 1,
    totalDocuments: jobResults.documents.length,
    jobId: jobResults.job_id
  }
}));

return outputItems;
```

### 4. File Download and Storage

Each document is downloaded and uploaded to Google Drive with proper naming:

```javascript theme={null}
// Download request with proper headers
{
  "url": "{{ $json.url }}",
  "headers": {
    "Accept": "application/pdf,application/octet-stream,*/*",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36..."
  },
  "options": {
    "response": {
      "responseFormat": "file"
    }
  }
}
```

## Customization Options

### Error Handling

Add error handling nodes to manage API failures:

* HTTP status code checks
* Retry limits for failed downloads
* Email notifications for workflow failures

### Advanced Features

Extend the workflow with additional functionality:

* **Document Analysis**: Add OCR or text extraction nodes
* **Content Filtering**: Filter documents by size, date, or content
* **Multi-Destination Upload**: Save to multiple cloud storage services
* **Slack Notifications**: Send completion alerts to team channels

### Scheduling

Set up automatic execution:

* **Cron Triggers**: Run daily/weekly document collection
* **Webhook Triggers**: Trigger from external systems
* **Manual Execution**: Run on-demand from n8n interface

## Best Practices

1. **API Key Security**: Store API keys in n8n's credential system, never hardcode them
2. **Rate Limiting**: Add delays between API calls to respect rate limits
3. **Error Recovery**: Implement proper error handling and retry logic
4. **Monitoring**: Set up notifications for workflow failures
5. **Testing**: Test with smaller document sets before production use

## Troubleshooting

### Common Issues

**Job Never Completes**

* Increase timeout values in wait nodes
* Check if the website is accessible
* Verify your prompt is specific enough

**Google Drive Upload Fails**

* Confirm Google Drive credentials are properly configured
* Check folder permissions and folder ID validity
* Ensure sufficient storage space

**Download Errors**

* Some documents may require specific headers or authentication
* Add retry logic for failed downloads
* Check document URL accessibility

### Support Resources

* [n8n Community](https://community.n8n.io/)
* [n8n Documentation](https://docs.n8n.io/)
* [Skop API Support](mailto:support@skop.dev)

## Related Links

* [Skop API Documentation](/api-reference/introduction)
* [n8n Official Website](https://n8n.io/)
* [Google Drive API Setup](https://developers.google.com/drive/api/quickstart)
