# Create Scraping Job

> Start a new document scraping job with a website URL and natural language prompt

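Creates a new document scraping job. The job runs asynchronously in the background: a successful request returns a `job_id` with status `pending` while the job is queued for processing.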

## Request Body

```json
{
  "website": "https://example.com",
  "prompt": "Find all board meeting minutes from 2025",
  "parameters": {
    "single_page": false,
    "timeout": 1800,
    "confidence_threshold": 0.7,
    "file_type": "document",
    "max_file_size_mb": 100
  }
}
```

### Required Fields

| Field     | Type   | Description                                           |
| --------- | ------ | ----------------------------------------------------- |
| `website` | string | Starting URL to scrape (must be valid HTTP/HTTPS URL) |
| `prompt`  | string | Description of documents to find (10-500 characters)  |

### Parameters Object

| Field                  | Type    | Default      | Description                                  |
| ---------------------- | ------- | ------------ | -------------------------------------------- |
| `single_page`          | boolean | `false`      | Only scrape the provided URL (no navigation) |
| `timeout`              | integer | `1800`       | Max time in seconds (60-3600)                |
| `confidence_threshold` | float   | `0.1`        | Min AI confidence score (0.0-1.0)            |
| `file_type`            | string  | `"document"` | Type of files to extract                     |
| `max_file_size_mb`     | integer | `100`        | Max file size in MB (1-500)                  |
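
The documented ranges can be checked on the client before sending a request, which avoids spending a round trip on a `400` response. The following is a minimal sketch, not part of the Skop API: `validateScrapeRequest` and `LIMITS` are hypothetical names, and the server remains the authority on what it accepts.

```javascript
// Limits copied from the Required Fields and Parameters Object tables.
const LIMITS = {
  prompt: { min: 10, max: 500 },
  timeout: { min: 60, max: 3600 },
  confidence_threshold: { min: 0, max: 1 },
  max_file_size_mb: { min: 1, max: 500 }
};

// Hypothetical helper: returns a list of problems, or an empty array
// when the body matches the documented constraints.
function validateScrapeRequest(body) {
  const errors = [];

  // website: must parse as a URL with an http: or https: scheme
  let url = null;
  try {
    url = new URL(body.website);
  } catch {
    // leave url as null; reported below
  }
  if (!url || (url.protocol !== 'http:' && url.protocol !== 'https:')) {
    errors.push('website must be a valid HTTP/HTTPS URL');
  }

  // prompt: string of 10-500 characters
  if (typeof body.prompt !== 'string' ||
      body.prompt.length < LIMITS.prompt.min ||
      body.prompt.length > LIMITS.prompt.max) {
    errors.push('prompt must be 10-500 characters');
  }

  // parameters is optional; only check the fields that are present
  const params = body.parameters ?? {};
  const inRange = (value, { min, max }) =>
    typeof value === 'number' && value >= min && value <= max;

  for (const key of ['timeout', 'max_file_size_mb']) {
    if (key in params &&
        !(Number.isInteger(params[key]) && inRange(params[key], LIMITS[key]))) {
      errors.push(`${key} must be an integer in ${LIMITS[key].min}-${LIMITS[key].max}`);
    }
  }
  if ('confidence_threshold' in params &&
      !inRange(params.confidence_threshold, LIMITS.confidence_threshold)) {
    errors.push('confidence_threshold must be a number in 0.0-1.0');
  }
  if ('single_page' in params && typeof params.single_page !== 'boolean') {
    errors.push('single_page must be a boolean');
  }

  return errors;
}
```

Returning a list of messages rather than throwing on the first problem lets a caller report every invalid field at once.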

## Example Request

```javascript
const response = await fetch('https://api.skop.dev/scrape/', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer sk-your-api-key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    website: 'https://example.com',
    prompt: 'Find meeting minutes for 2025',
    parameters: {
      single_page: true,
      timeout: 1800,
      confidence_threshold: 0.7,
      file_type: 'document',
      max_file_size_mb: 100
    }
  })
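})

// Non-2xx responses carry an error body instead of a job
if (!response.ok) {
  throw new Error(`Job creation failed with status ${response.status}`)
}

const job = await response.json()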
```

## Response (201 Created)

```json
{
  "job_id": "job_4fc79a89797e",
  "status": "pending",
  "message": "Job created successfully and queued for processing",
  "estimated_completion": "2025-07-24T21:00:00Z",
  "created_at": "2025-07-24T20:50:00Z"
}
```

### Response Fields

| Field                  | Type   | Description                           |
| ---------------------- | ------ | ------------------------------------- |
| `job_id`               | string | Unique identifier for the created job |
| `status`               | string | Initial job status (always `pending`) |
| `message`              | string | Success message                       |
| `estimated_completion` | string | ISO 8601 estimated completion time    |
| `created_at`           | string | ISO 8601 job creation timestamp       |
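
As an illustration of how these fields might be consumed, the sketch below derives the expected run time from the two timestamps and checks the `job_id` against the pattern given in the OpenAPI schema (`^job_[a-z0-9]+$`). The helper names `estimatedDurationSeconds` and `isValidJobId` are hypothetical and not part of the API.

```javascript
// Hypothetical helper: seconds between job creation and the estimated completion time.
function estimatedDurationSeconds(job) {
  const created = Date.parse(job.created_at);
  const estimated = Date.parse(job.estimated_completion);
  if (Number.isNaN(created) || Number.isNaN(estimated)) {
    throw new Error('created_at and estimated_completion must be ISO 8601 timestamps');
  }
  return Math.round((estimated - created) / 1000);
}

// Hypothetical helper: checks the job_id pattern from the OpenAPI schema.
function isValidJobId(jobId) {
  return typeof jobId === 'string' && /^job_[a-z0-9]+$/.test(jobId);
}

// Sample values taken from the 201 response shown above.
const sampleJob = {
  job_id: 'job_4fc79a89797e',
  status: 'pending',
  estimated_completion: '2025-07-24T21:00:00Z',
  created_at: '2025-07-24T20:50:00Z'
};

console.log(isValidJobId(sampleJob.job_id));       // true
console.log(estimatedDurationSeconds(sampleJob));  // 600
```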

## Error Responses

| Status | Error Code                   | Description                      |
| ------ | ---------------------------- | -------------------------------- |
| `400`  | `validation_error`           | Invalid request parameters       |
| `402`  | `insufficient_credits`       | Not enough credits               |
| `429`  | `concurrency_limit_exceeded` | Too many concurrent jobs         |
| `503`  | `service_unavailable`        | Required services not configured |
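
One way a client might act on these statuses is sketched below. The status-to-code mapping comes from the table above; treating `429` and `503` as worth retrying is an assumption of this sketch rather than documented behavior, and `classifyError` and the `unknown_error` fallback label are hypothetical.

```javascript
// Status-to-code mapping copied from the Error Responses table.
const ERROR_CODES = {
  400: 'validation_error',
  402: 'insufficient_credits',
  429: 'concurrency_limit_exceeded',
  503: 'service_unavailable'
};

// Assumption: these statuses describe transient conditions, so a later
// retry may succeed. The other statuses need a changed request or account.
const RETRYABLE_STATUSES = new Set([429, 503]);

// Hypothetical helper: maps an HTTP status to its documented error code
// and a retry hint. 'unknown_error' is a placeholder for unlisted statuses.
function classifyError(status) {
  return {
    code: ERROR_CODES[status] ?? 'unknown_error',
    retryable: RETRYABLE_STATUSES.has(status)
  };
}

console.log(classifyError(429)); // { code: 'concurrency_limit_exceeded', retryable: true }
console.log(classifyError(400)); // { code: 'validation_error', retryable: false }
```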


## OpenAPI

````yaml POST /scrape/
openapi: 3.1.0
info:
  title: Skop PDF Scraper API
  description: >-
    AI-powered document discovery and extraction from websites using natural
    language prompts
  version: 1.0.0
  contact:
    email: support@skop.dev
  license:
    name: MIT
servers:
  - url: https://api.skop.dev
    description: Production server
security:
  - bearerAuth: []
paths:
  /scrape/:
    post:
      summary: Create Scraping Job
      description: >-
        Start a new document scraping job with a website URL and natural
        language prompt
      operationId: createJob
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ScrapeRequest'
      responses:
        '201':
          description: Job created successfully
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/JobCreateResponse'
        '400':
          description: Bad Request - Invalid parameters
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '401':
          description: Unauthorized - Invalid API key
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '402':
          description: Payment Required - Insufficient credits
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
        '429':
          description: Too Many Requests - Rate limit exceeded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
components:
  schemas:
    ScrapeRequest:
      type: object
      required:
        - website
        - prompt
      properties:
        website:
          type: string
          format: uri
          description: Starting URL to scrape (must be valid HTTP/HTTPS URL)
          example: https://example.com
        prompt:
          type: string
          minLength: 10
          maxLength: 500
          description: Description of documents to find (10-500 characters)
          example: Find board meeting minutes from 2025
        parameters:
          $ref: '#/components/schemas/ScrapeParameters'
    JobCreateResponse:
      type: object
      properties:
        job_id:
          type: string
          pattern: ^job_[a-z0-9]+$
          example: job_4fc79a89797e
          description: Unique identifier for the created job
        status:
          type: string
          enum:
            - pending
          description: Initial job status (always 'pending')
        message:
          type: string
          example: Job created successfully and queued for processing
          description: Success message
        estimated_completion:
          type: string
          format: date-time
          description: ISO 8601 estimated completion time
        created_at:
          type: string
          format: date-time
          description: ISO 8601 job creation timestamp
    ErrorResponse:
      type: object
      properties:
        error:
          type: boolean
          example: true
          description: Indicates this is an error response
        message:
          type: string
          description: Human-readable error message
        status_code:
          type: integer
          description: HTTP status code
        path:
          type: string
          description: API path that generated the error
        timestamp:
          type: string
          format: date-time
          description: ISO 8601 error timestamp
    ScrapeParameters:
      type: object
      properties:
        single_page:
          type: boolean
          default: false
          description: Only scrape the provided URL, don't navigate to other pages
        timeout:
          type: integer
          minimum: 60
          maximum: 3600
          default: 1800
          description: Job timeout in seconds (60-3600)
        confidence_threshold:
          type: number
          minimum: 0
          maximum: 1
          default: 0.1
          description: Minimum AI confidence score for document relevance (0.0-1.0)
        file_type:
          type: string
          enum:
            - document
          default: document
          description: Type of files to extract
        max_file_size_mb:
          type: integer
          minimum: 1
          maximum: 500
          default: 100
          description: Maximum file size to download in MB (1-500)
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: API key in format 'sk-xxxxxxxxxxxxx' or 'sk_xxxxxxxxxxxxx'

````