POST /accounts/{account_id}/browser-rendering/crawl

Starts a crawl job for the provided URL and its child pages. Use options such as gotoOptions and the waitFor* fields to control page-load behaviour.

Path parameters

Name Type Required Description
account_id String Yes

Account ID.

Request headers

Name Type Required Description
Content-Type String Yes The media type of the request body.

Default value: "application/json"

Query parameters

Name Type Required Description
cacheTTL Number No

Cache TTL in seconds. Defaults to 5; set to 0 to disable caching.

Default value: 5
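
As a rough illustration of how the path and query parameters combine, the sketch below builds the request URL. The base URL and the helper name build_crawl_url are assumptions, since this reference does not list a server; substitute the host that applies to your account.

```python
from urllib.parse import quote, urlencode

# Assumed base URL; this reference does not state which server to call.
BASE_URL = "https://api.cloudflare.com/client/v4"


def build_crawl_url(account_id, cache_ttl=None):
    """Return the crawl endpoint URL for an account (hypothetical helper)."""
    path = f"/accounts/{quote(account_id, safe='')}/browser-rendering/crawl"
    url = BASE_URL + path
    if cache_ttl is not None:
        # cacheTTL defaults to 5 when omitted; 0 disables caching.
        url += "?" + urlencode({"cacheTTL": cache_ttl})
    return url


print(build_crawl_url("abc123", cache_ttl=0))
```

Leaving cache_ttl unset omits the query string entirely, so the documented default of 5 applies.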

Request body fields

Name Type Required Description
render Boolean Yes

Whether to render the page or fetch static content. True by default.

Valid values:

  • false
source Object No

Source of links to crawl. 'sitemaps' - only crawl URLs from sitemaps, 'links' - only crawl URLs scraped from pages, 'all' - crawl both sitemap and scraped links (default).

Default value: "all"

modifiedSince Integer No

Unix timestamp (seconds since epoch). Only pages modified since this time are crawled. For sitemap URLs with a lastmod field, the value is compared directly. For other URLs, the crawler sends an If-Modified-Since header when fetching. URLs without modification information (no lastmod in the sitemap and no Last-Modified header support) are always crawled. Note: this works in conjunction with maxAge; both filters must pass for a cached resource to be used. The value must be within the last year and must not be in the future.
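
A minimal sketch of computing a modifiedSince value and checking it against the stated bounds before sending. The helper name modified_since and the reading of "last year" as 365 days are assumptions; the service may apply the limit differently.

```python
import time

# Assumption: "within the last year" is treated as 365 days.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def modified_since(days_ago, now=None):
    """Return a Unix timestamp `days_ago` days before `now` (hypothetical helper)."""
    now = int(time.time()) if now is None else int(now)
    ts = now - int(days_ago * 24 * 60 * 60)
    if ts > now:
        raise ValueError("modifiedSince must not be in the future")
    if now - ts > SECONDS_PER_YEAR:
        raise ValueError("modifiedSince must be within the last year")
    return ts


# Request body fragment: only crawl pages modified in the last 7 days.
body = {"url": "https://example.com", "modifiedSince": modified_since(7)}
```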

setExtraHTTPHeaders Object No
cookies[] Array No

See the cookies[] fields below for the available options.

cookies[].sourceScheme No
cookies[].url String No
cookies[].httpOnly Boolean No
cookies[].domain String No
cookies[].value String Yes
cookies[].partitionKey String No
cookies[].path String No
cookies[].sourcePort Number No
cookies[].sameParty Boolean No
cookies[].priority No
cookies[].name String Yes

Cookie name.

cookies[].secure Boolean No
cookies[].sameSite No
cookies[].expires Number No
depth Number No

Maximum number of levels deep the crawler will traverse from the starting URL.

Default value: 100000

viewport Object No

See the viewport fields below for the available options.

Default value: { "width": 1920, "height": 1080 }

viewport.deviceScaleFactor Number No
viewport.height Number Yes
viewport.isLandscape Boolean No
viewport.width Number Yes
viewport.isMobile Boolean No
viewport.hasTouch Boolean No
waitForTimeout Number No

Waits for a specified timeout before continuing.

emulateMediaType String No
addScriptTag[] Array No

Adds a <script> tag into the page with the desired URL or content.

addScriptTag[].id String No
addScriptTag[].url String No
addScriptTag[].content String No
addScriptTag[].type String No
formats[] Array No

Formats to return. Default is html.

Default value: [ "html" ]

gotoOptions Object No

See the gotoOptions fields below for the available options.

gotoOptions.referrerPolicy String No
gotoOptions.timeout Number No

Default value: 30000

gotoOptions.waitUntil No

Default value: "domcontentloaded"

gotoOptions.referer String No
limit Number No

Maximum number of URLs to crawl.

Default value: 10

options Object No

Additional options for the crawler.

Default value: { "includeSubdomains": false, "includeExternalLinks": false }

options.includeExternalLinks Boolean No

Include external links in the crawl job. If set to true, includeSubdomains is ignored.

Default value: false

options.includeSubdomains Boolean No

Include links to subdomains in the crawl job. This option is ignored if includeExternalLinks is true.

Default value: false
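
The precedence between the two flags can be sketched as a small function. The function name and the scope labels it returns are hypothetical and exist only to make the documented rule explicit.

```python
def effective_scope(options):
    """Return a label for which links a crawl would follow (hypothetical helper)."""
    if options.get("includeExternalLinks", False):
        # includeSubdomains is ignored when external links are included.
        return "external"
    if options.get("includeSubdomains", False):
        return "subdomains"
    # Both flags default to false.
    return "same-host"


print(effective_scope({"includeExternalLinks": True, "includeSubdomains": False}))
```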

options.excludePatterns[] Array No

Exclude links matching the provided wildcard patterns in the crawl job. Example: 'https://example.com/privacy/**'.

options.includePatterns[] Array No

Include only links matching the provided wildcard patterns in the crawl job. Include patterns are evaluated before exclude patterns. URLs that match any of the specified include patterns will be included in the crawl job. Example: 'https://example.com/blog/**'.
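
To get a feel for which URLs the example patterns would select, the sketch below approximates wildcard matching with Python's fnmatch module. This is an assumption: the service's own wildcard rules (for instance, how '**' differs from '*') are not specified here and may not match fnmatch exactly. The helper name matches_any is hypothetical.

```python
from fnmatch import fnmatchcase


def matches_any(url, patterns):
    """Return True if `url` matches at least one wildcard pattern.

    Uses fnmatch as a stand-in for the crawler's matcher (assumption).
    """
    return any(fnmatchcase(url, pattern) for pattern in patterns)


include_patterns = ["https://example.com/blog/**"]
exclude_patterns = ["https://example.com/privacy/**"]

for url in (
    "https://example.com/blog/post-1",
    "https://example.com/privacy/cookies",
    "https://example.com/about",
):
    print(url, matches_any(url, include_patterns), matches_any(url, exclude_patterns))
```

The sketch deliberately checks each list separately and does not model how the service resolves a URL that matches both lists.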

jsonOptions Object No

Options for JSON extraction.

jsonOptions.prompt String No
jsonOptions.custom_ai[] Array No

Optional list of custom AI models to use for the request. The models will be tried in the order provided, and in case a model returns an error, the next one will be used as fallback.

jsonOptions.custom_ai[].authorization String No

Authorization token for the AI model: Bearer <token>. Not needed for workers-ai models.

jsonOptions.custom_ai[].model String Yes

AI model to use for the request. Must be formed as <provider>/<model_name>, e.g. workers-ai/@cf/meta/llama-3.3-70b-instruct-fp8-fast.
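
A minimal sketch of assembling custom_ai entries in fallback order. The helper names are hypothetical, and 'example-provider/example-model' and 'TOKEN' are placeholders rather than real identifiers.

```python
def split_model_id(model):
    """Split '<provider>/<model_name>' at the first slash."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected <provider>/<model_name>, got {model!r}")
    return provider, name


def custom_ai_entry(model, token=None):
    """Build one custom_ai item (hypothetical helper)."""
    split_model_id(model)  # validates the identifier format
    entry = {"model": model}
    if token is not None:
        # Not needed for workers-ai models.
        entry["authorization"] = f"Bearer {token}"
    return entry


# Models are tried in list order; the second is a fallback if the first errors.
custom_ai = [
    custom_ai_entry("example-provider/example-model", token="TOKEN"),
    custom_ai_entry("workers-ai/@cf/meta/llama-3.3-70b-instruct-fp8-fast"),
]
```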

jsonOptions.response_format Object No
jsonOptions.response_format.json_schema Object No

Schema for the response format. More information here: https://developers.cloudflare.com/workers-ai/json-mode/

jsonOptions.response_format.type String Yes
addStyleTag[] Array No

Adds a <link rel="stylesheet"> tag into the page with the desired URL or a <style type="text/css"> tag with the content.

addStyleTag[].url String No
addStyleTag[].content String No
authenticate Object No

Provide credentials for HTTP authentication.

authenticate.username String Yes
authenticate.password String Yes
setJavaScriptEnabled Boolean No
allowRequestPattern[] Array No

Only allow requests that match the provided regex patterns, e.g. '/^.*.(css)'.

url String Yes

URL to navigate to, e.g. https://example.com.

allowResourceTypes[] Array No

Only allow requests that match the provided resource types, e.g. 'image' or 'script'.

actionTimeout Number No

The maximum duration allowed for the browser action to complete after the page has loaded (such as taking screenshots, extracting content, or generating PDFs). If this time limit is exceeded, the action stops and returns a timeout error.

rejectResourceTypes[] Array No

Block undesired requests that match the provided resource types, e.g. 'image' or 'script'.

waitForSelector Object No

Wait for the selector to appear in the page. See the waitForSelector fields below for the available options.

waitForSelector.hidden Boolean No

Valid values:

  • true
waitForSelector.selector String Yes
waitForSelector.visible Boolean No

Valid values:

  • true
waitForSelector.timeout Number No
bestAttempt Boolean No

Attempt to proceed when 'awaited' events fail or time out.

rejectRequestPattern[] Array No

Block undesired requests that match the provided regex patterns, e.g. '/^.*.(css)'.

crawlPurposes[] Array No

List of crawl purposes, used to respect Content-Signal directives in robots.txt. Allowed values: 'search', 'ai-input', 'ai-train'. Learn more: https://contentsignals.org/.

Default value: [ "search", "ai-input", "ai-train" ]
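
A short sketch of checking a crawlPurposes list against the allowed values before sending a request. The helper name crawl_purposes is hypothetical; how the service itself rejects unknown values is not described here.

```python
ALLOWED_PURPOSES = ("search", "ai-input", "ai-train")


def crawl_purposes(purposes=None):
    """Return a validated crawlPurposes list (hypothetical helper)."""
    if purposes is None:
        # Mirror the documented default: all three purposes.
        return list(ALLOWED_PURPOSES)
    unknown = [p for p in purposes if p not in ALLOWED_PURPOSES]
    if unknown:
        raise ValueError(f"unsupported crawl purposes: {unknown}")
    return list(purposes)


body = {"url": "https://example.com", "crawlPurposes": crawl_purposes(["search"])}
```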

maxAge Number No

Maximum age, in seconds, of a resource that can be returned from cache. Default is 1 day.

Default value: 86400
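
To see what a sparse request body amounts to once the defaults listed above are applied, the sketch below merges a body with those defaults locally. The names DOCUMENTED_DEFAULTS and with_defaults are hypothetical, and the one-level merge of nested objects is an assumption about how partial objects are treated; the service applies its own defaults regardless.

```python
import copy

# Default values copied from the field descriptions in this reference.
DOCUMENTED_DEFAULTS = {
    "source": "all",
    "depth": 100000,
    "viewport": {"width": 1920, "height": 1080},
    "formats": ["html"],
    "gotoOptions": {"timeout": 30000, "waitUntil": "domcontentloaded"},
    "limit": 10,
    "options": {"includeSubdomains": False, "includeExternalLinks": False},
    "crawlPurposes": ["search", "ai-input", "ai-train"],
    "maxAge": 86400,
}


def with_defaults(body):
    """Return a copy of `body` with documented defaults filled in."""
    merged = copy.deepcopy(DOCUMENTED_DEFAULTS)
    for key, value in body.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested objects are merged field by field.
            merged[key].update(copy.deepcopy(value))
        else:
            merged[key] = copy.deepcopy(value)
    return merged


effective = with_defaults(
    {"url": "https://example.com", "limit": 50, "gotoOptions": {"timeout": 10000}}
)
print(effective["limit"], effective["gotoOptions"])
```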

How to start integrating

  1. Add an HTTP Task to your workflow definition.
  2. Search for the API you want to integrate with and click its name.
    • This loads the API reference documentation and prepares the HTTP request settings.
  3. Click Test request to run your request against the API and see its response.