POST /accounts/{account_id}/browser-rendering/crawl
Starts a crawl job for the provided URL and its children. Check available options like gotoOptions and waitFor* to control page load behaviour.
Servers
- https://api.cloudflare.com/client/v4
Path parameters
| Name | Type | Required | Description |
|---|---|---|---|
| account_id | String | Yes | Account ID. |
Request headers
| Name | Type | Required | Description |
|---|---|---|---|
| Content-Type | String | Yes | The media type of the request body. Default value: "application/json" |
Query parameters
| Name | Type | Required | Description |
|---|---|---|---|
| cacheTTL | Number | No | Cache TTL, in seconds. Set to 0 to disable caching. Default value: 5 |
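The path and query parameters above compose into the request URL as follows — a minimal sketch, assuming a hypothetical placeholder account ID:

```python
from urllib.parse import urlencode

# Hypothetical account ID, for illustration only.
account_id = "0123456789abcdef0123456789abcdef"

base = "https://api.cloudflare.com/client/v4"
path = f"/accounts/{account_id}/browser-rendering/crawl"

# cacheTTL is the documented query parameter; 0 disables caching.
url = f"{base}{path}?{urlencode({'cacheTTL': 0})}"
print(url)
```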
Request body fields
| Name | Type | Required | Description |
|---|---|---|---|
| render | Boolean | Yes | Whether to render the page or fetch static content. True by default. |
| source | Object | No | Source of links to crawl: 'sitemaps' crawls only URLs from sitemaps, 'links' crawls only URLs scraped from pages, 'all' crawls both sitemap and scraped links (default). Default value: "all" |
| modifiedSince | Integer | No | Unix timestamp (seconds since epoch); only crawl pages modified since this time. For sitemap URLs with a lastmod field, the timestamp is compared directly. For other URLs, the crawler sends an If-Modified-Since header when fetching. URLs without modification information (no lastmod in the sitemap and no Last-Modified header support) are still crawled. Note: this works in conjunction with maxAge; both filters must pass for a cached resource to be used. Must be within the last year and not in the future. |
| setExtraHTTPHeaders | Object | No | |
| cookies[] | Array | No | Check options. |
| cookies[].sourceScheme | | No | |
| cookies[].url | String | No | |
| cookies[].httpOnly | Boolean | No | |
| cookies[].domain | String | No | |
| cookies[].value | String | Yes | |
| cookies[].partitionKey | String | No | |
| cookies[].path | String | No | |
| cookies[].sourcePort | Number | No | |
| cookies[].sameParty | Boolean | No | |
| cookies[].priority | | No | |
| cookies[].name | String | Yes | Cookie name. |
| cookies[].secure | Boolean | No | |
| cookies[].sameSite | | No | |
| cookies[].expires | Number | No | |
| depth | Number | No | Maximum number of levels deep the crawler will traverse from the starting URL. Default value: 100000 |
| viewport | Object | No | Check options. Default value: { "width": 1920, "height": 1080 } |
| viewport.deviceScaleFactor | Number | No | |
| viewport.height | Number | Yes | |
| viewport.isLandscape | Boolean | No | |
| viewport.width | Number | Yes | |
| viewport.isMobile | Boolean | No | |
| viewport.hasTouch | Boolean | No | |
| waitForTimeout | Number | No | Waits for a specified timeout before continuing. |
| emulateMediaType | String | No | |
| addScriptTag[] | Array | No | Adds a `<script>` tag into the page. |
| addScriptTag[].id | String | No | |
| addScriptTag[].url | String | No | |
| addScriptTag[].content | String | No | |
| addScriptTag[].type | String | No | |
| formats[] | Array | No | Formats to return. Default value: [ "html" ] |
| gotoOptions | Object | No | Check options. |
| gotoOptions.referrerPolicy | String | No | |
| gotoOptions.timeout | Number | No | Default value: 30000 |
| gotoOptions.waitUntil | | No | Default value: "domcontentloaded" |
| gotoOptions.referer | String | No | |
| limit | Number | No | Maximum number of URLs to crawl. Default value: 10 |
| options | Object | No | Additional options for the crawler. Default value: { "includeSubdomains": false, "includeExternalLinks": false } |
| options.includeExternalLinks | Boolean | No | Include external links in the crawl job. If set to true, includeSubdomains is ignored. Default value: false |
| options.includeSubdomains | Boolean | No | Include links to subdomains in the crawl job. This option is ignored if includeExternalLinks is true. Default value: false |
| options.excludePatterns[] | Array | No | Exclude links matching the provided wildcard patterns from the crawl job. Example: 'https://example.com/privacy/**'. |
| options.includePatterns[] | Array | No | Include only links matching the provided wildcard patterns in the crawl job. Include patterns are evaluated before exclude patterns; URLs that match any include pattern are included. Example: 'https://example.com/blog/**'. |
| jsonOptions | Object | No | Options for JSON extraction. |
| jsonOptions.prompt | String | No | |
| jsonOptions.custom_ai[] | Array | No | Optional list of custom AI models to use for the request. Models are tried in the order provided; if a model returns an error, the next one is used as a fallback. |
| jsonOptions.custom_ai[].authorization | String | No | Authorization token for the AI model. |
| jsonOptions.custom_ai[].model | String | Yes | AI model to use for the request. Must be formed as |
| jsonOptions.response_format | Object | No | |
| jsonOptions.response_format.json_schema | Object | No | Schema for the response format. More information: https://developers.cloudflare.com/workers-ai/json-mode/ |
| jsonOptions.response_format.json_schema.mcn_policy_result | String | No | |
| jsonOptions.response_format.json_schema.realtimekit_success | Boolean | No | |
| jsonOptions.response_format.json_schema.dns-records_dns-record-batch-post | Object | No | |
| jsonOptions.response_format.json_schema.r2_messages[] | Array | No | |
| jsonOptions.response_format.type | String | Yes | |
| addStyleTag[] | Array | No | Adds a `<style>` tag into the page. |
| addStyleTag[].url | String | No | |
| addStyleTag[].content | String | No | |
| authenticate | Object | No | Provide credentials for HTTP authentication. |
| authenticate.username | String | Yes | |
| authenticate.password | String | Yes | |
| setJavaScriptEnabled | Boolean | No | |
| allowRequestPattern[] | Array | No | Only allow requests that match the provided regex patterns, e.g. '/^.*.(css)'. |
| url | String | Yes | URL to navigate to, e.g. 'https://example.com'. |
| allowResourceTypes[] | Array | No | Only allow requests that match the provided resource types, e.g. 'image' or 'script'. |
| actionTimeout | Number | No | The maximum duration allowed for the browser action to complete after the page has loaded (such as taking screenshots, extracting content, or generating PDFs). If this time limit is exceeded, the action stops and returns a timeout error. |
| rejectResourceTypes[] | Array | No | Block undesired requests that match the provided resource types, e.g. 'image' or 'script'. |
| waitForSelector | Object | No | Wait for the selector to appear in the page. Check options. |
| waitForSelector.hidden | Boolean | No | |
| waitForSelector.selector | String | Yes | |
| waitForSelector.visible | Boolean | No | |
| waitForSelector.timeout | Number | No | |
| bestAttempt | Boolean | No | Attempt to proceed when 'awaited' events fail or time out. |
| rejectRequestPattern[] | Array | No | Block undesired requests that match the provided regex patterns, e.g. '/^.*.(css)'. |
| crawlPurposes[] | Array | No | List of crawl purposes, used to respect Content-Signal directives in robots.txt. Allowed values: 'search', 'ai-input', 'ai-train'. Learn more: https://contentsignals.org/. Default value: [ "search", "ai-input", "ai-train" ] |
| maxAge | Number | No | Maximum age, in seconds, of a resource that can be returned from cache. Default is 1 day. Default value: 86400 |
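To show how these fields fit together, here is a hedged sketch of a request body using only documented fields and their stated defaults; the target URL is a placeholder:

```python
import json

# Sketch of a crawl request body. `url` and `render` are the only
# required fields; the rest repeat the documented defaults.
body = {
    "url": "https://example.com",                 # required
    "render": True,                               # required; true by default
    "limit": 10,                                  # default: 10
    "formats": ["html"],                          # default: ["html"]
    "viewport": {"width": 1920, "height": 1080},  # default viewport
    "options": {
        "includeSubdomains": False,               # default: false
        "includeExternalLinks": False,            # default: false
    },
}

payload = json.dumps(body)
print(payload)
```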
How to start integrating
- Add an HTTP Task to your workflow definition.
- Search for the API you want to integrate with and click its name.
- This loads the API reference documentation and prepares the HTTP request settings.
- Click Test request to run a test request against the API and inspect its response.
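Outside a workflow tool, the same call can be composed directly over HTTP. The sketch below uses only the Python standard library; the account ID and API token are placeholders, and the Authorization bearer header is an assumption (standard for the Cloudflare API, though not listed in the header table above). The request object is built but deliberately not sent:

```python
import json
import urllib.request

# Placeholders -- substitute real values before sending.
account_id = "0123456789abcdef0123456789abcdef"
api_token = "YOUR_API_TOKEN"

endpoint = (
    f"https://api.cloudflare.com/client/v4"
    f"/accounts/{account_id}/browser-rendering/crawl"
)
body = json.dumps({"url": "https://example.com", "render": True}).encode()

req = urllib.request.Request(
    endpoint,
    data=body,
    method="POST",
    headers={
        "Content-Type": "application/json",      # required request header
        "Authorization": f"Bearer {api_token}",  # assumed bearer-token auth
    },
)

# urllib.request.urlopen(req) would actually send the request; omitted here.
print(req.get_method(), req.full_url)
```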