POST /v2/dedicated-inferences

Create a new Dedicated Inference for your team. Send a POST request to /v2/dedicated-inferences with a spec object (version, name, region, vpc, enable_public_endpoint, model_deployments) and optional access_tokens (e.g. a hugging_face_token for gated models). A 202 Accepted response indicates the request was accepted for processing; it does not indicate that provisioning succeeded or failed. The token value is returned only in the create response; store it securely.
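As a sketch, a minimal request body matching the fields described below can be built in Python. All concrete values (name, region, model slug, accelerator slug, VPC UUID, token) are illustrative placeholders, not values guaranteed to be valid in your account:

```python
import json

# Illustrative request body for POST /v2/dedicated-inferences.
# Every concrete value here is a placeholder; substitute your own.
payload = {
    "spec": {
        "version": 1,
        "name": "my-inference",          # must be unique within the team
        "region": "atl1",                # one of: atl1, nyc2, tor1
        "enable_public_endpoint": True,
        "vpc": {"uuid": "00000000-0000-0000-0000-000000000000"},
        "model_deployments": [
            {
                "model_provider": "hugging_face",
                "model_slug": "meta-llama/Llama-3.1-8B-Instruct",  # example slug
                "accelerators": [
                    {
                        "type": "prefill_decode",
                        "scale": 1,
                        "accelerator_slug": "gpu-h100x1-80gb",  # example GPU slug
                    }
                ],
            }
        ],
    },
    # Only needed for gated models; "hf_..." is a placeholder token.
    "access_tokens": {"hugging_face_token": "hf_..."},
}

body = json.dumps(payload)
```

The serialized `body` is what you would send with a `Content-Type: application/json` header.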

Request headers

Name Type Required Description
Content-Type String Yes The media type of the request body. Default value: "application/json"

Request body fields

Name Type Required Description
access_tokens Object No

Key-value pairs for provider tokens (e.g. Hugging Face).

spec Object Yes

Structured configuration for a Dedicated Inference deployment.

spec.region String Yes

DigitalOcean region where the Dedicated Inference is hosted.

Valid values:

  • "atl1"
  • "nyc2"
  • "tor1"

spec.name String Yes

Name of the Dedicated Inference. Must be unique within the team.

spec.model_deployments[] Array Yes

At least one model deployment is required.

spec.model_deployments[].workload_config Object No

Workload-specific configuration (reserved for future options such as ISL/OSL).

spec.model_deployments[].model_id String No

Identifies an existing deployment when updating; leave it empty to create a new deployment.

spec.model_deployments[].accelerators[] Array No

Accelerator configuration for this deployment.

spec.model_deployments[].accelerators[].type String Yes

Accelerator type (e.g. prefill_decode).

spec.model_deployments[].accelerators[].status String No

Current state of the Accelerator.

Valid values:

  • "active"
  • "provisioning"
  • "new"

spec.model_deployments[].accelerators[].scale Integer Yes

Number of accelerator instances.

spec.model_deployments[].accelerators[].accelerator_slug String Yes

DigitalOcean GPU slug.

spec.model_deployments[].model_slug String No

Model identifier (e.g. Hugging Face slug).

spec.model_deployments[].model_provider String No

Model provider.

Valid values:

  • "hugging_face"

spec.version Integer Yes

Spec version.

spec.enable_public_endpoint Boolean Yes

Whether to expose a public LLM endpoint.

spec.vpc Object Yes

VPC configuration for the Dedicated Inference.

spec.vpc.uuid String Yes

VPC UUID for the Dedicated Inference.
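The required/optional flags and valid-value lists above can be encoded as a small client-side pre-check before sending the request. This is a hypothetical helper, not part of any SDK; the server's own validation is authoritative:

```python
def validate_spec(spec: dict) -> list[str]:
    """Client-side pre-check of the spec fields documented above.

    Hypothetical helper for catching obvious mistakes before the POST;
    the API's server-side validation remains authoritative.
    """
    errors = []
    # Required top-level spec fields.
    for field in ("version", "name", "region", "enable_public_endpoint",
                  "vpc", "model_deployments"):
        if field not in spec:
            errors.append(f"spec.{field} is required")
    # Region must be one of the documented values.
    if spec.get("region") not in ("atl1", "nyc2", "tor1"):
        errors.append("spec.region must be one of atl1, nyc2, tor1")
    # At least one model deployment is required.
    if not spec.get("model_deployments"):
        errors.append("at least one model deployment is required")
    # Each accelerator entry has three required fields.
    for i, dep in enumerate(spec.get("model_deployments") or []):
        for acc in dep.get("accelerators") or []:
            for req in ("type", "scale", "accelerator_slug"):
                if req not in acc:
                    errors.append(
                        f"model_deployments[{i}].accelerators[].{req} is required")
    # spec.vpc.uuid is required when vpc is present.
    if "vpc" in spec and "uuid" not in spec["vpc"]:
        errors.append("spec.vpc.uuid is required")
    return errors

# Example: a spec containing only "version" fails the pre-check.
missing = validate_spec({"version": 1})
```

An empty return list means the spec passed every documented structural check.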

How to start integrating

  1. Add an HTTP Task to your workflow definition.
  2. Search for the API you want to integrate with and click its name.
    • This loads the API reference documentation and prepares the HTTP request settings.
  3. Click Test request to send a test request to the API and view its response.
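To exercise the endpoint outside the workflow tool, a direct call can be sketched with Python's standard library. The base URL and Bearer-token scheme follow typical DigitalOcean v2 API conventions and are assumptions here; the token and body are placeholders:

```python
import json
import urllib.request

# Placeholder credentials and body; substitute real values before sending.
DO_TOKEN = "dop_v1_example"  # assumed Bearer-token auth, placeholder value
body = json.dumps({"spec": {"version": 1}}).encode()

req = urllib.request.Request(
    "https://api.digitalocean.com/v2/dedicated-inferences",  # assumed base URL
    data=body,
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {DO_TOKEN}",
    },
    method="POST",
)
# urllib.request.urlopen(req) would submit it; expect 202 Accepted on success.
```

The request is only constructed here, not sent; uncomment the `urlopen` call with real credentials to submit it.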