POST /v2/chat

Generates a text response to a user message and streams it down, token by token. To learn how to use the Chat API with streaming follow our Text Generation guides.

Follow the Migration Guide for instructions on moving from API v1 to API v2.

Servers

https://api.cohere.com

Request headers

Name	Type	Required	Description
`Content-Type`	String	Yes	The media type of the request body. Default value: "application/json"
`X-Client-Name`	String	No	The name of the project that is making the request.

Request body fields

Name	Type	Required	Description
`stream`	Boolean	No	Defaults to `false`. When `true`, the response will be a SSE stream of events. Streaming is beneficial for user interfaces that render the contents of the response piece by piece, as it gets generated.
`k`	Integer	No	Ensures that only the top `k` most likely tokens are considered for generation at each step. When `k` is set to `0`, k-sampling is disabled. Defaults to `0`, min value of `0`, max value of `500`. Default value: 0
`safety_mode`	String	No	Used to select the safety instruction inserted into the prompt. Defaults to `CONTEXTUAL`. When `OFF` is specified, the safety instruction will be omitted. Safety modes are not yet configurable in combination with `tools` and `documents` parameters. Note: This parameter is only compatible newer Cohere models, starting with Command R 08-2024 and Command R+ 08-2024. Note: `command-r7b-12-2024` and newer models only support `"CONTEXTUAL"` and `"STRICT"` modes. Possible values: `"CONTEXTUAL"` `false` `"STRICT"`
`temperature`	Number	No	Defaults to `0.3`. A non-negative float that tunes the degree of randomness in generation. Lower temperatures mean less random generations, and higher temperatures mean more random generations. Randomness can be further maximized by increasing the value of the `p` parameter.
`p`	Number	No	Ensures that only the most likely tokens, with total probability mass of `p`, are considered for generation at each step. If both `k` and `p` are enabled, `p` acts after `k`. Defaults to `0.75`. min value of `0.01`, max value of `0.99`. Default value: 0.75
`tools[]`	Array	No	A list of tools (functions) available to the model. The model response may contain 'tool_calls' to the specified tools. Learn more in the Tool Use guide.
`tools[].function`	Object	No	The function to be executed.
`tools[].function.name`	String	Yes	The name of the function.
`tools[].function.description`	String	No	The description of the function.
`tools[].function.parameters`	Object	Yes	The parameters of the function as a JSON schema.
`tools[].type`	String	No	Possible values: `"function"`
`response_format`	Object	No	Configuration for forcing the model output to adhere to the specified format. Supported on Command R, Command R+ and newer models. The model can be forced into outputting JSON objects by setting `{ "type": "json_object" }`. A JSON Schema can optionally be provided, to ensure a specific structure. Note: When using `{ "type": "json_object" }` your `message` should always explicitly instruct the model to generate a JSON (eg: "Generate a JSON ...") . Otherwise the model may end up getting stuck generating an infinite stream of characters and eventually run out of context length. Note: When `json_schema` is not specified, the generated object can have up to 5 layers of nesting. Limitation: The parameter is not supported when used in combinations with the `documents` or `tools` parameters.
`tool_choice`	String	No	Used to control whether or not the model will be forced to use a tool when answering. When `REQUIRED` is specified, the model will be forced to use at least one of the user-defined tools, and the `tools` parameter must be passed in the request. When `NONE` is specified, the model will be forced not to use one of the specified tools, and give a direct response. If tool_choice isn't specified, then the model is free to choose whether to use the specified tools or not. Note: This parameter is only compatible with models Command-r7b and newer. Note: The same functionality can be achieved in `/v1/chat` using the `force_single_step` parameter. If `force_single_step=true`, this is equivalent to specifying `REQUIRED`. While if `force_single_step=true` and `tool_results` are passed, this is equivalent to specifying `NONE`. Possible values: `"REQUIRED"` `"NONE"`
`model`	String	Yes	The name of a compatible Cohere model or the ID of a fine-tuned model.
`frequency_penalty`	Number	No	Defaults to `0.0`, min value of `0.0`, max value of `1.0`. Used to reduce repetitiveness of generated tokens. The higher the value, the stronger a penalty is applied to previously present tokens, proportional to how many times they have already appeared in the prompt or prior generation.
`reasoning_effort`	String	No	The reasoning effort level of the model. This affects the model's performance and the time it takes to generate a response. Possible values: `"medium"` `"low"` `"high"`
`documents[]`	Array	No	A list of relevant documents that the model can cite to generate a more accurate reply. Each document is either a string or document object with content and metadata.
`seed`	Integer	No	If specified, the backend will make a best effort to sample tokens deterministically, such that repeated requests with the same seed and parameters should return the same result. However, determinism cannot be totally guaranteed.
`messages[]`	Array	Yes	A list of chat messages in chronological order, representing a conversation between the user and the model. Messages can be from `User`, `Assistant`, `Tool` and `System` roles. Learn more about messages and roles in the Chat API guide.
`strict_tools`	Boolean	No	When set to `true`, tool calls in the Assistant message will be forced to follow the tool definition strictly. Learn more in the Structured Outputs (Tools) guide. Note: The first few requests with a new set of tools will take longer to process.
`citation_options`	Object	No	Options for controlling citation generation.
`citation_options.mode`	String	No	Defaults to `"accurate"`. Dictates the approach taken to generating citations as part of the RAG flow by allowing the user to specify whether they want `"accurate"` results, `"fast"` results or no results. Note: `command-r7b-12-2024` and `command-a-03-2025` only support `"fast"` and `"off"` modes. The default is `"fast"`. Possible values: `"ACCURATE"` `false` `"FAST"`
`presence_penalty`	Number	No	Defaults to `0.0`, min value of `0.0`, max value of `1.0`. Used to reduce repetitiveness of generated tokens. Similar to `frequency_penalty`, except that this penalty is applied equally to all tokens that have already appeared, regardless of their exact frequencies.
`max_tokens`	Integer	No	The maximum number of tokens the model will generate as part of the response. Note: Setting a low value may result in incomplete generations.
`logprobs`	Boolean	No	Defaults to `false`. When set to `true`, the log probabilities of the generated tokens will be included in the response.
`stop_sequences[]`	Array	No	A list of up to 5 strings that the model will use to stop generation. If the model generates a string that matches any of the strings in the list, it will stop generating tokens and return the generated text up to that point not including the stop sequence.

How to start integrating

Add HTTP Task to your workflow definition.
Search for the API you want to integrate with and click on the name.
- This loads the API reference documentation and prepares the Http request settings.
Click Test request to test run your request to the API and see the API's response.