server : add documentation for `parallel_tool_calls` param (#15647)
Co-authored-by: Pierre F <no@p.e>
This commit is contained in: parent 81017865ee, commit 792b44f2ed
@@ -21,6 +21,8 @@ Function calling is supported for all models (see https://github.com/ggml-org/ll
- Use `--chat-template-file` to override the template when appropriate (see examples below)
- Generic support may consume more tokens and be less efficient than a model's native format.
- Multiple/parallel tool calling is supported on some models but disabled by default; enable it by passing `"parallel_tool_calls": true` in the completion endpoint payload.
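For illustration, a request enabling this flag might look like the following sketch. It assumes a local `llama-server` listening on port 8080; the tool name and schema are invented for the example.

```python
# Minimal sketch of a chat-completion request with parallel tool calls enabled.
# Assumes llama-server is running locally; get_weather is a hypothetical tool.
import json
import urllib.request

payload = {
    "messages": [
        {"role": "user", "content": "What is the weather in Paris and Tokyo?"}
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool name
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "parallel_tool_calls": True,  # off by default; allows several tool calls per turn
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with a running server
```

Without the flag, the server constrains the model to at most one tool call per assistant turn.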
<details>
<summary>Show some common templates and which format handler they use</summary>
@@ -1143,6 +1143,8 @@ The `response_format` parameter supports both plain JSON output (e.g. `{"type":
`parse_tool_calls`: Whether to parse the generated tool call.
`parallel_tool_calls`: Whether to enable parallel/multiple tool calls (only supported on some models; support is detected from the Jinja chat template).
*Examples:*
You can use the Python `openai` library with appropriate checkpoints:
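When parallel tool calls are enabled, a single assistant turn may carry several entries in `tool_calls`. The following sketch shows that response shape with mock data (field names follow the OpenAI-compatible API; the tool name and arguments are invented) and how a client might iterate over the calls:

```python
# Mock chat-completion response carrying two parallel tool calls.
import json

response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {"id": "call_0", "type": "function",
                 "function": {"name": "get_weather",
                              "arguments": '{"city": "Paris"}'}},
                {"id": "call_1", "type": "function",
                 "function": {"name": "get_weather",
                              "arguments": '{"city": "Tokyo"}'}},
            ],
        },
    }],
}

# Execute each requested call; results would then be sent back to the
# server as "tool" role messages, one per tool_call id.
for call in response["choices"][0]["message"]["tool_calls"]:
    args = json.loads(call["function"]["arguments"])
    print(call["function"]["name"], args["city"])  # one line per tool call
```

Each tool result message must echo the matching `tool_call_id` so the model can associate results with calls.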