* server : add thinking content blocks to Anthropic Messages API
Add support for returning reasoning/thinking content in Anthropic API
responses when using models with --reasoning-format deepseek and the
thinking parameter enabled.
- Non-streaming: adds thinking block before text in content array
- Streaming: emits thinking_delta events with correct block indices
- Partial streaming: tracks reasoning state across chunks via
anthropic_has_reasoning member variable
Tested with bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF model.
* server : fix Anthropic API streaming for thinking content blocks
Add signature field and fix duplicate content_block_start events in
Anthropic Messages API streaming responses for reasoning models.
* server: refactor Anthropic streaming state to avoid raw pointer
Replace raw pointer to task_result_state with direct field copies:
- Copy state fields in update() before processing chunk
- Use local copies in to_json_anthropic() instead of dereferencing
- Pre-compute state updates for next chunk in update()
This makes the data flow clearer and avoids unsafe pointer patterns.
* server: prevent data race from HTTP threads
* fix params
* fix default_generation_settings
* nits: make handle_completions_impl looks less strange
* stricter const
* fix GGML_ASSERT(idx < states.size())
* move index to be managed by server_response_reader
* http: make sure req & res lifecycle are tied together
* fix compile
* fix index handling buggy
* fix data race for lora endpoint
* nits: fix shadow variable
* nits: revert redundant changes
* nits: correct naming for json_webui_settings
* backend support
* server: support multiple generations from one prompt (OAI "n" option)
* fix invalid batch
* format oai
* clean up
* disable ctx shift
* add test
* update comments
* fix style
* add n_cmpl to docs [no ci]
* allowing using both n_cmpl and n
* server : add Anthropic Messages API support
* remove -@pytest.mark.slow from tool calling/jinja tests
* server : remove unused code and slow/skip on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py
* server : removed redundant n field logic in anthropic_params_from_json
* server : use single error object instead of error_array in streaming response handler for /v1/chat/completions and use unordered_set instead of set in to_json_anthropic_stream()
* server : refactor Anthropic API to use OAI conversion
* make sure basic test always go first
* clean up
* clean up api key check, add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>