* server : add Anthropic Messages API support
* remove @pytest.mark.slow from tool calling/jinja tests
* server : remove unused code and the slow/skip markers on test_anthropic_vision_base64_with_multimodal_model in test_anthropic_api.py
* server : remove redundant n field logic in anthropic_params_from_json
* server : use a single error object instead of error_array in the streaming response handler for /v1/chat/completions, and use unordered_set instead of set in to_json_anthropic_stream()
* server : refactor Anthropic API to use OAI conversion
* make sure the basic test always goes first
* clean up
* clean up api key check, add test
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* minor : code style
* server : fix prompt similarity calculation
* server : initial host-memory prompt caching
* cont
* server : refactor
* cont
* cont : make the server task of the slot const
* cont : minor [no ci]
* server : cache prompts and checkpoints only for completion tasks
* server : improve prompt caching logic
* cont : fix check for number of cached prompts [no ci]
* server : improve caching logic, add -cram CLI arg
* server : print prompt mismatch info
* cont : better naming [no ci]
* server : improve prompt cache loading logic
* server : add option to debug the slot contents (#16482)
* server : add option to debug the slot contents
* Update tools/server/server.cpp
---------
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
* server : add option to disable prompt cache
---------
Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>