* implement sleeping at queue level
* implement server-context suspend
* add test
* add docs
* optimization: add fast path
* make sure to free llama_init
* nits
* fix use-after-free
* allow /models to be accessed during sleeping, fix use-after-free
* don't allow accessing /models during sleep, it is not thread-safe
* fix data race on accessing props and model_meta
* small clean up
* trailing whitespace
* rm outdated comments
* server: improve speed of speculative decoding
* fix small draft case
* add link to the PR
* server : fix generation time measurement
* server : fix draft acceptance logs (add SRV_CNT, SLT_CNT macros)
* server : add comment
* add PR to docs
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>