Switch oneshot handler to use AssistantResponse, inturn currenlty only handle the normal content in the response. TODO: If any tool_calls in the oneshot response, it is currently not handled. Inturn switch the generic/toplevel handle response logic to use AssistantResponse class, given that both oneshot and the multipart/streaming flows use/return it. Inturn add trimmedContent member to AssistantResponse class and make the generic handle response logic to save the trimmed content into this. Update users of trimmed to work with this structure. |
||
|---|---|---|
| .. | ||
| batched-bench | ||
| cvector-generator | ||
| export-lora | ||
| gguf-split | ||
| imatrix | ||
| llama-bench | ||
| main | ||
| mtmd | ||
| perplexity | ||
| quantize | ||
| rpc | ||
| run | ||
| server | ||
| tokenize | ||
| tts | ||
| CMakeLists.txt | ||