Tool calling, if enabled, needs access to the last few user queries and AI assistant responses (which also contain the tool call requests and their corresponding results), so that the model can build answers on top of its own tool call requests and the results it got back. Given that, and given that most models these days have sufficiently large context windows, the sliding-window context implemented by the SimpleChat logic now defaults to including roughly the last 4 user queries and their responses.
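
A minimal sketch of what such a sliding window could look like, assuming a flat chat message array and a window of 4 user queries; the names (`ChatMessage`, `recentChatWindow`) are illustrative and not SimpleChat's actual internals:

```ts
type Role = "user" | "assistant" | "tool";

interface ChatMessage {
  role: Role;
  content: string;
}

/**
 * Return the trailing slice of the chat that contains the last `userMsgCnt`
 * user queries plus everything that followed each of them (assistant replies,
 * tool call requests and tool results), so the model sees the full
 * query -> tool-call -> tool-result -> answer chains.
 */
function recentChatWindow(chat: ChatMessage[], userMsgCnt: number = 4): ChatMessage[] {
  if (userMsgCnt < 0) {
    return chat; // negative means "no trimming"
  }
  let usersSeen = 0;
  for (let i = chat.length - 1; i >= 0; i--) {
    if (chat[i].role === "user") {
      usersSeen += 1;
      if (usersSeen === userMsgCnt) {
        return chat.slice(i); // start the window at the Nth most recent user query
      }
    }
  }
  return chat; // fewer than userMsgCnt user queries; send everything
}
```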