llama.cpp

Ryan Goulden 26c9ce1288 server: Add cached_tokens info to oaicompat responses (#19361) * tests : fix fetch_server_test_models.py * server: to_json_oaicompat cached_tokens Adds OpenAI and Anthropic compatible information about the number of cached prompt tokens used in a response.	2026-03-19 19:09:33 +01:00
apple	…
hip	ci : add hip quality check (#20430)	2026-03-19 17:05:44 +01:00
jinja	…
snapdragon	hexagon: add Matrix Extensions (HMX) for Hexagon NPU backend (#20693)	2026-03-19 09:11:06 -07:00
bench-models.sh	benches : update models + numbers (#19359)	2026-02-05 14:34:07 +02:00
build-info.sh	…
check-requirements.sh	…
compare-commits.sh	…
compare-llama-bench.py	compare-llama-bench: check remotes as well (#20406)	2026-03-12 00:14:42 +08:00
compare-logprobs.py	scripts: update corpus of compare-logprobs (#19326)	2026-02-25 12:57:34 +01:00
create_ops_docs.py	…
debug-test.sh	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
fetch_server_test_models.py	server: Add cached_tokens info to oaicompat responses (#19361)	2026-03-19 19:09:33 +01:00
gen-authors.sh	…
gen-unicode-data.py	…
get-flags.mk	…
get-hellaswag.sh	scripts : update get-hellaswag.sh and get-winogrande.sh (#20542)	2026-03-14 11:21:50 +01:00
get-pg.sh	…
get-wikitext-2.sh	scripts : improve get-wikitext-2.sh (#19952)	2026-03-02 15:40:49 +01:00
get-winogrande.sh	scripts : update get-hellaswag.sh and get-winogrande.sh (#20542)	2026-03-14 11:21:50 +01:00
get_chat_template.py	…
git-bisect-run.sh	llama: end-to-end tests (#19802)	2026-03-08 12:30:21 +01:00
git-bisect.sh	llama: end-to-end tests (#19802)	2026-03-08 12:30:21 +01:00
hf.sh	…
install-oneapi.bat	…
pr2wt.sh	chore : correct typos [no ci] (#20041)	2026-03-05 08:50:21 +01:00
serve-static.js	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
server-bench.py	chore : correct typos [no ci] (#20041)	2026-03-05 08:50:21 +01:00
server-test-model.py	Autoparser - complete refactoring of parser architecture (#18675)	2026-03-06 21:01:00 +01:00
sync-ggml-am.sh	…
sync-ggml.last	sync : ggml	2026-03-18 15:17:28 +02:00
sync-ggml.sh	…
sync_vendor.py	vendor : update cpp-httplib to 0.38.0 (#20578)	2026-03-15 17:30:06 +01:00
tool_bench.py	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
tool_bench.sh	…
verify-checksum-models.py	…
xxd.cmake	…