llama.cpp/tests
Jeff Bolz 1fa4551af0
vulkan: support larger argsort (#17313)
* vulkan: support larger argsort

This is an extension of the original bitonic sorting shader that puts the
temporary values in global memory and when more than 1024 threads are needed
it runs multiple workgroups and synchronizes through a pipelinebarrier.

To improve the memory access pattern, a copy of the float value is kept with
the index value. I've applied this same change to the original shared memory
version of the shader, which is still used when ncols <= 1024.

* Reduce the number of shader variants. Use smaller workgroups when doing a single pass, for a modest perf boost

* reduce loop overhead

* run multiple cols per invocation, to reduce barrier overhead
2025-11-19 17:25:50 +01:00
..
.gitignore gitignore : Ignore vim swap files in tests (#15901) 2025-09-10 14:28:47 +03:00
CMakeLists.txt devops: add s390x & ppc64le CI (#15925) 2025-09-27 02:03:33 +08:00
get-model.cpp ci : add model tests + script wrapper (#4586) 2024-01-26 14:18:00 +02:00
get-model.h ci : add model tests + script wrapper (#4586) 2024-01-26 14:18:00 +02:00
run-json-schema-to-grammar.mjs llama : move end-user examples to tools directory (#13249) 2025-05-02 20:27:13 +02:00
test-alloc.cpp ggml : fix graph reallocation with multiple chunks (#16396) 2025-10-03 13:49:08 +02:00
test-arg-parser.cpp common : remove common_has_curl() (#16351) 2025-09-30 17:39:44 +03:00
test-autorelease.cpp llama : add `llama_vocab`, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
test-backend-ops.cpp vulkan: support larger argsort (#17313) 2025-11-19 17:25:50 +01:00
test-barrier.cpp test-barrier : do not use more threads than physically available (#16389) 2025-10-02 20:10:12 +02:00
test-c.c ggml : remove kompute backend (#14501) 2025-07-03 07:48:32 +03:00
test-chat-parser.cpp common : handle unicode during partial json parsing (#16526) 2025-10-12 16:18:47 +03:00
test-chat-template.cpp chat : Granite Docling stopping (#16438) 2025-10-06 18:59:40 +02:00
test-chat.cpp common : Generalized XML-style tool-call parsing with streaming support (GLM 4.5/4.6 + MiniMax M2 + SeedOSS + Kimi-K2 + Qwen3-Coder + Apriel-1.5 + Xiaomi-MiMo) (#16932) 2025-11-18 18:54:15 +01:00
test-double-float.cpp ggml : minor naming changes (#8433) 2024-07-12 10:46:02 +03:00
test-gbnf-validator.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-gguf.cpp gguf: fix failure on version == 0 (#13956) 2025-06-01 18:08:05 +02:00
test-grammar-integration.cpp grammar : use int64_t to avoid int overflows in int schema to grammar conversion logic (#16626) 2025-10-17 08:59:31 +03:00
test-grammar-llguidance.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-grammar-parser.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-json-partial.cpp common : handle unicode during partial json parsing (#16526) 2025-10-12 16:18:47 +03:00
test-json-schema-to-grammar.cpp grammar : support array references in json schema (#16792) 2025-10-28 09:37:52 +01:00
test-llama-grammar.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-log.cpp common : use common_ prefix for common library functions (#9805) 2024-10-10 22:57:42 +02:00
test-lora-conversion-inference.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
test-model-load-cancel.cpp llama : update llama_model API names (#11063) 2025-01-06 10:55:18 +02:00
test-mtmd-c-api.c mtmd : add C public API (#13184) 2025-05-04 23:43:42 +02:00
test-opt.cpp tests : fix test-opt with GGML_BACKEND_DL (#15599) 2025-08-26 22:14:38 +02:00
test-quantize-fns.cpp tests : fix test-quantize-fns to init the CPU backend (#12306) 2025-03-10 14:07:15 +02:00
test-quantize-perf.cpp ci: run the x64 and arm ci on the github machines instead (#16183) 2025-09-25 08:06:06 +03:00
test-quantize-stats.cpp docker : do not build tests (#13204) 2025-04-30 10:44:07 +02:00
test-regex-partial.cpp `common`: add partial regex support (#12808) 2025-05-14 19:50:57 +01:00
test-rope.cpp ggml-cpu: templateify ggml_compute_forward_rope_f32 and _f16 (#16805) 2025-11-11 13:33:24 +02:00
test-sampling.cpp sampling : optimize samplers by reusing bucket sort (#15665) 2025-08-31 20:41:02 +03:00
test-thread-safety.cpp server : support unified cache across slots (#16736) 2025-11-02 18:14:04 +02:00
test-tokenizer-0.cpp llama : add `llama_vocab`, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
test-tokenizer-0.py py : logging and flake8 suppression refactoring (#7081) 2024-05-05 08:07:48 +03:00
test-tokenizer-0.sh scripts : make the shell scripts cross-platform (#14341) 2025-06-30 10:17:18 +02:00
test-tokenizer-1-bpe.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-tokenizer-1-spm.cpp cmake : do not include ./src as public for libllama (#13062) 2025-04-24 16:00:10 +03:00
test-tokenizer-random.py requirements : update transformers/torch for Embedding Gemma (#15828) 2025-09-09 06:06:52 +02:00
test-tokenizers-repo.sh devops: add s390x & ppc64le CI (#15925) 2025-09-27 02:03:33 +08:00