llama.cpp

Commit Graph

Author	SHA1	Message	Date
Aaron Teo	2d45ee2536	ggml-zdnn: add init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 15:36:42 +08:00
Aman Gupta	0a5036bee9	CUDA: add roll (#14919) * CUDA: add roll * Make everything const, use __restrict__	2025-07-29 14:45:18 +08:00
Aaron Teo	ab60ae6ca2	ggml-zdnn: add zdnn_init call for static libs Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 00:55:44 +08:00
lhez	8ad7b3e65b	opencl : add ops docs (#14910)	2025-07-28 18:50:17 +02:00
Aaron Teo	0ae2d30302	ggml-zdnn: add nnpa installed detection Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 00:39:55 +08:00
Aaron Teo	a9438925f2	ggml-zdnn: add parmblkformat detections Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 00:36:55 +08:00
Aaron Teo	1c6ca76c2e	ggml-zdnn: remove free_buffer debug info Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 00:27:16 +08:00
Aaron Teo	1a0520a540	ggml-zdnn: add logging to debug free buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 00:12:18 +08:00
Aaron Teo	2872276d8a	ggml-zdnn: fix invalid ztensor buffer release Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 00:09:00 +08:00
Leonard Mosescu	bda62193b2	test-backend-ops : extend test case filtering (#14865) * Extend test case filtering 1. Allow passing multiple (comma-separated?) ops to test-backend-ops. This can be convenient when working on a set of ops, when you'd want to test them together (but without having to run every single op). For example: `test-backend-ops.exe test -o "ADD,RMS_NORM,ROPE,SILU,SOFT_MAX"` 2. Support full test-case variation string in addition to basic op names. This would make it easy to select a single variation, either for testing or for benchmarking. It can be particularly useful for profiling a particular variation (ex. a CUDA kernel), for example: `test-backend-ops.exe perf -b CUDA0 -o "MUL_MAT(type_a=f16,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3],v=2)"` These two can be combined. As the current `-o`, this change doesn't try to detect/report an error if a filter doesn't name existing ops (ex. misspelled) * Updating the usage help text * Update tests/test-backend-ops.cpp	2025-07-28 18:04:27 +02:00
Radoslav Gerganov	c556418b60	llama-bench : use local GPUs along with RPC servers (#14917) Currently if RPC servers are specified with '--rpc' and there is a local GPU available (e.g. CUDA), the benchmark will be performed only on the RPC device(s) but the backend result column will say "CUDA,RPC" which is incorrect. This patch adds all local GPU devices and makes llama-bench consistent with llama-cli.	2025-07-28 18:59:04 +03:00
Aaron Teo	2cfa118fa9	ggml-zdnn: fix missing load tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 23:42:24 +08:00
xctan	db16e2831c	ggml-cpu : deduplicate scalar implementations (#14897) * remove redundant code in riscv * remove redundant code in arm * remove redundant code in loongarch * remove redundant code in ppc * remove redundant code in s390 * remove redundant code in wasm * remove redundant code in x86 * remove fallback headers * fix x86 ggml_vec_dot_q8_0_q8_0	2025-07-28 17:40:24 +02:00
Aaron Teo	fc9260deab	ggml-zdnn: attempt to fix sigsegv Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 23:37:50 +08:00
Aaron Teo	e0549c2925	ggml-zdnn: fix missing vector import in header Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 23:33:37 +08:00
Aaron Teo	f99b274cac	ggml-zdnn: fix missing vector import Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 23:30:48 +08:00
Aaron Teo	0905168388	ggml-zdnn: rewrite into mre Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 23:26:15 +08:00
Akarshan Biswas	cd1fce6d4f	SYCL: Add set_rows support for quantized types (#14883) * SYCL: Add set_rows support for quantized types This commit adds support for GGML_OP_SET_ROWS operation for various quantized tensor types (Q8_0, Q5_1, Q5_0, Q4_1, Q4_0, IQ4_NL) and BF16 type in the SYCL backend. The quantization/dequantization copy kernels were moved from cpy.cpp to cpy.hpp to make them available for set_rows.cpp. This addresses part of the TODOs mentioned in the code. * Use get_global_linear_id() instead ggml-ci * Fix formatting ggml-ci * Use const for ne11 and size_t variables in set_rows_sycl_q ggml-ci * Increase block size for q kernel to 256 ggml-ci * Cleanup imports * Add float.h to cpy.hpp	2025-07-28 20:32:15 +05:30
Xuan-Son Nguyen	00fa15fedc	mtmd : add support for Voxtral (#14862) * mtmd : add support for Voxtral * clean up * fix python requirements * add [BEGIN_AUDIO] token * also support Devstral conversion * add docs and tests * fix regression for ultravox * minor coding style improvement * correct project activation fn * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-07-28 15:01:48 +02:00
Johannes Gäßler	946b1f6859	CUDA: fix pointer incrementation in FA (#14916)	2025-07-28 14:30:22 +02:00
Dongliang Wei	6c6e397aff	model : add support for SmallThinker series (#14898) * support smallthinker * support 20b softmax, 4b no sliding window * new build_moe_ffn_from_probs, and can run 4b * fix 4b rope bug * fix python type check * remove is_moe judge * remove set_dense_start_swa_pattern function and modify set_swa_pattern function * trim trailing whitespace * remove get_vocab_base of SmallThinkerModel in convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * better whitespace Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * use GGML_ASSERT for expert count validation Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Improve null pointer check for probs Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * use template parameter for SWA attention logic * better whitespace Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * move the creation of inp_out_ids before the layer loop * remove redundant judge for probs --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-07-28 13:47:00 +02:00
Aaron Teo	03ec5d3ed3	ggml-zdnn: bring back working matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 18:14:44 +08:00
Aaron Teo	4cc62cb693	ggml-zdnn: move bias data to local also Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 18:10:14 +08:00
Aaron Teo	6f42570194	ggml-zdnn: move everything back to local declaration Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 18:08:47 +08:00
Alberto Cabrera Pérez	afc0e89698	sycl: refactor quantization to q8_1 (#14815) * sycl: quantization to q8_1 refactor * Refactored src1 copy logic in op_mul_mat	2025-07-28 11:05:53 +01:00
Aaron Teo	eefa943b0a	ggml-zdnn: fix sigsegv Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 18:03:17 +08:00
Aaron Teo	fc692ed498	ggml-zdnn: figure out why sigtrap is happening Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 18:00:28 +08:00
Aaron Teo	08de84ef85	ggml-zdnn: bugfix transform ztensor vs origtensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 16:57:57 +08:00
Aaron Teo	032dce5a6a	ggml-zdnn: fix sequencing of transforms Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 16:46:17 +08:00
Aaron Teo	cf0e190c40	ggml-zdnn: add more safeguards in matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 16:44:39 +08:00
Aaron Teo	f239bbb02d	ggml-zdnn: move weights transform into mulmat Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 16:38:44 +08:00
Aaron Teo	092fa3a328	ggml-zdnn: activate bias transform in matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 16:27:35 +08:00
Aaron Teo	f7e8d6f2b2	ggml-zdnn: add logger to check if mat mul ops go through set_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 16:17:12 +08:00
Aaron Teo	6d71749c26	ggml-zdnn: add more debug info for extra buffer transform Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 16:10:07 +08:00
Aaron Teo	4b2f1cb1b8	ggml-zdnn: add bias data transform Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 16:05:53 +08:00
Georgi Gerganov	a5771c9eea	ops : update BLAS (#14914)	2025-07-28 10:01:03 +02:00
Aaron Teo	f800c80281	ggml-zdnn: add bias ztensor and data free Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 15:59:52 +08:00
Aaron Teo	bee7dd3020	ggml-zdnn: tighten memory usage, change string allocation Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 15:55:42 +08:00
Aaron Teo	aef93b3908	ggml-zdnn: add bias init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-28 15:41:56 +08:00
Georgi Gerganov	c35f9eaf09	ops : update Metal (#14912)	2025-07-28 08:22:56 +03:00
Georgi Gerganov	1f45f2890e	sync : ggml	2025-07-28 08:15:01 +03:00
Kai Pastor	613c5095c3	cmake : Indent ggml-config.cmake (ggml/1310)	2025-07-28 08:15:01 +03:00
Ed Addario	7f97599581	quantize : update README.md (#14905) * Update README.md * Fix trailing whitespace * Update README.md Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-07-27 23:31:11 +02:00
Ruben Ortlam	bf78f5439e	vulkan: add ops docs (#14900)	2025-07-27 15:33:08 +02:00
Akarshan Biswas	bbfc849274	SYCL: add ops doc (#14901)	2025-07-27 17:52:58 +05:30
Daniel Bevenius	ca0ef2dddb	llama : clarify comment about pp and tg graphs [no ci] (#14895) * llama : clarify comment about pp and tg graphs [no ci] This commit clarifies the comment in `llama-context.cpp` regarding the prefill prompt (pp), and token generation (tg) graphs. The motivation for this is that I've struggled to remember these and had to look them up more than once, so I thought it would be helpful to add a comment that makes it clear what these stand for. * squash! llama : clarify comment about pp and tg graphs [no ci] Change "pp" to "prompt processing".	2025-07-27 12:10:51 +02:00
Erik Scholz	89d1029559	vulkan : add fp16 support for the conv_2d kernel (#14872) * add f16 to conv_2d testing * weaken conv2d test error threshold	2025-07-27 12:04:33 +02:00
Jeff Bolz	f1a4e72de5	vulkan: skip empty set_rows to avoid invalid API usage (#14860)	2025-07-27 11:05:34 +02:00
Gabriel Larson	4762ad7316	model : make rope_yarn_log_mul optional for deepseek2 (#14896) * make rope_yarn_log_mul optional for deepseek2 * default rope_yarn_log_mul = 0.0f	2025-07-27 11:18:37 +03:00
Shunta Saito	1dc9614e06	llama : fix kq_scale for the attention layers of PLaMo2 (#14892) * Fix dimensions for expand * Change dimensions to copy states to cache * Fix the default value for plamo2 conversion * Fix scale given to build_attn * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-07-27 09:38:44 +02:00
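
The filtering described in bda62193b2 (several op names in one `-o` argument, plus full variation strings that themselves contain commas) could be sketched roughly as below. This is an illustration only and not the code in tests/test-backend-ops.cpp: the names `split_filter` and `filter_matches` are hypothetical, and splitting only on commas that sit outside parentheses and brackets is an assumption about how the two forms can coexist in one string.

```python
def split_filter(filter_text: str) -> list[str]:
    """Split a filter string on commas that are not nested inside () or []."""
    entries: list[str] = []
    current: list[str] = []
    depth = 0  # current nesting level of parentheses/brackets
    for ch in filter_text:
        if ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        if ch == "," and depth == 0:
            # A top-level comma ends the current entry.
            entries.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    entries.append("".join(current).strip())
    # Drop empty entries caused by stray or trailing commas.
    return [entry for entry in entries if entry]


def filter_matches(filter_text: str, op_name: str, variation: str) -> bool:
    """Return True if a test case is selected by the filter.

    An entry containing '(' is treated as a full variation string and must
    equal `variation` exactly; any other entry is compared to `op_name`.
    """
    for entry in split_filter(filter_text):
        if "(" in entry:
            if entry == variation:
                return True
        elif entry == op_name:
            return True
    return False


# Usage: combine plain op names with one full variation string.
mul_mat = ("MUL_MAT(type_a=f16,type_b=f32,m=4096,n=512,k=14336,"
           "bs=[1,1],nr=[1,1],per=[0,1,2,3],v=2)")
selected = filter_matches("ADD,SILU," + mul_mat, "MUL_MAT", mul_mat)
```

As in the commit message, this sketch does not report an error when an entry names no existing op; such an entry simply never matches.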

6126 Commits