llama.cpp

Commit Graph

Author	SHA1	Message	Date
Aaron Teo	e1fa4f2e1a	ggml-zdnn: fix typo in build-s390x.md Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-08-14 23:36:32 +08:00
Aaron Teo	1746e0c78a	ggml-zdnn: redo test-backend-ops for ops.md Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-08-14 23:35:23 +08:00
Aaron Teo	c3d2096a9b	docs: update ops docs for zdnn Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-08-14 14:41:02 +08:00
Aaron Teo	e390415250	ggml-zdnn: fix pr comments Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-08-14 14:35:59 +08:00
Aaron Teo	fb0241bc78	ggml-zdnn: deny all view tensors directly Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-31 17:02:57 +08:00
Aaron Teo	6b6ebb9bee	ggml-zdnn: attempt at fixing tensor views during matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-31 16:55:53 +08:00
Aaron Teo	732df731ba	ggml-zdnn: disable batched matmul Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-31 12:11:22 +08:00
Aaron Teo	12e6b8b65d	Merge branch 'master' into feat/backend-zdnn	2025-07-31 02:00:01 +08:00
Aaron Teo	867d3f325d	chore: add codeowners Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-31 01:48:55 +08:00
Aaron Teo	cf8cdcd372	ggml-zdnn: update documentation, prepare for upstream Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-31 01:26:30 +08:00
Daniel Bevenius	41e78c567e	server : add support for `embd_normalize` parameter (#14964) This commit adds support for the `embd_normalize` parameter in the server code. The motivation for this is that currently if the server is started with a pooling type that is not `none`, then Euclidean/L2 normalization will be the normalization method used for embeddings. However, this is not always the desired behavior, and users may want to use other normalization (or none) and this commit allows that. Example usage: ```console curl --request POST \ --url http://localhost:8080/embedding \ --header "Content-Type: application/json" \ --data '{"input": "Hello world today", "embd_normalize": -1}' ```	2025-07-30 18:07:11 +02:00
uvos	ad4a700117	HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (#14949)	2025-07-30 17:38:06 +02:00
Georgi Gerganov	e32a4ec60e	sync : ggml ggml-ci	2025-07-30 17:33:11 +03:00
Kai Pastor	e228de9449	cmake : Fix BLAS link interface (ggml/1316)	2025-07-30 17:33:11 +03:00
Kai Pastor	73a8e5ca03	vulkan : fix 32-bit builds (ggml/1313) The pipeline member can be cast to VkPipeline. This is a VkPipeline_T* on 64 bit but a uint64_t on 32 bit. Cf. VK_DEFINE_NON_DISPATCHABLE_HANDLE documentation.	2025-07-30 17:33:11 +03:00
Johannes Gäßler	92b8810ec7	CUDA: skip masked KV slices for all FA kernels (#14924)	2025-07-30 15:46:13 +02:00
Georgi Gerganov	00131d6eaf	tests : update for LLAMA_SET_ROWS=1 (#14961) * test-thread-safety : each context uses a single sequence * embedding : handle --parallel argument ggml-ci * save-load : handle -np 1 ggml-ci * thread-safety : avoid overriding threads, reduce test case arg ggml-ci	2025-07-30 15:12:02 +03:00
Georgi Gerganov	1e15bfd42c	graph : fix stack-use-after-return (#14960) ggml-ci	2025-07-30 13:52:11 +03:00
Aaron Teo	92a17ed9f3	ggml-zdnn: clean up project structure Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 17:36:38 +08:00
Aaron Teo	90d460c20b	ggml-zdnn: clean up matmul selection Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 17:34:15 +08:00
Aaron Teo	e67feafc65	ggml-zdnn: fix ztensor deallocation abort stabilise ggml <-> zdnn api Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 17:27:49 +08:00
Aaron Teo	803dde3bbc	ggml-zdnn: code clean up Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 17:23:36 +08:00
Aaron Teo	70224e6cb7	ggml-zdnn: bring load ztensor back to init routine Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 17:21:04 +08:00
Aaron Teo	1eb7c35e3a	ggml-zdnn: code cleanup Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 16:57:14 +08:00
Aaron Teo	b7a77cf683	ggml-zdnn: add guards to prevent loading ztensor if transformed Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 16:15:20 +08:00
Aaron Teo	4d5edb2221	ggml-zdnn: fix erroneous output load tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 16:11:07 +08:00
Aaron Teo	20d69b6cdf	ggml-zdnn: disable global load ztensor for now Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 16:05:58 +08:00
Aaron Teo	4fb6bee1f6	ggml-zdnn: attempt at using default nwhc format instead Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 16:04:19 +08:00
Aaron Teo	7b50d057dd	ggml-zdnn: attempt at manually changing the layout Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 15:33:13 +08:00
Aaron Teo	ad0cb30212	ggml-zdnn: disable logging and breakpoints for full test Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:52:13 +08:00
Aaron Teo	b4dffed954	ggml-zdnn: work on moving output ztensor as well Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:50:09 +08:00
Aaron Teo	fd766bdd44	ggml-zdnn: load ztensors in cgraph exec Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:40:36 +08:00
Aaron Teo	e30b1ffbde	ggml-zdnn: fix missing return from init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:34:47 +08:00
Aaron Teo	4493b148d0	ggml-zdnn: disable op_none initialisation for testing Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:33:12 +08:00
Douglas Hanley	a118d80233	embeddings: fix extraction of CLS pooling results (#14927) * embeddings: fix extraction of CLS pooling results * merge RANK pooling into CLS case for inputs	2025-07-30 08:25:05 +03:00
Aaron Teo	213f1d2a3f	ggml-zdnn: add inputs logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:11:09 +08:00
Aaron Teo	e695e8577d	ggml-zdnn: add tensor to pre_tfm_desc logging Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-30 13:06:36 +08:00
Xinpeng Dou	61550f8231	CANN: update ops docs (#14935) * CANN:add ops docs * CANN: update ops docs	2025-07-30 08:39:24 +08:00
uvos	aa79524c51	HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (#14945)	2025-07-29 20:23:04 +02:00
uvos	b77d11179d	HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (#14930) This is useful for testing for regressions on GCN with CDNA hardware. With GGML_HIP_MMQ_MFMA=Off and GGML_CUDA_FORCE_MMQ=On we can conveniently test the GCN code path on CDNA. As CDNA is just GCN renamed with MFMA added and limited use ACC registers, this provides a good alternative for regression testing when GCN hardware is not available.	2025-07-29 17:44:30 +02:00
uvos	c7aa1364fd	HIP: Ignore unsupported unroll transformation in fattn-vec (#14931) llvm with the amdgcn target does not support unrolling loops with conditional break statements, when those statements cannot be resolved at compile time. Similar to other places in GGML let's simply ignore this warning.	2025-07-29 17:43:43 +02:00
kallewoof	1a67fcc306	common : avoid logging partial messages (which can contain broken UTF-8 sequences) (#14937) * bug-fix: don't attempt to log partial parsed messages to avoid crash due to unfinished UTF-8 sequences	2025-07-29 17:05:38 +02:00
hipudding	204f2cf168	CANN: Add ggml_set_rows (#14943)	2025-07-29 22:36:43 +08:00
Sigbjørn Skjæret	138b288b59	cuda : add softcap fusion (#14907)	2025-07-29 14:22:03 +02:00
Aaron Teo	8dbca74fc7	ggml-zdnn: attempt to use unique ptr Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 17:03:58 +08:00
Johannes Gäßler	bbd0f91779	server-bench: make seed choice configurable (#14929) * server-bench: make seed choice configurable * Update scripts/server-bench.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update scripts/server-bench.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix error formatting * Update scripts/server-bench.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-07-29 10:40:50 +02:00
Aaron Teo	b1376ad051	ggml-zdnn: add weights logging to check Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 16:38:07 +08:00
Aaron Teo	b28b423801	ggml-zdnn: switch to using deque to fix pointer deref problem Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 15:55:33 +08:00
Aaron Teo	3446807452	ggml-zdnn: attempt at fixing invalid buffer Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 15:45:46 +08:00
Aaron Teo	2d45ee2536	ggml-zdnn: add init_tensor Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>	2025-07-29 15:36:42 +08:00

6125 Commits