Commit Graph

6125 Commits

Author SHA1 Message Date
Aaron Teo e1fa4f2e1a
ggml-zdnn: fix typo in build-s390x.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-08-14 23:36:32 +08:00
Aaron Teo 1746e0c78a
ggml-zdnn: redo test-backend-ops for ops.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-08-14 23:35:23 +08:00
Aaron Teo c3d2096a9b
docs: update ops docs for zdnn
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-08-14 14:41:02 +08:00
Aaron Teo e390415250
ggml-zdnn: fix pr comments
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-08-14 14:35:59 +08:00
Aaron Teo fb0241bc78
ggml-zdnn: deny all view tensors directly
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-31 17:02:57 +08:00
Aaron Teo 6b6ebb9bee
ggml-zdnn: attempt at fixing tensor views during matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-31 16:55:53 +08:00
Aaron Teo 732df731ba
ggml-zdnn: disable batched matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-31 12:11:22 +08:00
Aaron Teo 12e6b8b65d
Merge branch 'master' into feat/backend-zdnn 2025-07-31 02:00:01 +08:00
Aaron Teo 867d3f325d
chore: add codeowners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-31 01:48:55 +08:00
Aaron Teo cf8cdcd372
ggml-zdnn: update documentation, prepare for upstream
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-31 01:26:30 +08:00
Daniel Bevenius 41e78c567e
server : add support for `embd_normalize` parameter (#14964)
This commit adds support for the `embd_normalize` parameter in the
server code.

The motivation for this is that currently if the server is started with
a pooling type that is not `none`, then Euclidean/L2 normalization will
be the normalization method used for embeddings. However, this is not
always the desired behavior, and users may want to use other
normalization (or none) and this commit allows that.

Example usage:
```console
curl --request POST \
    --url http://localhost:8080/embedding \
    --header "Content-Type: application/json" \
    --data '{"input": "Hello world today", "embd_normalize": -1}
```
2025-07-30 18:07:11 +02:00
uvos ad4a700117
HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (#14949) 2025-07-30 17:38:06 +02:00
Georgi Gerganov e32a4ec60e sync : ggml
ggml-ci
2025-07-30 17:33:11 +03:00
Kai Pastor e228de9449 cmake : Fix BLAS link interface (ggml/1316) 2025-07-30 17:33:11 +03:00
Kai Pastor 73a8e5ca03 vulkan : fix 32-bit builds (ggml/1313)
The pipeline member can be cast to VkPipeline.
This is a VkPipeline_T* on 64 bit but a uint64_t on 32 bit.
Cf. VK_DEFINE_NON_DISPATCHABLE_HANDLE documentation.
2025-07-30 17:33:11 +03:00
Johannes Gäßler 92b8810ec7
CUDA: skip masked KV slices for all FA kernels (#14924) 2025-07-30 15:46:13 +02:00
Georgi Gerganov 00131d6eaf
tests : update for LLAMA_SET_ROWS=1 (#14961)
* test-thread-safety : each context uses a single sequence

* embedding : handle --parallel argument

ggml-ci

* save-load : handle -np 1

ggml-ci

* thread-safety : avoid overriding threads, reduce test case arg

ggml-ci
2025-07-30 15:12:02 +03:00
Georgi Gerganov 1e15bfd42c
graph : fix stack-use-after-return (#14960)
ggml-ci
2025-07-30 13:52:11 +03:00
Aaron Teo 92a17ed9f3
ggml-zdnn: clean up project structure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:36:38 +08:00
Aaron Teo 90d460c20b
ggml-zdnn: clean up matmul selection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:34:15 +08:00
Aaron Teo e67feafc65
ggml-zdnn: fix ztensor deallocation abort
stabilise ggml <-> zdnn api

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:27:49 +08:00
Aaron Teo 803dde3bbc
ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:23:36 +08:00
Aaron Teo 70224e6cb7
ggml-zdnn: bring load ztensor back to init routine
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 17:21:04 +08:00
Aaron Teo 1eb7c35e3a
ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:57:14 +08:00
Aaron Teo b7a77cf683
ggml-zdnn: add guards to prevent loading ztensor if transformed
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:15:20 +08:00
Aaron Teo 4d5edb2221
ggml-zdnn: fix errorenous output load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:11:07 +08:00
Aaron Teo 20d69b6cdf
ggml-zdnn: disable global load ztensor for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:05:58 +08:00
Aaron Teo 4fb6bee1f6
ggml-zdnn: attempt at using default nwhc format instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 16:04:19 +08:00
Aaron Teo 7b50d057dd
ggml-zdnn: attempt at manually changing the layout
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 15:33:13 +08:00
Aaron Teo ad0cb30212
ggml-zdnn: disable logging and breakpoints for full test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:52:13 +08:00
Aaron Teo b4dffed954
ggml-zdnn: work on moving output ztensor as well
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:50:09 +08:00
Aaron Teo fd766bdd44
ggml-zdnn: load ztensors in cgraph exec
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:40:36 +08:00
Aaron Teo e30b1ffbde
ggml-zdnn: fix missing return from init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:34:47 +08:00
Aaron Teo 4493b148d0
ggml-zdnn: disable op_none initialisation for testing
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:33:12 +08:00
Douglas Hanley a118d80233
embeddings: fix extraction of CLS pooling results (#14927)
* embeddings: fix extraction of CLS pooling results

* merge RANK pooling into CLS case for inputs
2025-07-30 08:25:05 +03:00
Aaron Teo 213f1d2a3f
ggml-zdnn: add inputs logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:11:09 +08:00
Aaron Teo e695e8577d
ggml-zdnn: add tensor to pre_tfm_desc logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-30 13:06:36 +08:00
Xinpeng Dou 61550f8231
CANN: update ops docs (#14935)
* CANN:add ops docs

* CANN: update ops docs
2025-07-30 08:39:24 +08:00
uvos aa79524c51
HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (#14945) 2025-07-29 20:23:04 +02:00
uvos b77d11179d
HIP: add GGML_HIP_MMQ_MFMA option to allow disableing the MFMA path. (#14930)
This is useful for testing for regressions on GCN with CDNA hardware.

With GGML_HIP_MMQ_MFMA=Off and GGML_CUDA_FORCE_MMQ=On we can conveniently test the GCN code path on CDNA. As CDNA is just GCN renamed with MFMA added and limited use ACC registers, this provides a good alternative for regression testing when GCN hardware is not available.
2025-07-29 17:44:30 +02:00
uvos c7aa1364fd
HIP: Ignore unsupported unroll transformation in fattn-vec (#14931)
llvm with the amdgcn target dose not support unrolling loops with conditional break statements, when those statements can not be resolved at compile time. Similar to other places in GGML lets simply ignore this warning.
2025-07-29 17:43:43 +02:00
kallewoof 1a67fcc306
common : avoid logging partial messages (which can contain broken UTF-8 sequences) (#14937)
* bug-fix: don't attempt to log partial parsed messages to avoid crash due to unfinished UTF-8 sequences
2025-07-29 17:05:38 +02:00
hipudding 204f2cf168
CANN: Add ggml_set_rows (#14943) 2025-07-29 22:36:43 +08:00
Sigbjørn Skjæret 138b288b59
cuda : add softcap fusion (#14907) 2025-07-29 14:22:03 +02:00
Aaron Teo 8dbca74fc7
ggml-zdnn: attempt to use unique ptr
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 17:03:58 +08:00
Johannes Gäßler bbd0f91779
server-bench: make seed choice configurable (#14929)
* server-bench: make seed choice configurable

* Update scripts/server-bench.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update scripts/server-bench.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* fix error formatting

* Update scripts/server-bench.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-29 10:40:50 +02:00
Aaron Teo b1376ad051
ggml-zdnn: add weights logging to check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 16:38:07 +08:00
Aaron Teo b28b423801
ggml-zdnn: switch to using deque to fix pointer deref problem
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 15:55:33 +08:00
Aaron Teo 3446807452
ggml-zdnn: attempt at fixing invalid buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 15:45:46 +08:00
Aaron Teo 2d45ee2536
ggml-zdnn: add init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-29 15:36:42 +08:00