Commit Graph

5966 Commits

Author SHA1 Message Date
Aaron Teo 03ec5d3ed3
ggml-zdnn: bring back working matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 18:14:44 +08:00
Aaron Teo 4cc62cb693
ggml-zdnn: move bias data to local also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 18:10:14 +08:00
Aaron Teo 6f42570194
ggml-zdnn: move everything back to local declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 18:08:47 +08:00
Aaron Teo eefa943b0a
ggml-zdnn: fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 18:03:17 +08:00
Aaron Teo fc692ed498
ggml-zdnn: figure out why sigtrap is happening
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 18:00:28 +08:00
Aaron Teo 08de84ef85
ggml-zdnn: bugfix transform ztensor vs origtensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:57:57 +08:00
Aaron Teo 032dce5a6a
ggml-zdnn: fix sequencing of transforms
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:46:17 +08:00
Aaron Teo cf0e190c40
ggml-zdnn: add more safeguards in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:44:39 +08:00
Aaron Teo f239bbb02d
ggml-zdnn: move weights transform into mulmat
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:38:44 +08:00
Aaron Teo 092fa3a328
ggml-zdnn: activate bias transform in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:27:35 +08:00
Aaron Teo f7e8d6f2b2
ggml-zdnn: add logger to check if mat mul ops go through set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:17:12 +08:00
Aaron Teo 6d71749c26
ggml-zdnn: add more debug info for extra buffer transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:10:07 +08:00
Aaron Teo 4b2f1cb1b8
ggml-zdnn: add bias data transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 16:05:53 +08:00
Aaron Teo f800c80281
ggml-zdnn: add bias ztensor and data free
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 15:59:52 +08:00
Aaron Teo bee7dd3020
ggml-zdnn: tighten memory usage, change string allocation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 15:55:42 +08:00
Aaron Teo aef93b3908
ggml-zdnn: add bias init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-28 15:41:56 +08:00
Aaron Teo f263f5d9ae
ggml-zdnn: fix missing data transform call
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 18:30:10 +08:00
Aaron Teo 1c75ed63e5
ggml-zdnn: fix compiler error missing type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 18:22:34 +08:00
Aaron Teo a1d8568c14
ggml-zdnn: impl matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 18:13:07 +08:00
Aaron Teo 59e9805ab0
ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 16:26:29 +08:00
Aaron Teo c1653ab639
ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 16:22:06 +08:00
Aaron Teo 828519659b
ggml-zdnn: update supports_op matmul matrix
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 16:11:37 +08:00
Aaron Teo 18658b8607
ggml-zdnn: impl init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 12:02:20 +08:00
Aaron Teo da2e0e70ba
ggml-zdnn: switch buffers back and set to arbitrary number
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 02:31:22 +08:00
Aaron Teo 63fbc45ed6
ggml-zdnn: switch to std vector instead of array
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 01:09:01 +08:00
Aaron Teo b7f4b6fde3
ggml-zdnn: rework init_tensor to create new buffers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 01:03:53 +08:00
Aaron Teo ee0ed78d54
ggml-zdnn: add check for view tensors to prevent init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:56:32 +08:00
Aaron Teo 13c64448bd
ggml-zdnn: assign tensor->extra to buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:48:32 +08:00
Aaron Teo 13c05872f2
ggml-zdnn: implement at least 1 op to test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:44:05 +08:00
Aaron Teo 9e84742e72
ggml-zdnn: test ztensor finding in init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:40:22 +08:00
Aaron Teo af9f4f0039
ggml-zdnn: fix compiler warnings and bugfixes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:25:41 +08:00
Aaron Teo ae2f656d7e
ggml-zdnn: bugfix new impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:18:53 +08:00
Aaron Teo 7c6395f826
ggml-zdnn: rewrite the backend implementation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-24 00:14:45 +08:00
Aaron Teo 04ddb2ac95
ggml-zdnn: update op out_prod to use tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-23 19:51:37 +08:00
Aaron Teo 77a753297b
ggml-zdnn: support op out_prod
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-23 19:28:51 +08:00
Aaron Teo 11d58d29de
ggml-zdnn: add comments to prevent accidentally deleting lines
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-23 14:54:44 +08:00
Aaron Teo 529bdb9fbd
ggml-zdnn: last working matmul version
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-22 00:29:47 +08:00
Aaron Teo 60b9874dea
ggml-zdnn: update set_tensor logging to check only for matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 21:11:39 +08:00
Aaron Teo b9756b6dd4
ggml-zdnn: add more loggers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 21:09:21 +08:00
Aaron Teo 1989fc9bf4
ggml-zdnn: add set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 20:37:53 +08:00
Aaron Teo 36d76c30fb
ggml-zdnn: run compute and store into tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 20:30:54 +08:00
Aaron Teo 02cfcfb270
ggml-zdnn: add output buffer check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-21 20:19:20 +08:00
Aaron Teo fd4914b060
ggml-zdnn: tensor->extra logging check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: add layout name mapping, ztensor information

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: separate logging into its own line

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: add shape comparison

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: add ggml_tensor shape log

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: fix incorrect shape logging

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-18 21:00:30 +08:00
Aaron Teo e084821a3f
ggml-zdnn: inital backend impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: temp change z17 to arch15

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

ggml-zdnn: fix build bugs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-18 20:04:20 +08:00
Georgi Gerganov 01612b7409
llama : reuse compute graphs (#14482)
* llama : reuse compute graphs

ggml-ci

* llama-bench : add graph reuse parameter

ggml-ci

* cont : remove the parameter and the sched resets

ggml-ci

* graph : rename update() to can_reuse()

ggml-ci

* params : remove is_same()

ggml-ci

* graph : set res->params in llm_graph_context constructor

ggml-ci

* graph : avoid set_max_nodes in llm_graph_result

ggml-ci

* kv-cache : reuse llama_context's graph result instance

ggml-ci

* context : reset the previous graph result upon memory updates

ggml-ci

* batch : llama_ubatch now carries its data instead of pointing to balloc

ggml-ci

* merge : fix build

ggml-ci

* graph : fix can_reuse() checks when flash-attention is disabled

* graph : move llm_graph_result impl in source file + debug env

ggml-ci
2025-07-17 19:08:33 +03:00
Tarek Dakhran 086cf81e88
llama : fix parallel processing for lfm2 (#14705) 2025-07-17 09:22:11 +02:00
Georgi Gerganov d9b691081c
kv-cache : opt mask set input (#14600)
ggml-ci
2025-07-17 09:49:15 +03:00
Georgi Gerganov ad57d3edd2
batch : fix uninitialized has_cpl flag (#14733)
ggml-ci
2025-07-17 09:45:54 +03:00
Sigbjørn Skjæret 1ba45d4982
ci : disable failing vulkan crossbuilds (#14723) 2025-07-16 20:52:08 -03:00
Sigbjørn Skjæret 19e5943d9e
convert : make hf token optional (#14717)
* make hf token optional

* fail if we can't get necessary tokenizer config
2025-07-16 23:17:43 +02:00