Commit Graph

5920 Commits

Author SHA1 Message Date
ibrahimkhadraoui 97011d7a1f mup_vec create as float64 2025-07-07 14:25:32 +04:00
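The "mup_vec create as float64" change above is about dtype precision at creation time. A minimal sketch of the idea (the variable names here are illustrative, not taken from the conversion script): building a multiplier vector directly as float64 preserves the original Python double values, whereas going through float32 rounds them to ~7 significant digits.

```python
import numpy as np

# Hypothetical multiplier values as Python floats (double precision).
mup_multipliers = [1.0 / 3.0, 0.1, 2.5]

# Created as float32, the values are rounded to single precision.
vec32 = np.array(mup_multipliers, dtype=np.float32)

# Created directly as float64, the original doubles survive intact.
vec64 = np.array(mup_multipliers, dtype=np.float64)

print(float(vec32[0]))  # 0.3333333432674408 (rounded)
print(float(vec64[0]))  # 0.3333333333333333 (exact double)
```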
ibrahimkhadraoui 49d7420964 inp_out_ids moved outside of layers loop 2025-07-07 14:18:48 +04:00
ibrahimkhadraoui 8c50893820 added some cb functions for debugging purposes 2025-07-07 14:10:45 +04:00
Younes B 6c39e775dd
fix conversion and d_inner 2025-07-07 10:56:49 +02:00
ibrahimkhadraoui 441d8d66bd override modify_tensors instead of get_tensors 2025-07-07 12:00:57 +04:00
ibrahimkhadraoui 53304c84db remove unused functions from gguf_writer.py 2025-07-07 11:18:14 +04:00
ibrahimkhadraoui c4af0f3ca5 mamba_d_ssm added to d_inner find_hparam 2025-07-07 11:17:31 +04:00
ibrahimkhadraoui c56ec07a9a read arch from gguf.MODEL_ARCH 2025-07-07 10:34:46 +04:00
ibrahimkhadraoui 280dd2dcb7 falcon-h1 specific vocab resolved 2025-07-07 10:25:57 +04:00
Eve 6491d6e4f1
vulkan: increase LOAD_VEC_A to 8 (IQ1/IQ2) or 4 (IQ3) (#14485)
Commit taken from remyoudompheng's PR https://github.com/ggml-org/llama.cpp/pull/12260

Co-authored-by: Rémy Oudompheng <remyoudompheng@gmail.com>
2025-07-06 12:29:36 +02:00
Jeff Bolz e592be1575
vulkan: fix rms_norm+mul fusion (#14545)
The fused operation was grabbing the epsilon value from the wrong place.

Add an env var to disable fusion.

Add some missing checks for supported shapes/types.

Handle fused rms_norm+mul in check_results.
2025-07-06 10:08:16 +02:00
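The "env var to disable fusion" mentioned above is a common kill-switch pattern: an optimization stays on by default but can be turned off at runtime for debugging. A minimal sketch of the pattern (the variable name `DISABLE_FUSION` is a placeholder, not the actual name used in the PR):

```python
import os

def fusion_enabled() -> bool:
    # Kill-switch convention: any non-empty value disables the optimization.
    # "DISABLE_FUSION" is a hypothetical name for illustration only.
    return os.environ.get("DISABLE_FUSION", "") == ""

# Unset: fusion stays on by default.
print(fusion_enabled())

# Set: fusion is disabled, e.g. to isolate a miscompare in check_results.
os.environ["DISABLE_FUSION"] = "1"
print(fusion_enabled())
```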
Jeff Bolz a0374a67e2
vulkan: Handle updated FA dim2/3 definition (#14518)
* vulkan: Handle updated FA dim2/3 definition

Pack mask boolean and n_head_log2 into a single dword to keep the push
constant block under the 128B limit.

* handle null mask for gqa

* allow gqa with dim3>1
2025-07-05 09:26:04 +02:00
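The packing described above (a mask boolean and `n_head_log2` sharing one dword so the push-constant block stays under 128 bytes) is plain bit packing. A sketch of the idea in Python, with an assumed bit layout (top bit for the flag, low bits for the value; the real shader may lay it out differently):

```python
def pack_mask_and_n_head_log2(has_mask: bool, n_head_log2: int) -> int:
    # Assumed layout: bit 31 = mask flag, bits 0..15 = n_head_log2.
    assert 0 <= n_head_log2 < (1 << 16)
    return (int(has_mask) << 31) | n_head_log2

def unpack_mask_and_n_head_log2(dword: int):
    # Reverse of the packing above.
    return bool(dword >> 31), dword & 0xFFFF

packed = pack_mask_and_n_head_log2(True, 5)
print(hex(packed))                          # 0x80000005
print(unpack_mask_and_n_head_log2(packed))  # (True, 5)
```

One dword instead of two saves 4 bytes of push-constant space, which matters because Vulkan only guarantees 128 bytes of push constants.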
Sigbjørn Skjæret ddef99522d
server : fix assistant prefilling when content is an array (#14360) 2025-07-05 09:17:14 +02:00
Sigbjørn Skjæret 6681688146
opencl: add GELU_ERF (#14476) 2025-07-04 23:24:56 -07:00
Georgi Gerganov bac8bed248
eval-callback : check for empty input (#14539) 2025-07-05 07:18:09 +03:00
R0CKSTAR b81510a7b7
test-backend-ops: add support for specifying output format (#14368)
* test-backend-ops: add support for specifying output format

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Add build_commit and build_number in test_result

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* refactor

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Get build commit from ggml_commit()

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Merge errors into test_operation_info && address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* remove visitor nonsense

* remove visitor comment

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

* Address review comments

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>

---------

Signed-off-by: Xiaodong Ye <yeahdongcn@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2025-07-05 12:10:53 +08:00
Georgi Gerganov ef797db357
metal : disable fast math in all quantize kernels (#14528)
ggml-ci
2025-07-04 19:19:09 +03:00
ibrahimkhadraoui 7a25441e13 fixed multipliers 2025-07-04 17:41:03 +04:00
ibrahimkhadraoui 9760c8bc9d conflict solve 2025-07-04 16:28:48 +04:00
ibrahimkhadraoui 2aa48dd853 Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased 2025-07-04 16:25:54 +04:00
ibrahimkhadraoui 3ee7983961 fix vocab size 2025-07-04 16:25:27 +04:00
younesbelkada 250b4f1074 mix instead of max 2025-07-04 15:53:47 +04:00
younesbelkada 1fd0574adc try 2025-07-04 15:50:43 +04:00
ibrahimkhadraoui a6d0067dd7 Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased 2025-07-04 15:37:44 +04:00
ibrahimkhadraoui 15138df48f small fix ffn_norm 2025-07-04 15:37:40 +04:00
younesbelkada 6c7d9e26e7 fix 2025-07-04 15:25:59 +04:00
ibrahimkhadraoui d22b4ea425 Merge branch 'add-fh1-rebased' of https://github.com/tiiuae/llama.cpp-public into add-fh1-rebased 2025-07-04 15:10:11 +04:00
ibrahimkhadraoui 2fe057cc40 Revert "fix"
This reverts commit 243e4d1a50.
2025-07-04 15:04:13 +04:00
younesbelkada 22de62cf56 fix 2025-07-04 15:02:14 +04:00
younesbelkada cce35498d5 pre-norm -> norm 2025-07-04 14:58:33 +04:00
younesbelkada 243e4d1a50 fix 2025-07-04 14:55:31 +04:00
younesbelkada 1415cd8782 another fix 2025-07-04 14:49:59 +04:00
younesbelkada a39a8423f7 merge 2025-07-04 14:48:22 +04:00
younesbelkada 50eadc7b33 fixes 2025-07-04 14:47:31 +04:00
ibrahimkhadraoui 071f4b7fd8 changed precision for multipliers float 32->64 2025-07-04 14:37:02 +04:00
ibrahimkhadraoui 8bea92261e python fixes 2025-07-04 14:32:11 +04:00
Georgi Gerganov 67d1ef23c6
batch : add optional for sequential equal split (#14511)
ggml-ci
2025-07-04 09:08:59 +03:00
Georgi Gerganov 7b50f7c025
graph : prepare for 4D mask (#14515)
ggml-ci
2025-07-04 09:05:36 +03:00
Georgi Gerganov c79184d2d1
batch : add n_used count (#14512)
ggml-ci
2025-07-04 09:04:59 +03:00
luyhcsu 499a8f5a78
CANN: Replace aclrtMemsetSync with aclnnInplaceZero operator (#14002)
Co-authored-by: luyuhong <luyuhong@kylinos.cn>
2025-07-04 11:50:07 +08:00
Sigbjørn Skjæret 28657a8229
ggml : implement GEGLU_ERF and GEGLU_QUICK ops (#14445) 2025-07-03 23:07:22 +02:00
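The GEGLU_ERF and GEGLU_QUICK ops above are gated-GELU variants: the input is split in half and one half gates the other through a GELU. A reference sketch of the math (the split convention and the `1.702` sigmoid constant for the "quick" approximation are standard, but which half acts as the gate is an assumption here):

```python
import math

def gelu_erf(x: float) -> float:
    # Exact GELU via the error function: 0.5 * x * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_quick(x: float) -> float:
    # Sigmoid approximation: x * sigmoid(1.702 * x)
    return x / (1.0 + math.exp(-1.702 * x))

def geglu(xs: list[float], gelu_fn) -> list[float]:
    # Split the vector in half; gate b elementwise by gelu(a).
    half = len(xs) // 2
    a, b = xs[:half], xs[half:]
    return [gelu_fn(ai) * bi for ai, bi in zip(a, b)]

print(geglu([0.0, 1.0, 2.0, 3.0], gelu_erf))
```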
lhez bee28421be
opencl : broadcast for soft_max (#14510) 2025-07-03 20:22:24 +02:00
Jeff Bolz 2b72bedec1
vulkan: support mixed/deepseekR1 FA head sizes (#14509)
* vulkan: better parameterize FA by head sizes

* vulkan: support mixed/deepseekR1 FA head sizes
2025-07-03 20:21:14 +02:00
Johannes Gäßler c8c4495b8d
ggml: backward pass for split swiglu (#14483) 2025-07-03 17:05:18 +02:00
younesbelkada 14c37ec047 more cleaning on python code 2025-07-03 18:09:30 +04:00
younesbelkada fdd5cff4ba minor fix 2025-07-03 17:12:05 +04:00
younesbelkada 0c93ef6a9c more fixes 2025-07-03 15:26:33 +04:00
younesbelkada 03568c9358 fix 2025-07-03 15:10:18 +04:00
younesbelkada 71a6848e2d another fix 2025-07-03 15:08:23 +04:00
younesbelkada f897efdaf6 push more fixes 2025-07-03 15:05:01 +04:00