Ed Addario
f184450806
Fix minor logic flaw
2025-09-22 20:10:42 +01:00
Ed Addario
1fbc59f867
Replace slope with cross product
2025-09-22 20:10:10 +01:00
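For context on the change above: testing whether three (bits, error) candidates keep a curve convex by comparing slopes requires divisions, which cost precision and break down when two candidates share a bit width. The sign of the 2D cross product of the two segment vectors answers the same question with multiplications only. A minimal standalone sketch, assuming an illustrative point type rather than the actual llama-quant structures:

```cpp
#include <cstdio>

struct candidate { double bits; double error; }; // illustrative point type

// Returns > 0 if a->b->c turns left (b lies below the chord a--c, i.e. convex
// for a decreasing error curve), < 0 if it turns right, 0 if collinear.
// No division, unlike a slope comparison.
static double cross(const candidate & a, const candidate & b, const candidate & c) {
    return (b.bits - a.bits) * (c.error - a.error) -
           (b.error - a.error) * (c.bits - a.bits);
}

int main() {
    candidate a = {2.0, 9.0}, b = {3.0, 4.0}, c = {4.0, 2.0};
    // A slope test would compute (4-9)/(3-2) vs (2-4)/(4-3); cross() avoids both divisions.
    printf("cross = %f\n", cross(a, b, c)); // positive: b lies below the chord
    return 0;
}
```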
Ed Addario
c855094dff
Exit loop if no better solution found
2025-09-22 20:09:11 +01:00
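The early exit above is the usual guard for iterative improvement searches: remember the best score found so far and stop the first time an iteration fails to beat it. A generic sketch, with the search step supplied by the caller (none of these names are from the actual optimiser):

```cpp
#include <functional>

// Generic early-exit loop: stop as soon as a step fails to improve on the best score.
double minimise(const std::function<double()> & try_improve, double initial, int max_iters) {
    double best = initial;
    for (int iter = 0; iter < max_iters; ++iter) {
        const double candidate = try_improve(); // one search step (caller-supplied)
        if (candidate >= best) {
            break; // no better solution found: exit the loop
        }
        best = candidate;
    }
    return best;
}
```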
Ed Addario
b748a1efa7
Fix typo
2025-09-21 22:03:54 +01:00
Ed Addario
896cdc2121
Refactor to avoid a potential overflow
2025-09-21 22:03:36 +01:00
Ed Addario
fecc472c61
Fix typos in variable names
2025-09-21 17:26:38 +01:00
Ed Addario
e92db008bc
Refactor quantisation checks into their own function
2025-09-21 17:20:48 +01:00
Ed Addario
814f6b66be
Minor general refactoring
2025-09-21 16:45:09 +01:00
Ed Addario
0d5f18303e
Refactor lagrange_penalty()
2025-09-21 16:22:00 +01:00
Ed Addario
9a1656eb97
Refactor Pareto optimisation and convexification
2025-09-21 16:21:35 +01:00
Ed Addario
1a3e9ea4c8
Refactor estimate_error()
2025-09-21 16:21:00 +01:00
Ed Addario
a7ee915e19
Refactor trimmed_sum()
2025-09-21 16:20:06 +01:00
Ed Addario
b09662f86a
Refactor estimate_lambda()
2025-09-21 16:19:49 +01:00
Ed Addario
17be7615ce
Refactor building of candidate types
2025-09-21 16:19:28 +01:00
Ed Addario
08146fd67f
Refactor side_data() and copy_or_broadcast()
2025-09-21 16:19:03 +01:00
Ed Addario
7386d4eadd
Refactor row sampling
2025-09-21 16:18:26 +01:00
Ed Addario
b6c008fd8a
Refactor helper lambdas
2025-09-21 16:04:13 +01:00
Ed Addario
b433fd9547
Refactor last budget pass
2025-09-21 13:43:09 +01:00
Ed Addario
c466c53808
Refactor Pareto pruning and convexification
2025-09-21 13:42:54 +01:00
Ed Addario
6b8cedf3bc
Refactor estimate_lambda()
2025-09-21 13:42:31 +01:00
Ed Addario
bdefdb673c
Refactor copy_or_broadcast()
2025-09-21 13:42:07 +01:00
Ed Addario
e8e2aed17a
Refactor row sampling
2025-09-21 13:41:44 +01:00
Ed Addario
9e74f83411
Replace --bpw-bias flag with --no-bias
2025-09-20 23:06:37 +01:00
Ed Addario
ab02bb1f3e
Merge branch 'master' into quantize
2025-09-20 21:41:25 +01:00
Ed Addario
a36946997e
Replace fast_bias() with a per-slice version and remove precise_bias()
2025-09-20 21:36:54 +01:00
Ed Addario
14fae69a7b
General refactoring
2025-09-20 21:31:31 +01:00
Georgi Gerganov
e58174cecb
llama : bump max seq limit from 64 to 256 (#15916)
ggml-ci
2025-09-18 12:47:56 +03:00
Xuan-Son Nguyen
8f8f2274ee
convert : add Llama4ForCausalLM (#16042)
* convert : add Llama4ForCausalLM
* handle swa
* half working version
* fix use_kq_norm
* fix use_kq_norm
2025-09-17 19:18:21 +02:00
Jie Fu (傅杰)
745cbcf2fe
llama-quant : fix the verification of attention layers for encoder-decoder models (#16023)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-17 09:30:55 +02:00
Shane A
85286f3548
model : add OLMo3 support (#16015)
* Add HF to gguf conversion logic for Olmo3
* Add Olmo3 implementation
* Update rope comment
* Fix indentation
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Apply suggestion from @CISC
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-17 09:01:58 +02:00
Aman Gupta
6d758839ff
Add LLaDA-7b-MoE diffusion model (#16003)
2025-09-16 10:38:28 +08:00
Ed Addario
ad70fca5b2
Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize
2025-09-15 07:42:37 +01:00
Ed Addario
9b857e3984
Merge branch 'ggml-org:master' into quantize
2025-09-14 23:35:43 +01:00
Ed Addario
c709e1a335
Fix MoE tensor estimation
2025-09-14 22:38:27 +01:00
Sigbjørn Skjæret
b8e09f08b9
model : add grok-2 support (#15539)
* add grok-2 support
* type fix
* type fix
* type fix
* "fix" vocab for invalid sequences
* fix expert tensor mapping and spaces in vocab
* add chat template
* fix norm tensor mapping
* rename layer_out_norm to ffn_post_norm
* ensure ffn_post_norm is mapped
* fix experts merging
* remove erroneous FFN_GATE entry
* concatenate split tensors and add more metadata
* process all expert layers and try cat instead of hstack
* add support for community BPE vocab
* fix expert feed forward length and ffn_down concat
* commit this too
* add ffn_up/gate/down, unsure if sequence is right
* add ffn_gate/down/up to tensor names
* correct residual moe (still not working)
* mess--
* fix embedding scale being applied twice
* add built in chat template
* change beta fast for grok if default value
* remove spm vocab in favor of community bpe vocab
* change attention temp length metadata type to integer
* update attention temp length metadata
* remove comment
* replace M_SQRT2 with std::sqrt(2)
* add yarn metadata, move defaults to hparams
2025-09-14 23:00:59 +02:00
Ed Addario
8503d59ee4
Increase IQ options
2025-09-13 11:49:18 +01:00
Ed Addario
2b516068e2
"Convexify" candidate list
2025-09-13 09:41:52 +01:00
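"Convexifying" the candidate list plausibly means keeping only the candidates on the lower convex hull of the (bits, error) cloud, since any point above the hull is dominated by a mix of its neighbours. A sketch of that filter via Andrew's monotone chain, using the same cross-product test shown earlier (field names are illustrative, not the actual code):

```cpp
#include <vector>
#include <algorithm>

struct candidate { double bits; double error; };

// Keep only the lower convex hull of candidates, sorted by ascending bits.
// Illustrative sketch, not the actual llama-quant implementation.
static std::vector<candidate> convexify(std::vector<candidate> cands) {
    std::sort(cands.begin(), cands.end(),
              [](const candidate & a, const candidate & b) { return a.bits < b.bits; });
    std::vector<candidate> hull;
    for (const candidate & c : cands) {
        while (hull.size() >= 2) {
            const candidate & a = hull[hull.size() - 2];
            const candidate & b = hull[hull.size() - 1];
            // pop b if it lies on or above the segment a--c (non-convex)
            if ((b.bits - a.bits) * (c.error - a.error) -
                (b.error - a.error) * (c.bits - a.bits) <= 0) {
                hull.pop_back();
            } else {
                break;
            }
        }
        hull.push_back(c);
    }
    return hull;
}
```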
Ed Addario
12e816b511
Replace greedy allocator with Lagrangian relaxation
2025-09-13 09:24:23 +01:00
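Background for the commit above: Lagrangian relaxation replaces the constrained problem "minimise total error subject to a bit budget" with the unconstrained per-tensor problem "minimise error + λ · bits", then searches for the λ whose resulting size meets the budget; a greedy per-tensor allocator cannot guarantee that trade-off. Since total bits shrink monotonically as λ grows, bisection works. A sketch under those assumptions; the name estimate_lambda echoes the commits in this branch, but the body is a generic sketch, not the real implementation:

```cpp
#include <vector>
#include <cstddef>

struct candidate { double bits; double error; };

// For each tensor, pick the candidate minimising error + lambda * bits.
static double total_bits(const std::vector<std::vector<candidate>> & tensors,
                         double lambda, std::vector<size_t> & choice) {
    double bits = 0.0;
    choice.assign(tensors.size(), 0);
    for (size_t t = 0; t < tensors.size(); ++t) {
        double best = 0.0; size_t best_i = 0; bool first = true;
        for (size_t i = 0; i < tensors[t].size(); ++i) {
            const double cost = tensors[t][i].error + lambda * tensors[t][i].bits;
            if (first || cost < best) { best = cost; best_i = i; first = false; }
        }
        choice[t] = best_i;
        bits += tensors[t][best_i].bits;
    }
    return bits;
}

// Bisection on lambda: larger lambda penalises bits more, shrinking the model,
// so keep lo infeasible (over budget) and hi feasible (within budget).
static std::vector<size_t> estimate_lambda(const std::vector<std::vector<candidate>> & tensors,
                                           double budget) {
    double lo = 0.0, hi = 1e6; // assumed bracketing interval
    std::vector<size_t> choice;
    for (int it = 0; it < 64; ++it) {
        const double mid = 0.5 * (lo + hi);
        if (total_bits(tensors, mid, choice) > budget) { lo = mid; } else { hi = mid; }
    }
    total_bits(tensors, hi, choice); // final selection at the feasible lambda
    return choice;
}
```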
Ed Addario
7d85993f26
Minor refactoring
2025-09-13 08:44:41 +01:00
Ed Addario
4dff85fbe5
Improve precise_lambda() efficiency
2025-09-13 08:41:37 +01:00
Ed Addario
bc8762f27f
Capture surrounding function name
2025-09-13 08:33:22 +01:00
Ed Addario
886536d80a
Increase error type precision
2025-09-13 08:27:23 +01:00
Haiyue Wang
f4e664f838
context : remove redundant explicit casting to the same type (#15948)
The function 'output_reserve' return type is 'uint32_t', so there is no
need for an explicit cast to the same type.
2025-09-12 18:16:32 +03:00
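The shape of the change, as a hedged before/after with an assumed signature for output_reserve (the declaration is illustrative only):

```cpp
#include <cstdint>

uint32_t output_reserve(int32_t n_outputs); // assumed signature, for illustration

void example(int32_t n_outputs) {
    // before: the cast restates the type the function already returns
    const uint32_t n_max_before = (uint32_t) output_reserve(n_outputs);
    // after: the redundant cast is removed; the resulting type is unchanged
    const uint32_t n_max_after = output_reserve(n_outputs);
    (void) n_max_before; (void) n_max_after;
}
```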
Diego Devesa
360d6533db
ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)
* ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type
ggml-backend : add device id to device props
llama : only use iGPU devices if there are no GPU devices
llama : do not use multiple devices from different backends with the same device id
2025-09-11 22:47:38 +02:00
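A sketch of the stated selection rule (prefer discrete GPUs; use iGPUs only when no discrete GPU exists). The enumeration calls are real ggml-backend API, but the helper itself is illustrative and omits the same-device-id deduplication also described above:

```cpp
#include "ggml-backend.h"
#include <vector>

// Illustrative helper, not the actual llama.cpp device enumeration.
static std::vector<ggml_backend_dev_t> pick_devices() {
    std::vector<ggml_backend_dev_t> gpus, igpus;
    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        switch (ggml_backend_dev_type(dev)) {
            case GGML_BACKEND_DEVICE_TYPE_GPU:  gpus.push_back(dev);  break;
            case GGML_BACKEND_DEVICE_TYPE_IGPU: igpus.push_back(dev); break;
            default: break; // CPU/accelerator devices handled elsewhere
        }
    }
    return gpus.empty() ? igpus : gpus; // iGPUs only if there are no GPUs
}
```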
ddh0
df082f5630
nitpick : correct MB to MiB (#15934)
MB was incorrectly used for 1024 x 1024 bytes instead of MiB
2025-09-11 19:12:34 +02:00
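For reference, MiB is the binary unit (2^20 bytes) and MB the decimal one (10^6 bytes); a quick illustration of how far apart they drift:

```cpp
#include <cstdio>
#include <cstddef>

int main() {
    const size_t MiB = 1024ull * 1024ull; // binary: 1,048,576 bytes
    const size_t MB  = 1000ull * 1000ull; // decimal: 1,000,000 bytes
    const size_t bytes = 536870912ull;    // a 512 MiB buffer, for example
    printf("%.1f MiB = %.1f MB\n", bytes / (double) MiB, bytes / (double) MB);
    return 0;                             // prints: 512.0 MiB = 536.9 MB
}
```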
Ed Addario
f0f07bd6ba
Merge branch 'master' into quantize
2025-09-10 21:16:04 +01:00
Jie Fu (傅杰)
4f658855fa
llama : support T5 models with unequal numbers of encoder and decoder layers (#15909)
* Extend support for T5 models with different numbers of encoder and decoder layers
Signed-off-by: Jie Fu <jiefu@tencent.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update gguf-py/gguf/constants.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update gguf-py/gguf/gguf_writer.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-arch.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-arch.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-model.cpp (multiple review suggestions)
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update src/llama-hparams.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Rename n_dec_layer --> dec_n_layer
Signed-off-by: Jie Fu <jiefu@tencent.com>
* Adapt to cases when dec_n_layer > n_layer
Signed-off-by: Jie Fu <jiefu@tencent.com>
---------
Signed-off-by: Jie Fu <jiefu@tencent.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-10 20:51:51 +02:00
Sigbjørn Skjæret
6ab397e12b
graph : support non-contiguous Q in build_attn_mha (#15908)
* support non-contiguous Q in build_attn_mha
* Update src/llama-graph.cpp
ggml-ci
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-09-10 19:08:59 +02:00
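Before this change, a graph builder hitting an op that demands contiguous memory would typically force a copy first; supporting non-contiguous Q directly avoids that copy. The classic workaround pattern looks like the sketch below; ggml_is_contiguous() and ggml_cont() are real ggml API, while the helper itself is illustrative:

```cpp
#include "ggml.h"

// Force contiguity before an op that requires it (a copy is materialised
// only when needed). Illustrative helper, not the actual patch.
static struct ggml_tensor * ensure_contiguous(struct ggml_context * ctx, struct ggml_tensor * t) {
    return ggml_is_contiguous(t) ? t : ggml_cont(ctx, t);
}
```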
Ed Addario
04c07b3272
Add better control over MSE and directional bias computation
2025-09-10 18:00:56 +01:00
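As background for the commit above: the two quantities typically contrasted are the mean squared error between original and quantised weights and the signed mean error, which captures a directional bias (a systematic under- or over-shoot). A sketch in which the metric definitions are assumptions, not the actual code:

```cpp
#include <vector>
#include <cstddef>

// Assumed definitions: MSE = mean((w - q)^2), bias = mean(w - q).
static void error_stats(const std::vector<float> & w, const std::vector<float> & q,
                        double & mse, double & bias) {
    if (w.empty()) { mse = bias = 0.0; return; }
    double se = 0.0, e = 0.0;
    for (size_t i = 0; i < w.size(); ++i) {
        const double d = (double) w[i] - (double) q[i];
        se += d * d; // accumulates squared error
        e  += d;     // accumulates signed error
    }
    mse  = se / (double) w.size();
    bias = e  / (double) w.size();
}
```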
Daniel Bevenius
86587da03b
llama : check returned fn ptrs from ggml_backend_reg_get_proc_address (#15893)
This commit adds checks for two function pointers returned from
ggml_backend_reg_get_proc_address.
The motivation is that a returned pointer could be nullptr if the
proc-address lookup changes in the future. Checking also makes these
call sites consistent with all the other calls to
ggml_backend_reg_get_proc_address in the code base.
2025-09-10 05:33:58 +02:00
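The pattern being hardened, sketched with the set_n_threads entry point (the typedef and string name come from ggml-backend.h; which two pointers this commit actually guards is not shown here):

```cpp
#include "ggml-backend.h"

// Look up an optional backend entry point and call it only if present.
static void set_threads_if_supported(ggml_backend_reg_t reg, ggml_backend_t backend, int n_threads) {
    auto fn = (ggml_backend_set_n_threads_t)
        ggml_backend_reg_get_proc_address(reg, "ggml_backend_set_n_threads");
    if (fn != nullptr) { // guard: the registry may not expose this symbol
        fn(backend, n_threads);
    }
}
```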