Commit Graph

686 Commits

Author SHA1 Message Date
Ed Addario a74b410f5f
Move is_iq() into a lambda and remove unused variables 2025-09-25 19:49:47 +01:00
Ed Addario 8eedcf74bc
Increase scale multiplier 2025-09-22 20:42:37 +01:00
Ed Addario d36ee0a0a8
Add comments to explain magic numbers 2025-09-22 20:41:56 +01:00
Ed Addario 7ba6001ec8
Simplify candidates sorting 2025-09-22 20:11:54 +01:00
Ed Addario d79ade2e8e
Adjust for small vector size 2025-09-22 20:11:26 +01:00
Ed Addario f184450806
Fix minor logic flaw 2025-09-22 20:10:42 +01:00
Ed Addario 1fbc59f867
Replace slope with cross product 2025-09-22 20:10:10 +01:00
Ed Addario c855094dff
Exit loop if no better solution found 2025-09-22 20:09:11 +01:00
Ed Addario b748a1efa7
Fix typo 2025-09-21 22:03:54 +01:00
Ed Addario 896cdc2121
Refactor potential overflow 2025-09-21 22:03:36 +01:00
Ed Addario fecc472c61
Fix typos in variable names 2025-09-21 17:26:38 +01:00
Ed Addario e92db008bc
Refactor quantisation checks into their own function 2025-09-21 17:20:48 +01:00
Ed Addario 814f6b66be
Minor general refactoring 2025-09-21 16:45:09 +01:00
Ed Addario 0d5f18303e
Refactor lagrange_penalty() 2025-09-21 16:22:00 +01:00
Ed Addario 9a1656eb97
Refactor pareto optimise and convexify 2025-09-21 16:21:35 +01:00
Ed Addario 1a3e9ea4c8
Refactor estimate_error() 2025-09-21 16:21:00 +01:00
Ed Addario a7ee915e19
Refactor trimmed_sum() 2025-09-21 16:20:06 +01:00
Ed Addario b09662f86a
Refactor estimate_lambda() 2025-09-21 16:19:49 +01:00
Ed Addario 17be7615ce
Refactor candidate types build 2025-09-21 16:19:28 +01:00
Ed Addario 08146fd67f
Refactor side_data() and copy_or_broadcast() 2025-09-21 16:19:03 +01:00
Ed Addario 7386d4eadd
Refactor row sampling 2025-09-21 16:18:26 +01:00
Ed Addario b6c008fd8a
Refactor helper lambdas 2025-09-21 16:04:13 +01:00
Ed Addario b433fd9547
Refactor last budget pass 2025-09-21 13:43:09 +01:00
Ed Addario c466c53808
Refactor pareto pruning and convexification 2025-09-21 13:42:54 +01:00
Ed Addario 6b8cedf3bc
Refactor estimate_lambda() 2025-09-21 13:42:31 +01:00
Ed Addario bdefdb673c
Refactor copy_or_broadcast() 2025-09-21 13:42:07 +01:00
Ed Addario e8e2aed17a
Refactor row sampling 2025-09-21 13:41:44 +01:00
Ed Addario 9e74f83411
Replace --bpw-bias flag with --no-bias 2025-09-20 23:06:37 +01:00
Ed Addario ab02bb1f3e
Merge branch 'master' into quantize 2025-09-20 21:41:25 +01:00
Ed Addario a36946997e
Replace fast_bias() with per-slice version and remove precise_bias() 2025-09-20 21:36:54 +01:00
Ed Addario 14fae69a7b
General refactoring 2025-09-20 21:31:31 +01:00
Georgi Gerganov e58174cecb
llama : bump max seq limit from 64 to 256 (#15916)
ggml-ci
2025-09-18 12:47:56 +03:00
Xuan-Son Nguyen 8f8f2274ee
convert : add Llama4ForCausalLM (#16042)
* convert : add Llama4ForCausalLM

* handle swa

* half working version

* fix use_kq_norm

* fix use_kq_norm
2025-09-17 19:18:21 +02:00
Jie Fu (傅杰) 745cbcf2fe
llama-quant : fix the verification of attention layers for encoder-decoder models (#16023)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-17 09:30:55 +02:00
Shane A 85286f3548
model : add OLMo3 support (#16015)
* Add HF to gguf conversion logic for Olmo3

* Add Olmo3 implementation

* Update rope comment

* Fix indentation

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Apply suggestion from @CISC

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-17 09:01:58 +02:00
Aman Gupta 6d758839ff
Add LLaDA-7b-MoE diffusion model (#16003) 2025-09-16 10:38:28 +08:00
Ed Addario ad70fca5b2
Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize 2025-09-15 07:42:37 +01:00
Ed Addario 9b857e3984
Merge branch 'ggml-org:master' into quantize 2025-09-14 23:35:43 +01:00
Ed Addario c709e1a335
Fix MoE tensor estimation 2025-09-14 22:38:27 +01:00
Sigbjørn Skjæret b8e09f08b9
model : add grok-2 support (#15539)
* add grok-2 support

* type fix

* type fix

* type fix

* "fix" vocab for invalid sequences

* fix expert tensor mapping and spaces in vocab

* add chat template

* fix norm tensor mapping

* rename layer_out_norm to ffn_post_norm

* ensure ffn_post_norm is mapped

* fix experts merging

* remove erroneous FFN_GATE entry

* concatenate split tensors and add more metadata

* process all expert layers and try cat instead of hstack

* add support for community BPE vocab

* fix expert feed forward length and ffn_down concat

* commit this too

* add ffn_up/gate/down, unsure if sequence is right

* add ffn_gate/down/up to tensor names

* correct residual moe (still not working)

* mess--

* fix embedding scale being applied twice

* add built in chat template

* change beta fast for grok if default value

* remove spm vocab in favor of community bpe vocab

* change attention temp length metadata type to integer

* update attention temp length metadata

* remove comment

* replace M_SQRT2 with std::sqrt(2)

* add yarn metadata, move defaults to hparams
2025-09-14 23:00:59 +02:00
Ed Addario 8503d59ee4
Increase IQ options 2025-09-13 11:49:18 +01:00
Ed Addario 2b516068e2
"Convexify" candidate list 2025-09-13 09:41:52 +01:00
Ed Addario 12e816b511
Replace greedy allocator with lagrangian relaxation 2025-09-13 09:24:23 +01:00
Ed Addario 7d85993f26
Minor refactoring 2025-09-13 08:44:41 +01:00
Ed Addario 4dff85fbe5
Improve precise_lambda() efficiency 2025-09-13 08:41:37 +01:00
Ed Addario bc8762f27f
Capture surrounding function name 2025-09-13 08:33:22 +01:00
Ed Addario 886536d80a
Increase error type precision 2025-09-13 08:27:23 +01:00
Haiyue Wang f4e664f838
context : remove redundant explicit casting to the same type (#15948)
The function 'output_reserve' return type is already 'uint32_t', so there is
no need to add an explicit cast.
2025-09-12 18:16:32 +03:00
Diego Devesa 360d6533db
ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)
* ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type

ggml-backend : add device id to device props

llama : only use iGPU devices if there are no GPU devices

llama : do not use multiple devices from different backends with the same device id
2025-09-11 22:47:38 +02:00
ddh0 df082f5630
nitpick : correct MB to MiB (#15934)
MB was incorrectly used for 1024 x 1024 bytes instead of MiB
2025-09-11 19:12:34 +02:00