Ed Addario
a74b410f5f
Move is_iq() into a lambda and remove unused variables
2025-09-25 19:49:47 +01:00
Ed Addario
8eedcf74bc
Increase scale multiplier
2025-09-22 20:42:37 +01:00
Ed Addario
d36ee0a0a8
Add comments to explain magic numbers
2025-09-22 20:41:56 +01:00
Ed Addario
7ba6001ec8
Simplify candidates sorting
2025-09-22 20:11:54 +01:00
Ed Addario
d79ade2e8e
Adjust for small vector size
2025-09-22 20:11:26 +01:00
Ed Addario
f184450806
Fix minor logic flaw
2025-09-22 20:10:42 +01:00
Ed Addario
1fbc59f867
Replace slope with cross product
2025-09-22 20:10:10 +01:00
Ed Addario
c855094dff
Exit loop if no better solution found
2025-09-22 20:09:11 +01:00
Ed Addario
b748a1efa7
Fix typo
2025-09-21 22:03:54 +01:00
Ed Addario
896cdc2121
Refactor potential overflow
2025-09-21 22:03:36 +01:00
Ed Addario
fecc472c61
Fix typos in variable names
2025-09-21 17:26:38 +01:00
Ed Addario
e92db008bc
Refactor quantisation checks into their own function
2025-09-21 17:20:48 +01:00
Ed Addario
814f6b66be
Minor general refactoring
2025-09-21 16:45:09 +01:00
Ed Addario
0d5f18303e
Refactor lagrange_penalty()
2025-09-21 16:22:00 +01:00
Ed Addario
9a1656eb97
Refactor pareto optimise and convexify
2025-09-21 16:21:35 +01:00
Ed Addario
1a3e9ea4c8
Refactor estimate_error()
2025-09-21 16:21:00 +01:00
Ed Addario
a7ee915e19
Refactor trimmed_sum()
2025-09-21 16:20:06 +01:00
Ed Addario
b09662f86a
Refactor estimate_lambda()
2025-09-21 16:19:49 +01:00
Ed Addario
17be7615ce
Refactor candidate types build
2025-09-21 16:19:28 +01:00
Ed Addario
08146fd67f
Refactor side_data() and copy_or_broadcast()
2025-09-21 16:19:03 +01:00
Ed Addario
7386d4eadd
Refactor row sampling
2025-09-21 16:18:26 +01:00
Ed Addario
b6c008fd8a
Refactor helper lambdas
2025-09-21 16:04:13 +01:00
Ed Addario
b433fd9547
Refactor last budget pass
2025-09-21 13:43:09 +01:00
Ed Addario
c466c53808
Refactor pareto pruning and convexification
2025-09-21 13:42:54 +01:00
Ed Addario
6b8cedf3bc
Refactor estimate_lambda()
2025-09-21 13:42:31 +01:00
Ed Addario
bdefdb673c
Refactor copy_or_broadcast()
2025-09-21 13:42:07 +01:00
Ed Addario
e8e2aed17a
Refactor row sampling
2025-09-21 13:41:44 +01:00
Ed Addario
9e74f83411
Replace --bpw-bias flag with --no-bias
2025-09-20 23:06:37 +01:00
Ed Addario
ab02bb1f3e
Merge branch 'master' into quantize
2025-09-20 21:41:25 +01:00
Ed Addario
a36946997e
Replace fast_bias() with a per-slice version and remove precise_bias()
2025-09-20 21:36:54 +01:00
Ed Addario
14fae69a7b
General refactoring
2025-09-20 21:31:31 +01:00
Georgi Gerganov
e58174cecb
llama : bump max seq limit from 64 to 256 (#15916)
ggml-ci
2025-09-18 12:47:56 +03:00
Xuan-Son Nguyen
8f8f2274ee
convert : add Llama4ForCausalLM (#16042)
* convert : add Llama4ForCausalLM
* handle swa
* half working version
* fix use_kq_norm
* fix use_kq_norm
2025-09-17 19:18:21 +02:00
Jie Fu (傅杰)
745cbcf2fe
llama-quant : fix the verification of attention layers for encoder-decoder models (#16023)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-17 09:30:55 +02:00
Shane A
85286f3548
model : add OLMo3 support (#16015)
* Add HF to gguf conversion logic for Olmo3
* Add Olmo3 implementation
* Update rope comment
* Fix indentation
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Apply suggestion from @CISC
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-17 09:01:58 +02:00
Aman Gupta
6d758839ff
Add LLaDA-7b-MoE diffusion model (#16003)
2025-09-16 10:38:28 +08:00
Ed Addario
ad70fca5b2
Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize
2025-09-15 07:42:37 +01:00
Ed Addario
9b857e3984
Merge branch 'ggml-org:master' into quantize
2025-09-14 23:35:43 +01:00
Ed Addario
c709e1a335
Fix MoE tensor estimation
2025-09-14 22:38:27 +01:00
Sigbjørn Skjæret
b8e09f08b9
model : add grok-2 support (#15539)
* add grok-2 support
* type fix
* type fix
* type fix
* "fix" vocab for invalid sequences
* fix expert tensor mapping and spaces in vocab
* add chat template
* fix norm tensor mapping
* rename layer_out_norm to ffn_post_norm
* ensure ffn_post_norm is mapped
* fix experts merging
* remove erroneous FFN_GATE entry
* concatenate split tensors and add more metadata
* process all expert layers and try cat instead of hstack
* add support for community BPE vocab
* fix expert feed forward length and ffn_down concat
* commit this too
* add ffn_up/gate/down, unsure if sequence is right
* add ffn_gate/down/up to tensor names
* correct residual moe (still not working)
* mess--
* fix embedding scale being applied twice
* add built in chat template
* change beta fast for grok if default value
* remove spm vocab in favor of community bpe vocab
* change attention temp length metadata type to integer
* update attention temp length metadata
* remove comment
* replace M_SQRT2 with std::sqrt(2)
* add yarn metadata, move defaults to hparams
2025-09-14 23:00:59 +02:00
Ed Addario
8503d59ee4
Increase IQ options
2025-09-13 11:49:18 +01:00
Ed Addario
2b516068e2
"Convexify" candidate list
2025-09-13 09:41:52 +01:00
Ed Addario
12e816b511
Replace greedy allocator with lagrangian relaxation
2025-09-13 09:24:23 +01:00
Ed Addario
7d85993f26
Minor refactoring
2025-09-13 08:44:41 +01:00
Ed Addario
4dff85fbe5
Improve precise_lambda() efficiency
2025-09-13 08:41:37 +01:00
Ed Addario
bc8762f27f
Capture surrounding function name
2025-09-13 08:33:22 +01:00
Ed Addario
886536d80a
Increase error type precision
2025-09-13 08:27:23 +01:00
Haiyue Wang
f4e664f838
context : remove redundant explicit casting to the same type (#15948)
The function 'output_reserve' return type is 'uint32_t', so there is no need to add
explicit casting.
2025-09-12 18:16:32 +03:00
Diego Devesa
360d6533db
ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type (#15797)
* ggml-backend : add GGML_BACKEND_DEVICE_TYPE_IGPU device type
ggml-backend : add device id to device props
llama : only use iGPU devices if there are no GPU devices
llama : do not use multiple devices from different backends with the same device id
2025-09-11 22:47:38 +02:00
ddh0
df082f5630
nitpick : correct MB to MiB (#15934)
MB was incorrectly used for 1024 x 1024 bytes instead of MiB
2025-09-11 19:12:34 +02:00