Commit Graph

211 Commits

Author | SHA1 | Message | Date
Ed Addario 5b557ca958
Minor refactoring 2025-11-29 10:30:20 +00:00
Piotr Wilkin (ilintar) ff55414c42
model : Qwen3 Next (#16095)
* Qwen3 Next - cleaned up version

* Whitespaces and stuff

* Correct minor errors

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Misc. fixes.

* Clean up code, add missing hybrid qualifier

* Did someone transpose the SOLVE_TRI result matrix? Perhaps...

* Whitespace

* Proper tensors for cb calls

* Use llama-graph.h vertical alignment

* BROKEN: chunking

* Set new tensors as inputs.

* Proper chunk logic

* It's the circle of life...

* More shenanigans for n_seq > 1

* Nail in the coffin?

* Fix Windows build

* Eh, one fails on Windows, the other fails on Mac... just use general capture.

* quant : cleanup

* model : cleanup

* qwen3 : cleanup

* cont : cleanup

* cont : cleanup

* ggml : revert change

* qwen3 : cleanup

* cont : cleanup

* Readd cmath

* qwen3 : fix typo

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Usual suspects

* fix my bad suggestion

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 12:02:56 +01:00
Ed Addario 6616008420
Use more descriptive option naming 2025-11-24 18:26:45 +00:00
Ed Addario 1c9993e131
Add --disable-tensor-importance option 2025-11-23 17:51:04 +00:00
Ed Addario 9ec3e6e262
Remove processing statistics_data 2025-11-23 17:49:53 +00:00
Ed Addario a0ba913613
Fix lambda capture bug in Windows and initialise candidate_types struct 2025-11-19 11:19:44 +00:00
Ed Addario ac8cfbdd12
Improved is_important() logic 2025-11-17 18:03:09 +00:00
Ed Addario b02b1b2304
Merge branch 'master' into quantize 2025-10-31 23:20:17 +00:00
Ed Addario c59bb6d49d
Add Euclidean-Cosine score to identify important tensors 2025-10-30 22:11:40 +00:00
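The "Euclidean-Cosine score" above combines two standard vector comparisons to rank tensor importance. A minimal sketch of such a score follows; the function name, the blending of the two terms, and the zero-norm handling are assumptions for illustration, not the repository's actual formula:

```python
import math

def euclidean_cosine_score(a, b):
    """Hypothetical importance score mixing Euclidean distance and
    cosine similarity between two equal-length vectors a and b.
    How the two terms are combined is an assumption."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    cosine = dot / (na * nb) if na > 0 and nb > 0 else 0.0
    euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # Large distance or low directional similarity both raise the score.
    return euclidean * (1.0 - cosine)
```

Identical vectors score 0.0; orthogonal vectors of equal length score their full Euclidean distance, so the measure penalises both magnitude and directional divergence.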
Ed Addario 6e32244a06
Read statistics from imatrix 2025-10-30 21:53:07 +00:00
Jan Boon d7395115ba
llama : use std::abs instead of abs (#16853) 2025-10-30 08:30:58 +02:00
Ed Addario f8863b9a80
Minor refactoring 2025-10-28 15:22:32 +00:00
Ed Addario 5303212324
Simplify tensor selection 2025-10-26 17:40:52 +00:00
Ed Addario d6ccd5649a
Finetune heuristics 2025-10-25 12:09:20 +01:00
Ed Addario 04561d5782
Update epsilon specifier 2025-10-21 12:53:26 +01:00
Ed Addario 27bf25e93c
Fix lambda capture 2025-10-20 22:04:35 +01:00
Ed Addario 543b5a99db
Fix lambda capture 2025-10-20 21:57:03 +01:00
Ed Addario fa1df81d49
Finetune heuristics 2025-10-20 20:52:23 +01:00
Ed Addario 41a0069613
Merge branch 'master' into quantize 2025-10-16 22:20:04 +01:00
Ed Addario a5103933bb
Minor refactoring 2025-10-16 15:11:48 +01:00
Ed Addario 0b3e930d52
Add option to override bpw state file name 2025-10-16 11:41:26 +01:00
Ed Addario a6853ea2ae
Add tensor type and depth heuristics 2025-10-16 11:20:24 +01:00
Xuan-Son Nguyen 3e3cb19f64
llama-quant: add support for mmproj (#16592)
* llama-quant: add support for mmproj

* Update src/llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* check prefix instead

* small fix

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-10-15 14:48:08 +02:00
Ed Addario b7911f1431
Minor refactoring 2025-10-13 17:46:45 +01:00
Ed Addario cd734b89ce
Update quant types 2025-10-13 15:15:23 +01:00
Ed Addario b1b58e67df
Refactor signal handlers 2025-10-13 14:54:32 +01:00
Ed Addario ca282302b5
Add --keep-bpw-state option 2025-10-12 18:23:23 +01:00
Ed Addario b6094a97bf
Add quant types 2025-10-12 16:30:35 +01:00
Ed Addario 12e0524f3a
Reduce compute time by parallelising tensor processing - courtesy of https://github.com/ddh0 2025-10-12 15:12:15 +01:00
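The parallelisation entry above dispatches independent per-tensor work across workers instead of processing tensors sequentially. A generic sketch of that pattern, assuming a placeholder `process_tensor` worker (the real per-tensor work in the repository is C++-side quantisation, not this stub):

```python
from concurrent.futures import ThreadPoolExecutor
import os

def process_tensor(name):
    # Placeholder for per-tensor work (e.g. dequantise, score, re-quantise).
    return name, len(name)

def process_all(tensor_names, max_workers=None):
    """Hypothetical sketch: map independent tensors onto a thread pool
    so total wall time scales with the slowest batch, not the sum."""
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_tensor, tensor_names))
```

This only helps when the per-tensor work releases the GIL (native code) or is I/O-bound; CPU-bound pure-Python work would use a process pool instead.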
Ed Addario 5b0d3f6d5a
Automatically determine if bias error is significant 2025-10-11 10:04:48 +01:00
Ed Addario c93131cef6
Remove --no-bias option 2025-10-10 13:26:51 +01:00
Ed Addario 3a3d807fc3
Remove bias mode computation 2025-10-10 13:10:42 +01:00
Ed Addario c11184a3c1
Generate model ID hash 2025-10-09 11:58:01 +01:00
Ed Addario 044fa783c7
Fix trimming logic 2025-10-06 21:40:37 +01:00
Ed Addario 84ada44894
Uninstall signal handler and cleanup 2025-10-05 20:20:56 +01:00
Ed Addario 46706cec28
Persist progress 2025-10-05 20:20:28 +01:00
Ed Addario 74c62ed4e6
Add delete_bpw_state() 2025-10-05 20:19:03 +01:00
Ed Addario 02c3073b81
Add load_bpw_state() 2025-10-05 20:18:36 +01:00
Ed Addario e48ca32f19
Add save_bpw_state() 2025-10-05 20:17:27 +01:00
Ed Addario 533cda3076
Add signal handler 2025-10-05 20:16:33 +01:00
Ed Addario 560e8c9d70
Relax lambda clamping 2025-10-05 14:41:42 +01:00
Ed Addario f5d8811ddd
Prioritise important tensors 2025-10-01 19:04:43 +01:00
Ed Addario b3b8a111a5
Compute rows based on tensor shape and slice count 2025-09-28 18:45:25 +01:00
Ed Addario e49e241d37
Calculate bpw over all tensors 2025-09-27 17:28:39 +01:00
Ed Addario 3d75b14c0f
Simplify dequantisation 2025-09-27 17:27:58 +01:00
Ed Addario 8a2c71f471
Check for direction reversal 2025-09-27 17:27:29 +01:00
Ed Addario 87cba65908
Tighten worker allocator 2025-09-27 17:26:30 +01:00
Ed Addario d16945730e
Refactor outlier trimming 2025-09-27 17:25:29 +01:00
Ed Addario dd4f4bd0b8
Reduce bpw range 2025-09-27 17:23:48 +01:00
Ed Addario dbdd179a92
Combine quant types 2025-09-25 19:50:20 +01:00
Ed Addario a74b410f5f
Move is_iq() into a lambda and remove unused variables 2025-09-25 19:49:47 +01:00
Ed Addario 8eedcf74bc
Increase scale multiplier 2025-09-22 20:42:37 +01:00
Ed Addario d36ee0a0a8
Add comments to explain magic numbers 2025-09-22 20:41:56 +01:00
Ed Addario 7ba6001ec8
Simplify candidates sorting 2025-09-22 20:11:54 +01:00
Ed Addario d79ade2e8e
Adjust for small vector size 2025-09-22 20:11:26 +01:00
Ed Addario f184450806
Fix minor logic flaw 2025-09-22 20:10:42 +01:00
Ed Addario 1fbc59f867
Replace slope with cross product 2025-09-22 20:10:10 +01:00
Ed Addario c855094dff
Exit loop if no better solution found 2025-09-22 20:09:11 +01:00
Ed Addario b748a1efa7
Fix typo 2025-09-21 22:03:54 +01:00
Ed Addario 896cdc2121
Refactor potential overflow 2025-09-21 22:03:36 +01:00
Ed Addario fecc472c61
Fix typos in variable names 2025-09-21 17:26:38 +01:00
Ed Addario e92db008bc
Refactor quantisation checks into its own function 2025-09-21 17:20:48 +01:00
Ed Addario 814f6b66be
Minor general refactoring 2025-09-21 16:45:09 +01:00
Ed Addario 0d5f18303e
Refactor lagrange_penalty() 2025-09-21 16:22:00 +01:00
Ed Addario 9a1656eb97
Refactor pareto optimise and convexify 2025-09-21 16:21:35 +01:00
Ed Addario 1a3e9ea4c8
Refactor estimate_error() 2025-09-21 16:21:00 +01:00
Ed Addario a7ee915e19
Refactor trimmed_sum() 2025-09-21 16:20:06 +01:00
Ed Addario b09662f86a
Refactor estimate_lambda() 2025-09-21 16:19:49 +01:00
Ed Addario 17be7615ce
Refactor candidate types build 2025-09-21 16:19:28 +01:00
Ed Addario 08146fd67f
Refactor side_data() and copy_or_broadcast() 2025-09-21 16:19:03 +01:00
Ed Addario 7386d4eadd
Refactor row sampling 2025-09-21 16:18:26 +01:00
Ed Addario b6c008fd8a
Refactor helper lambdas 2025-09-21 16:04:13 +01:00
Ed Addario b433fd9547
Refactor last budget pass 2025-09-21 13:43:09 +01:00
Ed Addario c466c53808
Refactor pareto pruning and convexification 2025-09-21 13:42:54 +01:00
Ed Addario 6b8cedf3bc
Refactor estimate_lambda() 2025-09-21 13:42:31 +01:00
Ed Addario bdefdb673c
Refactor copy_or_broadcast() 2025-09-21 13:42:07 +01:00
Ed Addario e8e2aed17a
Refactor row sampling 2025-09-21 13:41:44 +01:00
Ed Addario 9e74f83411
Replace --bpw-bias flag with --no-bias 2025-09-20 23:06:37 +01:00
Ed Addario ab02bb1f3e
Merge branch 'master' into quantize 2025-09-20 21:41:25 +01:00
Ed Addario a36946997e
Replace fast_bias() for per slice version and remove precise_bias() 2025-09-20 21:36:54 +01:00
Ed Addario 14fae69a7b
General refactoring 2025-09-20 21:31:31 +01:00
Jie Fu (傅杰) 745cbcf2fe
llama-quant : fix the verification of attention layers for encoder-decoder models (#16023)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-17 09:30:55 +02:00
Ed Addario ad70fca5b2
Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into quantize 2025-09-15 07:42:37 +01:00
Ed Addario 9b857e3984
Merge branch 'ggml-org:master' into quantize 2025-09-14 23:35:43 +01:00
Ed Addario c709e1a335
Fix MoE tensor estimation 2025-09-14 22:38:27 +01:00
Ed Addario 8503d59ee4
Increase IQ options 2025-09-13 11:49:18 +01:00
Ed Addario 2b516068e2
"Convexify" candidate list 2025-09-13 09:41:52 +01:00
Ed Addario 12e816b511
Replace greedy allocator with lagrangian relaxation 2025-09-13 09:24:23 +01:00
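Lagrangian relaxation for bit allocation replaces greedy spending with a single multiplier: each tensor independently picks the candidate minimising error + λ·bits, and λ is bisected until the total bit spend fits the budget. A self-contained sketch (the data layout, bounds, and iteration count here are assumptions, not the repository's implementation):

```python
def allocate(candidates, bit_budget, iters=60):
    """Hypothetical sketch of Lagrangian-relaxation bit allocation.
    `candidates` maps tensor name -> list of (bits, error) options."""
    def pick(lam):
        # Each tensor minimises its own Lagrangian cost independently.
        return {name: min(opts, key=lambda o: o[1] + lam * o[0])
                for name, opts in candidates.items()}

    lo, hi = 0.0, 1e6  # hi large enough that the cheapest options win
    for _ in range(iters):
        lam = (lo + hi) / 2
        total = sum(b for b, _ in pick(lam).values())
        if total > bit_budget:
            lo = lam   # over budget: penalise bits harder
        else:
            hi = lam   # within budget: try spending more
    return pick(hi)    # hi always satisfies the budget
```

Unlike a greedy allocator, every tensor's choice depends only on the shared λ, so the per-iteration work is embarrassingly parallel and the result is optimal over the convexified candidate sets.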
Ed Addario 7d85993f26
Minor refactoring 2025-09-13 08:44:41 +01:00
Ed Addario 4dff85fbe5
Improve precise_lambda() efficiency 2025-09-13 08:41:37 +01:00
Ed Addario bc8762f27f
Capture surrounding function name 2025-09-13 08:33:22 +01:00
Ed Addario 886536d80a
Increase error type precision 2025-09-13 08:27:23 +01:00
ddh0 df082f5630
nitpick : correct MB to MiB (#15934)
MB was incorrectly used for 1024 x 1024 bytes instead of MiB
2025-09-11 19:12:34 +02:00
Ed Addario 04c07b3272
Add better control over MSE and directional bias computation 2025-09-10 18:00:56 +01:00
Ed Addario eab8708244
Minor factoring for efficiency and correctness 2025-08-30 10:14:46 +01:00
Ed Addario 556f6b04fe
Add --precise-lambda option 2025-08-28 16:08:08 +01:00
Ed Addario 66aff8fa1e
Add precise_lambda() 2025-08-28 16:06:42 +01:00
Ed Addario 8df1d00ae4
Add directional scaling 2025-08-28 16:04:28 +01:00
Ed Addario 04946114c9
Refactor epsilon into a function-wide variable 2025-08-28 16:01:03 +01:00
Ed Addario 4286690019
Minor comment update 2025-08-26 21:39:40 +01:00