Commit Graph

173 Commits

Author SHA1 Message Date
Ed Addario 960ef96141
Prepare for future optimization algorithms 2026-01-01 13:44:59 +00:00
Ed Addario 91846ee79b
Change checkpoint file magic 2025-12-29 13:02:06 +00:00
Ed Addario b6d718a4a6
Add code comments 2025-12-25 15:47:44 +00:00
Ed Addario 5f7bba7828
Improve state checkpoint filename 2025-12-25 15:47:18 +00:00
Ed Addario dfa79a9484
Merge branch 'master' into quantize 2025-12-16 13:57:54 +01:00
Johannes Gäßler b1f3a6e5db
llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653)
* llama: automatically fit args to free memory

llama-fit-params tool

* fix CI

* hints for bug reports, ensure no reallocation

* fix segfault with Vulkan

* add llama-fit-params to CI

* fix CI

* fix CI

* fix CI

* minor adjustments

* fix assignment of 1 dense layer

* fix logger not being reset on model load failure

* remove --n-gpu-layer hint on model load failure

* fix llama-fit-params verbosity

* fix edge case

* fix typo [no ci]
2025-12-15 09:24:59 +01:00
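
The #16653 entry above introduces llama-fit-params and automatic fitting of parameters to free memory. As a rough, hedged illustration of the idea only (the real logic also has to account for context size, dense layers and multi-GPU splits, and none of the names below come from the PR), one could back off the number of offloaded layers until the memory estimate fits:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical sketch: find the largest number of layers whose estimated
// VRAM usage still fits in the reported free memory.
static int fit_gpu_layers(int n_layers, uint64_t bytes_per_layer,
                          uint64_t overhead_bytes, uint64_t free_vram) {
    for (int n = n_layers; n >= 0; --n) {
        const uint64_t need = overhead_bytes + (uint64_t) n * bytes_per_layer;
        if (need <= free_vram) {
            return n;              // largest offload count that fits
        }
    }
    return 0;                      // nothing fits: keep everything on CPU
}

int main() {
    // e.g. 32 layers of ~450 MiB each plus ~1 GiB overhead vs 12 GiB free
    const int n = fit_gpu_layers(32, 450ull << 20, 1ull << 30, 12ull << 30);
    std::printf("offload %d layers\n", n);
    return 0;
}
```
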
Ed Addario e3d9b340ca
Merge branch 'master' into quantize 2025-12-06 15:07:36 +01:00
Daniel Bevenius 444f00b0ec
llama : remove quantization sanity check (#17788)
* llama : remove quantization sanity check

This commit removes the quantization sanity check for attention layers.

The motivation for this is that there are hybrid models that have
recurrent layers, expert layers, and attention layers. For these
models the current check fails as the expert layers are not taken
into account. After consideration, it was decided that this check
is not strictly necessary and can be removed to allow for more
flexible model architectures.

* llama : remove unused pruned_attention_w and is_clip_model vars
2025-12-06 12:26:20 +01:00
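
For context on the #17788 entry above, here is a minimal sketch of the kind of per-layer attention sanity check being removed, assuming a simple count of tensors per name; the names, counts and message are illustrative, not the original llama.cpp code. A check of this shape trips on hybrid models where some layers are recurrent or expert-only:

```cpp
#include <cstdio>
#include <initializer_list>
#include <map>
#include <string>

// Returns false when an attention tensor family is not present exactly
// once per layer - the assumption that breaks for hybrid architectures.
static bool attention_sanity_check(const std::map<std::string, int> & counts, int n_layer) {
    for (const char * name : { "attn_q", "attn_k", "attn_v" }) {
        const auto it = counts.find(name);
        if (it == counts.end() || it->second != n_layer) {
            std::fprintf(stderr, "expected %d '%s' tensors, found %d\n",
                         n_layer, name, it == counts.end() ? 0 : it->second);
            return false;
        }
    }
    return true;
}
```
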
Georgi Gerganov a67ef0f47f
llama : fix sanity checks during quantization (#17721) 2025-12-04 10:33:42 +02:00
Ed Addario 3f7842c645
Merge branch 'master' into quantize 2025-11-30 13:01:54 +00:00
Ed Addario 37cf51ebd0
Process bpw targets up to B/F16 2025-11-30 00:29:35 +00:00
Ed Addario 229109f329
Increase importance boost for final pass 2025-11-29 10:31:39 +00:00
Ed Addario 5b557ca958
Minor refactoring 2025-11-29 10:30:20 +00:00
Piotr Wilkin (ilintar) ff55414c42
model : Qwen3 Next (#16095)
* Qwen3 Next - cleaned up version

* Whitespaces and stuff

* Correct minor errors

* Update src/llama-model.cpp

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Misc. fixes.

* Clean up code, add missing hybrid qualifier

* Did someone transpose the SOLVE_TRI result matrix? Perhaps...

* Whitespace

* Proper tensors for cb calls

* Use llama-graph.h vertical alignment

* BROKEN: chunking

* Set new tensors as inputs.

* Proper chunk logic

* It's the circle of life...

* More shenanigans for n_seq > 1

* Nail in the coffin?

* Fix Windows build

* Eh, one fails on Windows, the other fails on Mac... just use general capture.

* quant : cleanup

* model : cleanup

* qwen3 : cleanup

* cont : cleanup

* cont : cleanup

* ggml : revert change

* qwen3 : cleanup

* cont : cleanup

* Readd cmath

* qwen3 : fix typo

* Update convert_hf_to_gguf.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Usual suspects

* fix my bad suggestion

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-11-28 12:02:56 +01:00
Ed Addario 6616008420
Use more descriptive option naming 2025-11-24 18:26:45 +00:00
Ed Addario 1c9993e131
Add --disable-tensor-importance option 2025-11-23 17:51:04 +00:00
Ed Addario 9ec3e6e262
Remove processing statistics_data 2025-11-23 17:49:53 +00:00
Ed Addario a0ba913613
Fix lambda capture bug in Windows and initialise candidate_types struct 2025-11-19 11:19:44 +00:00
Ed Addario ac8cfbdd12
Improved is_important() logic 2025-11-17 18:03:09 +00:00
Ed Addario b02b1b2304
Merge branch 'master' into quantize 2025-10-31 23:20:17 +00:00
Ed Addario c59bb6d49d
Add Euclidean-Cosine score to identify important tensors 2025-10-30 22:11:40 +00:00
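
The Euclidean-Cosine score commit above names a concrete metric without showing it; as a hedged sketch only (the weighting and the way the two measures are combined are assumptions, not the branch's actual formula), one way to fold Euclidean error and cosine similarity between a tensor's original and dequantized values into a single sensitivity score:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative combined score: higher means larger quantization error
// and/or a bigger change in direction, i.e. a more sensitive tensor.
static double euclidean_cosine_score(const std::vector<float> & a,
                                     const std::vector<float> & b) {
    double dot = 0.0, na = 0.0, nb = 0.0, dist2 = 0.0;
    const size_t n = a.size() < b.size() ? a.size() : b.size();
    for (size_t i = 0; i < n; ++i) {
        dot   += (double) a[i] * b[i];
        na    += (double) a[i] * a[i];
        nb    += (double) b[i] * b[i];
        const double d = (double) a[i] - b[i];
        dist2 += d * d;
    }
    const double cosine    = dot / (std::sqrt(na) * std::sqrt(nb) + 1e-12);
    const double euclidean = std::sqrt(dist2);
    return euclidean * (1.0 - cosine);
}
```
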
Ed Addario 6e32244a06
Read statistics from imatrix 2025-10-30 21:53:07 +00:00
Jan Boon d7395115ba
llama : use std::abs instead of abs (#16853) 2025-10-30 08:30:58 +02:00
Ed Addario f8863b9a80
Minor refactoring 2025-10-28 15:22:32 +00:00
Ed Addario 5303212324
Simplify tensor selection 2025-10-26 17:40:52 +00:00
Ed Addario d6ccd5649a
Finetune heuristics 2025-10-25 12:09:20 +01:00
Ed Addario 04561d5782
Update epsilon specifier 2025-10-21 12:53:26 +01:00
Ed Addario 27bf25e93c
Fix lambda capture 2025-10-20 22:04:35 +01:00
Ed Addario 543b5a99db
Fix lambda capture 2025-10-20 21:57:03 +01:00
Ed Addario fa1df81d49
Finetune heuristics 2025-10-20 20:52:23 +01:00
Ed Addario 41a0069613
Merge branch 'master' into quantize 2025-10-16 22:20:04 +01:00
Ed Addario a5103933bb
Minor refactoring 2025-10-16 15:11:48 +01:00
Ed Addario 0b3e930d52
Add option to override bpw state file name 2025-10-16 11:41:26 +01:00
Ed Addario a6853ea2ae
Add tensor type and depth heuristics 2025-10-16 11:20:24 +01:00
Xuan-Son Nguyen 3e3cb19f64
llama-quant: add support for mmproj (#16592)
* llama-quant: add support for mmproj

* Update src/llama.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* check prefix instead

* small fix

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-10-15 14:48:08 +02:00
Ed Addario b7911f1431
Minor refactoring 2025-10-13 17:46:45 +01:00
Ed Addario cd734b89ce
Update quant types 2025-10-13 15:15:23 +01:00
Ed Addario b1b58e67df
Refactor signal handlers 2025-10-13 14:54:32 +01:00
Ed Addario ca282302b5
Add --keep-bpw-state option 2025-10-12 18:23:23 +01:00
Ed Addario b6094a97bf
Add quant types 2025-10-12 16:30:35 +01:00
Ed Addario 12e0524f3a
Reduce compute time by parallelising tensor processing - courtesy of https://github.com/ddh0 2025-10-12 15:12:15 +01:00
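
The parallelisation commit above only states the outcome; as an illustrative sketch (not the contributed implementation), per-tensor work can be spread over hardware threads with an atomic work index so each thread pulls the next unprocessed tensor:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

struct tensor_job { int id; /* tensor handle, target type, ... */ };

// Minimal work-stealing-free thread pool: each worker claims the next
// job index atomically and processes it until the queue is exhausted.
static void process_all(std::vector<tensor_job> & jobs, unsigned n_threads) {
    std::atomic<size_t> next{0};
    auto worker = [&jobs, &next]() {
        while (true) {
            const size_t i = next.fetch_add(1);
            if (i >= jobs.size()) break;
            // quantize / score jobs[i] here
            std::printf("processed tensor %d\n", jobs[i].id);
        }
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) pool.emplace_back(worker);
    for (auto & th : pool) th.join();
}
```
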
Ed Addario 5b0d3f6d5a
Automatically determine if bias error is significant 2025-10-11 10:04:48 +01:00
Ed Addario c93131cef6
Remove --no-bias option 2025-10-10 13:26:51 +01:00
Ed Addario 3a3d807fc3
Remove bias mode computation 2025-10-10 13:10:42 +01:00
Ed Addario c11184a3c1
Generate model ID hash 2025-10-09 11:58:01 +01:00
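
The model ID hash commit above suggests tying persisted bpw state to the model it was computed from; a hypothetical sketch using FNV-1a over tensor names and shapes follows (the fields actually hashed and the hash function used in the branch are not specified here):

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// FNV-1a 64-bit, folded over arbitrary byte ranges.
static uint64_t fnv1a64(uint64_t h, const void * data, size_t len) {
    const auto * p = static_cast<const uint8_t *>(data);
    for (size_t i = 0; i < len; ++i) { h ^= p[i]; h *= 0x100000001b3ULL; }
    return h;
}

// Hash tensor names and dimensions so a checkpoint can be matched to
// the model it was built from; purely illustrative field choice.
static uint64_t model_id_hash(const std::vector<std::pair<std::string, std::vector<int64_t>>> & tensors) {
    uint64_t h = 0xcbf29ce484222325ULL;                                   // FNV offset basis
    for (const auto & [name, shape] : tensors) {
        h = fnv1a64(h, name.data(), name.size());                         // tensor name
        h = fnv1a64(h, shape.data(), shape.size() * sizeof(int64_t));     // dimensions
    }
    return h;
}
```
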
Ed Addario 044fa783c7
Fix trimming logic 2025-10-06 21:40:37 +01:00
Ed Addario 84ada44894
Uninstall signal handler and cleanup 2025-10-05 20:20:56 +01:00
Ed Addario 46706cec28
Persist progress 2025-10-05 20:20:28 +01:00
Ed Addario 74c62ed4e6
Add delete_bpw_state() 2025-10-05 20:19:03 +01:00
Ed Addario 02c3073b81
Add load_bpw_state() 2025-10-05 20:18:36 +01:00