Xuan-Son Nguyen
|
59db9a357d
|
llama: dynamic head_dim and n_rot for SWA (#20301)
* llama: dynamic head_dim and n_rot for SWA
* also add gguf_writer wrappers
* fix build
* build_rope_shift arg reorder
|
2026-03-09 22:22:39 +01:00 |
Sigbjørn Skjæret
|
35bee031e1
|
graph : remove redundant scale_w parameter (#20235)
|
2026-03-08 18:58:28 +01:00 |
Sigbjørn Skjæret
|
eadc4184ca
|
llama : refactor rope_freq_base/scale_swa conversion and init (#18553)
* refactor rope_freq_base/scale_swa conversion and init
* safe defaults for unknowns
* update relevant models
* grammar
* add get_rope_freq_scale to modern-bert
* const
* const
* log swa info
|
2026-01-05 09:14:04 +01:00 |
Bartowski
|
e1fcf8b09b
|
model : add AfmoeForCausalLM support (#16477)
* Add AFMOE model support
* Update to vocab
* Add model sizing
* Undo Rope change for ARCEE model
* Address review comments
* Update modeling code is_sliding -> use_rope, replace hard-coded logic
* Fix AFMOE tokenizer
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update AFMoE tokenizer class identification to be more unique
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
|
2025-11-14 13:54:10 +01:00 |