ubergarm
23bc779a6e
model : detect GigaChat3-10-A1.8B as deepseek lite ( #17420 )
...
* Detect GigaChat3-10-A1.8B as deepseek lite
Hardcodes checking number of layers to detect if lite version of deepseek.
* Add commnent identifying deepseek lite variants
deepseek lite variants include DeepSeek-V2-Lite, GigaChat3-10B-A1.8B
2025-11-21 14:51:38 +01:00
Bartowski
e1fcf8b09b
model : add AfmoeForCausalLM support ( #16477 )
...
* Add AFMOE model support
* Update to vocab
* Add model sizing
* Undo Rope change for ARCEE model
* Address review comments
* Update modeling code is_sliding -> use_rope, replace hard-coded logic
* Fix AFMOE tokenizer
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update AFMoE tokenizer class identification to be more unique
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-14 13:54:10 +01:00
Sigbjørn Skjæret
7bef684118
models : move build_inp_out_ids outside loop ( #17151 )
...
* move build_inp_out_ids outside loop
* realign
2025-11-10 22:55:30 +01:00
Sigbjørn Skjæret
9008027aa3
hparams : add n_embd_inp() to support extended embed ( #16928 )
...
* add n_embd_full to support extended embed
* don't change output
* rename to n_embd_inp
* restore n_embd where applicable
2025-11-07 19:27:58 +01:00
Li Pengzhan
9f052478c2
model : add openPangu-Embedded ( #16941 )
...
* Model: add openPangu-Embedded
* fixed according to reviewer's comments
* fixed the chat template check condition
* Apply suggestions from code review
change the chat-template check condition and some formatting issue
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* whitespace cleanup
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-11-05 10:28:58 +01:00
Sigbjørn Skjæret
b164259bba
chore : fix models indent after refactor ( #16992 )
2025-11-04 12:29:15 +01:00
Piotr Wilkin (ilintar)
bea04522ff
refactor : llama-model.cpp ( #16252 )
...
* Sqashed: llama-model.cpp refactoring
* Fix formatting of attn / ffn / ffn_moe calls
* Fix import regression / unify spacing in models.h
* totally DID NOT miss those!
* Add missing qwen3vl(moe) models
* Add missing new .cpp files to build
* Remove extra semicolons
* Editor checker
* Update src/models/models.h
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-31 23:40:23 +01:00