llama.cpp/gguf-py/gguf
Sigbjørn Skjæret 84bf3c6778
model : add BailingMoeV2 support (#16063)
* add BailingMoeV2 support

* update llm types

* undo

* undo

* update llm types

* add model collection link

* update

* almost working

* correct group selection and rename n_group_exp

* avoid large top_k and use argmax instead for now

If we had something like argmax2, that would be equivalent; this works fine until then.

* poke

* skip group selection when there are no tokens

* fix 1T conversion

* hopefully fixed expert group selection

third time's the charm?

* make expert group selection generally available

The new LLaDA2Moe model uses this method too; make it generally available regardless of architecture.

* allow n_expert_groups to be 1 (Kimi K2)

* address review suggestions
2025-10-20 21:38:20 +02:00
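The "avoid large top_k and use argmax instead" note above refers to group-limited expert routing: experts are partitioned into groups, only the best-scoring group(s) stay eligible, and the final top-k is taken from those. A minimal NumPy sketch of that idea follows; the function name, the max-per-group scoring convention, and the example scores are illustrative assumptions, not llama.cpp's actual implementation.

```python
import numpy as np

def select_experts_grouped(scores, n_groups, n_used_groups, top_k):
    """Hypothetical sketch of group-limited expert selection.

    scores: (n_experts,) router scores; experts are split into
    n_groups contiguous groups, and only experts in the best
    n_used_groups groups remain eligible for the final top_k.
    """
    n_experts = scores.shape[0]
    group_size = n_experts // n_groups
    grouped = scores.reshape(n_groups, group_size)
    # Score each group by its best expert (one common convention).
    group_scores = grouped.max(axis=1)
    # With n_used_groups == 1 this reduces to a plain argmax over groups,
    # which is the shortcut the commit message describes.
    best_groups = np.argsort(group_scores)[-n_used_groups:]
    # Mask out every expert outside the selected groups.
    mask = np.full(n_groups, -np.inf)
    mask[best_groups] = 0.0
    masked = (grouped + mask[:, None]).reshape(n_experts)
    # Final top_k over the surviving experts, best first.
    return np.argsort(masked)[-top_k:][::-1]

scores = np.array([0.1, 0.9, 0.2, 0.3, 0.8, 0.05, 0.4, 0.6])
print(select_experts_grouped(scores, n_groups=2, n_used_groups=1, top_k=2))
# Group 0 wins (0.9 > 0.8), so both experts come from group 0: [1 3]
```

Note how `n_used_groups=1` sidesteps a top-k over groups entirely, matching the "allow n_expert_groups to be 1 (Kimi K2)" degenerate case as well.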
scripts gguf-py : add support for endian conversion of BF16 data (#16594) 2025-10-15 22:43:08 +02:00
__init__.py
constants.py model : add BailingMoeV2 support (#16063) 2025-10-20 21:38:20 +02:00
gguf.py
gguf_reader.py
gguf_writer.py model : add BailingMoeV2 support (#16063) 2025-10-20 21:38:20 +02:00
lazy.py
metadata.py
py.typed
quants.py
tensor_mapping.py model : add BailingMoeV2 support (#16063) 2025-10-20 21:38:20 +02:00
utility.py convert : improve Mistral models integration (#14737) 2025-08-11 10:07:49 +02:00
vocab.py
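The endian-conversion change noted for `scripts` (#16594) comes down to the fact that BF16 is a plain 16-bit type, so converting its byte order is a per-element 16-bit byte swap. A minimal sketch, assuming NumPy and viewing the raw bits as `uint16` (NumPy has no native bfloat16 dtype); this is an illustration, not the script's actual code:

```python
import numpy as np

def byteswap_bf16(data: bytes) -> bytes:
    """Swap the endianness of a buffer of BF16 values.

    Each BF16 element is 16 bits, so endian conversion is just a
    byte swap of every 2-byte unit; uint16 carries the raw bits.
    """
    arr = np.frombuffer(data, dtype=np.uint16)
    return arr.byteswap().tobytes()

# 1.0 in BF16 has the bit pattern 0x3F80; swapping its two bytes
# turns little-endian b"\x80\x3f" into b"\x3f\x80".
print(byteswap_bf16(b"\x80\x3f").hex())  # -> 3f80
```

Because the swap is purely byte-level, the result does not depend on the host's native endianness.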