* It's Qwen3 Next, the lean mean token generation machine!
* Apply patches from thread
* Remove recurrent version, only keep chunked and autoregressive
* Remove unnecessary conts and asserts
* Remove more extra conts and asserts
* Cleanup masking
* convert ok
* no deepstack
* less new tensors
* cgraph ok
* add mrope for text model
* faster patch merger
* add GGML_ROPE_TYPE_MRNORM
* add support for metal
* move glm4v to dedicated graph
* convert: add norm_embd
* clip: add debugging fn
* working correctly
* fix style
* use bicubic
* fix mrope metal
* improve cpu
* convert to neox ordering on conversion
* revert backend changes
* force stop if using old weight
* support moe variant
* fix conversion
* fix convert (2)
* Update tools/mtmd/clip-graph.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* process mrope_section on TextModel base class
* resolve merge conflict
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* llama : add support for NVIDIA Nemotron Nano 3
This commit adds support for the NVIDIA Nemotron Nano 3 model, enabling
the conversion and running of this model.
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Qwen3 Next - cleaned up version
* Whitespaces and stuff
* Correct minor errors
* Update src/llama-model.cpp
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Misc. fixes.
* Clean up code, add missing hybrid qualifier
* Did someone transpose the SOLVE_TRI result matrix? Perhaps...
* Whitespace
* Proper tensors for cb calls
* Use llama-graph.h vertical alignment
* BROKEN: chunking
* Set new tensors as inputs.
* Proper chunk logic
* It's the circle of life...
* More shenanigans for n_seq > 1
* Nail in the coffin?
* Fix Windows build
* Eh, one fails on Windows, the other fails on Mac... just use general capture.
* quant : cleanup
* model : cleanup
* qwen3 : cleanup
* cont : cleanup
* cont : cleanup
* ggml : revert change
* qwen3 : cleanup
* cont : cleanup
* Readd cmath
* qwen3 : fix typo
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Usual suspects
* fix my bad suggestion
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Detect GigaChat3-10B-A1.8B as deepseek lite
Hardcodes a check on the number of layers to detect the lite variants of deepseek.
* Add comment identifying deepseek lite variants
deepseek lite variants include DeepSeek-V2-Lite, GigaChat3-10B-A1.8B
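A layer-count check of this kind might look roughly like the sketch below. The helper name `is_deepseek_lite` and the layer counts 27 and 26 are assumptions made for illustration; they are not stated in this log.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: classify a DeepSeek-style model as a "lite" variant
// purely from its layer count.
// ASSUMPTION: 27 and 26 are illustrative values, not confirmed by this log.
constexpr bool is_deepseek_lite(uint32_t n_layer) {
    return n_layer == 27 || n_layer == 26;
}

// Small self-check showing how the helper would be used.
inline void check_is_deepseek_lite() {
    assert(is_deepseek_lite(27));   // assumed lite layer count
    assert(is_deepseek_lite(26));   // assumed lite layer count
    assert(!is_deepseek_lite(60));  // any other count is treated as non-lite
}
```

A hardcoded check like this is simple but brittle: any non-lite model that happens to share one of the listed layer counts would be misclassified, which is why a comment naming the known lite variants is useful next to it.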