* tests: add end-to-end tests per model architecture
* fixup for rebase
* fix use-after-free in llama-model-loader.cpp
* fix CI
* fix WebGPU
* fix CI
* disable CI for macOS-latest-cmake-arm64
* use expert_weights_scale only if != 0.0f
* comments
* convert ok
* no deepstack
* less new tensors
* cgraph ok
* add mrope for text model
* faster patch merger
* add GGML_ROPE_TYPE_MRNORM
* add support for metal
* move glm4v do dedicated graph
* convert: add norm_embd
* clip: add debugging fn
* working correctly
* fix style
* use bicubic
* fix mrope metal
* improve cpu
* convert to neox ordering on conversion
* revert backend changes
* force stop if using old weight
* support moe variant
* fix conversion
* fix convert (2)
* Update tools/mtmd/clip-graph.h
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* process mrope_section on TextModel base class
* resolve conflict merge
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>