Commit Graph

7 Commits

Author SHA1 Message Date
Georgi Gerganov d9c6ce46f7
kv-cache : support V-less cache (#19067)
* kv-cache : support V-less cache

* cuda : better check for V_is_K_view

* cuda : improve V_is_K_view check

* graph : add comments

* hparams : refactor
2026-01-25 15:48:56 +02:00
Georgi Gerganov a5eaa1d6a3
mla : make the V tensor a view of K (#18986)
* mla : pass V as a view of K to the FA op

* cuda : adjust mla logic to new layout

* kv-cache : fix rope shift

* tests : remove comment

* cuda : fix reusable_cutoff

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2026-01-22 22:09:01 +02:00
tt ced765be44
model: support youtu-vl model (#18479)
* Support Youtu-VL Model

* merge code

* fix bug

* revert qwen2 code & support rsplit in minja.hpp

* update warm info

* fix annotation

* u

* revert minja.hpp

* fix

* Do not write routed_scaling_factor to gguf when routed_scaling_factor is None

* fix expert_weights_scale

* LGTM after whitespace fixes

* fix

* fix

* fix

* layers to layer_index

* enum fix

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-01 19:25:54 +01:00
Georgi Gerganov 7bed317f53
models : fix the attn_factor for mistral3 graphs + improve consistency (#17945)
* models : fix the attn_factor for mistral3 graphs

* cont : rework attn_factor correction logic

* cont : make deepseek2 consistent

* cont : add TODO

* cont : special-case DSv2

* cont : revert Mistral 3 Large changes

* cont : fix DS2 to use the original attn_factor

* cont : minor comments
2025-12-12 17:12:40 +02:00
Xuan-Son Nguyen 4d3726278b
model: add llama 4 scaling for mistral-large (deepseek arch) (#17744) 2025-12-07 22:29:54 +01:00
ubergarm 23bc779a6e
model : detect GigaChat3-10-A1.8B as deepseek lite (#17420)
* Detect GigaChat3-10-A1.8B as deepseek lite

Hardcodes checking number of layers to detect if lite version of deepseek.

* Add commnent identifying deepseek lite variants

deepseek lite variants include DeepSeek-V2-Lite, GigaChat3-10B-A1.8B
2025-11-21 14:51:38 +01:00
Piotr Wilkin (ilintar) bea04522ff
refactor : llama-model.cpp (#16252)
* Sqashed: llama-model.cpp refactoring

* Fix formatting of attn / ffn / ffn_moe calls

* Fix import regression / unify spacing in models.h

* totally DID NOT miss those!

* Add missing qwen3vl(moe) models

* Add missing new .cpp files to build

* Remove extra semicolons

* Editor checker

* Update src/models/models.h

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

---------

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-31 23:40:23 +01:00