Georgi Gerganov
39b6f5a760
models : avoid Q and K repeats when using fused GDA
2026-03-10 12:20:14 +02:00
Xuan-Son Nguyen
59db9a357d
llama: dynamic head_dim and n_rot for SWA (#20301)
* llama: dynamic head_dim and n_rot for SWA
* also add gguf_writer wrappers
* fix build
* build_rope_shift arg reorder
2026-03-09 22:22:39 +01:00
Sigbjørn Skjæret
35bee031e1
graph : remove redundant scale_w parameter (#20235)
2026-03-08 18:58:28 +01:00
Aman Gupta
c5a778891b
ggml: add GATED_DELTA_NET op (#19504)
* ggml: add GATED_DELTA_NET op
* remove the transpose
* add KDA
* add qwen35 dense
* llama : check for fused gated delta net backend support
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-03-07 15:41:10 +08:00
Aman Gupta
b68d75165a
llama: Add option to merge gate and exp weights (#19139)
* llama: Add option to merge gate and exp weights
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update convert_hf_to_gguf.py
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* update constants.py
* add gate_up for the all MoE models
* convert: simplify merge tensor condition
* update constants.py
* reduce number of models, add create_tensor_gate_up helper
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-02-26 21:01:08 +08:00
Georgi Gerganov
244641955f
models : fix graph splits (#19866)
2026-02-25 00:01:13 +02:00
Georgi Gerganov
da348c9dfb
models : fix qwen3.5 beta/gate shapes (#19730)
* models : fix qwen3.5 beta/gate shapes
* cont : avoid extra reshapes
2026-02-19 15:19:53 +02:00
Georgi Gerganov
27326bfce1
models : dedup qwen35 graphs (#19660)
* models : dedup qwen35 graphs
* cont : add missing sigmoid
2026-02-19 08:17:49 +02:00
Georgi Gerganov
cc45f2ada6
models : deduplicate delta-net graphs for Qwen family (#19597)
* models : add llm_build_delta_net_base
* cont : keep qwen35 and qwen35moe graphs intact
* cont : add comments
2026-02-16 14:35:04 +02:00
JJJYmmm
fc0fe40049
models : support qwen3.5 series (#19468)
* support qwen3.5 series
* remove deepstack for now, and some code clean
* code clean
* add FULL_ATTENTION_INTERVAL metadata
* code clean
* reorder v heads for linear attention to avoid expensive interleaved repeat
2026-02-10 18:00:26 +02:00