llama.cpp

Commit Graph

Author	SHA1	Message	Date
Saba Fallah	33fabf0bd8	Merge branch 'master' into sf/deepseek-ocr-merge-test # Conflicts: # tools/mtmd/clip.cpp # tools/mtmd/mtmd-cli.cpp	2025-12-11 08:13:50 +01:00
Xuan-Son Nguyen	9e79b0116e	convert: allow using quantized Mistral weight (#17889 ) * convert: allow using quantized Mistral weight * data_torch.ndim * update dequant fn Co-authored-by: compilade <compilade@users.noreply.github.com> --------- Co-authored-by: compilade <compilade@users.noreply.github.com>	2025-12-10 10:26:22 +01:00
philip-essential	1d2a1ab73d	model : support Rnj-1 (#17811 ) * add support for rnj1 * refactor gemma3 to support rnj-1 * address review comments	2025-12-09 04:49:03 +01:00
bluebread	48c6cf2132	mtmd: convert model in FP16	2025-12-08 02:36:00 +00:00
Xuan-Son Nguyen	dbc15a7967	convert: support Mistral 3 Large MoE (#17730 ) * convert: support Mistral 3 Large MoE * filter out vision tensors, add missing keys * handle vocab * add temperature_length * fix mscale_all_dim * clean up * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-06 10:49:33 +01:00
bluebread	2d918b3e21	mtmd: make sam hparams configurable	2025-12-06 06:55:53 +00:00
bluebread	15f2ada0ed	mtmd: simplify get_rel_pos	2025-12-06 06:32:41 +00:00
Saba Fallah	1c88647ec6	fixed flake8 lint issues	2025-12-05 12:24:10 +01:00
Saba Fallah	5f2ee1aecf	Merge branch 'ggml-org:master' into sf/deepseek-ocr	2025-12-05 11:56:06 +01:00
bluebread	2dd9924076	Merge branch 'sf/deepseek-ocr-cleanup' of github.com:sfallah/llama.cpp into sf/deepseek-ocr-cleanup	2025-12-04 16:52:00 +00:00
bluebread	c89171cf4d	mtmd: fixed bad ocr check in Deepseek2 (LM)	2025-12-04 16:50:05 +00:00
Saba Fallah	386ba479a2	clean up	2025-12-04 15:05:58 +01:00
SmartestWashingMachine	3659aa28e9	convert: use existing local chat_template if mistral-format model has one. (#17749 ) * conversion: use existing local chat_template.jinja file if mistral-format model has one. * fix --mistral-format mistakenly assuming some <=v7 chat template names are file paths and reading them. * Update convert_hf_to_gguf.py - change from exists() to is_file() Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-04 12:12:45 +01:00
Saba Fallah	66341666fb	Merge branch 'master' into sf/deepseek-ocr # Conflicts: # convert_hf_to_gguf.py # tools/mtmd/clip.h # tools/mtmd/mtmd.cpp	2025-12-02 21:02:13 +01:00
Xuan-Son Nguyen	2c453c6c77	convert: add error message for mistral3 quantized weight (#17686 )	2025-12-02 11:48:31 +01:00
Xuan-Son Nguyen	cd3c118908	model: support Ministral3 (#17644 ) * conversion script * support ministral 3 * maybe this is better? * add TODO for rope_yarn_log_mul * better ppl (tested on 14B-Instruct) * Add Ministral3 support to Mistral format * improve arch handling * add sizes * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * nits --------- Co-authored-by: Julien Denize <julien.denize@mistral.ai> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-01 12:26:52 +01:00
bluebread	55430945ef	Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr	2025-11-30 08:55:29 +00:00
Saba Fallah	ed3b7f1056	Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr # Conflicts: # convert_hf_to_gguf.py # src/llama-model.cpp # src/models/deepseek2.cpp	2025-11-30 08:29:09 +01:00
bluebread	a488b495f7	mtmd: SAM numerically works	2025-11-29 02:17:49 +00:00
Piotr Wilkin (ilintar)	ff55414c42	model : Qwen3 Next (#16095 ) * Qwen3 Next - cleaned up version * Whitespaces and stuff * Correct minor errors * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Misc. fixes. * Clean up code, add missing hybrid qualifier * Did someone transpose the SOLVE_TRI result matrix? Perhaps... * Whitespace * Proper tensors for cb calls * Use llama-graph.h vertical alignment * BROKEN: chunking * Set new tensors as inputs. * Proper chunk logic * It's the circle of life... * More shenanigans for n_seq > 1 * Nail in the coffin? * Fix Windows build * Eh, one fails on Windows, the other fails on Mac... just use general capture. * quant : cleanup * model : cleanup * qwen3 : cleanup * cont : cleanup * cont : cleanup * ggml : revert change * qwen3 : cleanup * cont : cleanup * Readd cmath * qwen3 : fix typo * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Usual suspects * fix my bad suggestion --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-28 12:02:56 +01:00
Aleksei Nikiforov	05872ac885	convert : fix big-endian conversion (#17431 ) * Fix convert_hf_to_gguf.py script on s390x Assume converted model data is originally little-endian. Byteswap data on s390x after reading it to put values in correct presentation for any transformation needed, like calculating weight tensors. Then byteswap data to little-endian before passing it to GGUFWriter while GGUFWriter will byteswap data back to big endian if big endian output is requested. byteswap(inplace=True) calls don't work with lazy tensor and array wrappers. Use byteswap with copying data to workaround this behaviour. * Make GGUFWriter accept tensors in native endianness instead of little-endian With this change if no byteswapping is actually needed, 2 excessive byteswaps can be omitted on s390x * Fix byteswapping in convert_hf_to_gguf.py for remote models	2025-11-25 14:18:16 +01:00
Sigbjørn Skjæret	b61de2b2df	convert : allow quantizing lora again (#17453 )	2025-11-24 15:50:55 +01:00
william pan	4902eebe33	models : Added support for RND1 Diffusion Language Model (#17433 ) * Converted RND1 model to GGUF weights * RND1 llama.cpp support v1 * RND1 llama.cpp support v2 non causal bug * RND1 llama.cpp support v3 doccumentation * RND1 llama.cpp support v4 clean code * linting issues * RND1 pr fixes v1 * RND1 pr fixes v2 Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Diffusion documentation edits --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-24 14:16:56 +08:00
Sigbjørn Skjæret	07b0e7a5ac	convert : use self.block_count everywhere instead of reading hparams (#17359 )	2025-11-19 11:52:38 +01:00
Saba Fallah	1e08157134	clip-vit: model convert qkv_proj split	2025-11-17 21:19:51 +01:00
Saba Fallah	13dc6fb305	Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr	2025-11-17 11:25:16 +01:00
Saba Fallah	97e0907c5b	loading LM testing Vision model loading	2025-11-17 11:07:33 +01:00
bluebread	76305878d5	mtmd: successfully runs DeepSeek-OCR LM in llama-cli	2025-11-16 08:45:08 +00:00
Sigbjørn Skjæret	662192e1dc	convert : remove unnecessary chat template patching (#17289 )	2025-11-15 20:58:59 +01:00
bluebread	eab28ed318	mtmd: add DeepSeek-OCR LM support with standard attention	2025-11-15 17:28:18 +00:00
Sigbjørn Skjæret	9a8860cf5d	convert : use all parts in safetensors index (#17286 )	2025-11-15 14:12:39 +01:00
Sigbjørn Skjæret	9d3ef4809f	convert : set expert gating func in base class (#17279 )	2025-11-15 14:06:24 +01:00
bluebread	85c7cda8eb	mtmd: fix vision model processing	2025-11-15 04:20:01 +00:00
Bartowski	e1fcf8b09b	model : add AfmoeForCausalLM support (#16477 ) * Add AFMOE model support * Update to vocab * Add model sizing * Undo Rope change for ARCEE model * Address review comments * Update modeling code is_sliding -> use_rope, replace hard-coded logic * Fix AFMOE tokenizer * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update AFMoE tokenizer class identification to be more unique --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-14 13:54:10 +01:00
Saba Fallah	43a130b4d0	mtmd: llama.cpp DeepSeekOCR support init commit	2025-11-14 12:40:20 +01:00
levkropp	2fc392ce35	convert : register UMT5Model architecture for T5 conversion (#17160 ) Register UMT5Model as a supported architecture variant for T5 model conversion. This allows the conversion to work for models downloaded with AutoModel.	2025-11-11 09:38:30 +01:00
compilade	802cef44bf	convert : parse safetensors directly (#15667 ) * convert : parse safetensors directly * gguf-py : order safetensors tensors by name Applies to both local and remote safetensors custom parsing. This matches the behavior of the official safetensors implementation. * convert : rename from_safetensors_meta to from_local_tensor For consistency with from_remote_tensor * convert : fix no-lazy dtypes from direct safetensors	2025-11-09 09:49:40 -05:00
compilade	1c07c0c68c	convert : handle compressed-tensors quant method (#17069 ) * convert : handle compressed-tensors quant method * convert : handle int-quantized models * convert : handle naive-quantized models * gguf-py : __pos__ is also unary * convert : fix flake8 lint * convert : use F32 for dequant of pack-quantized tensors	2025-11-09 09:45:50 -05:00
Li Pengzhan	9f052478c2	model : add openPangu-Embedded (#16941 ) * Model: add openPangu-Embedded * fixed according to reviewer's comments * fixed the chat template check condition * Apply suggestions from code review change the chat-template check condition and some formatting issue Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * whitespace cleanup --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-05 10:28:58 +01:00
Zhiyong Wang	6b9a52422b	model: add Janus Pro for image understanding (#16906 ) * Add support for Janus Pro * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Address reviewer suggestions Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Add JANUS_PRO constant * Update clip model handling Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Refactor JANUS_PRO handling in clip.cpp Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> * Update tools/mtmd/clip.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * em whitespace --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2025-11-02 22:08:04 +01:00
Piotr Wilkin (ilintar)	0de0a01576	model : Minimax M2 (#16831 ) * Model: Minimax M2 * Cleanup * Cleanup pt. 2 * Cleanup pt. 3 * Update convert_hf_to_gguf_update.py - merge catch blocks Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Remove vocab models and test * Remove all redundant hparam settings covered by TextModel * Move super to start, don't set block_count * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update gguf-py/gguf/constants.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-31 21:20:47 +01:00
JJJYmmm	d261223d24	model: add support for qwen3vl series (#16780 ) * support qwen3vl series. Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com> Co-authored-by: yairpatch <yairpatch@users.noreply.github.com> Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com> * bugfix: fix the arch check for qwen3vl-moe. * use build_ffn * optimize deepstack structure * optimize deepstack feature saving * Revert "optimize deepstack feature saving" for temporal fix This reverts commit `f321b9fdf1`. * code clean * use fused qkv in clip * clean up / rm is_deepstack_layers for simplification * add test model * move test model to "big" section * fix imrope check * remove trailing whitespace * fix rope fail * metal : add imrope support * add imrope support for sycl * vulkan: add imrope w/o check * fix vulkan * webgpu: add imrope w/o check * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix tensor mapping --------- Co-authored-by: Thireus ☠ <Thireus@users.noreply.github.com> Co-authored-by: yairpatch <yairpatch@users.noreply.github.com> Co-authored-by: LETS-BEE <LETS-BEE@users.noreply.github.com> Co-authored-by: Xuan Son Nguyen <son@huggingface.co> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-30 16:19:14 +01:00
Tianyue-Zhao	bacddc049a	model: Add support for CogVLM model (#15002 ) * Added GGUF mappings for CogVLM model * Add tensor mapping for CogVLM visual encoder * Add CogVLM to conversion script, no vision part yet * Added CogVLM vision model to conversion script * Add graph for CogVLM CLIP model * Add graph for CogVLM * Fixes for CogVLM. Now compiles. * Model now runs * Fixes for cogvlm graph * Account for graph context change after rebase * Changes for whitespace * Changes in convert script according to comments * Switch CogVLM LLM graph to merged QKV tensor * Use rope_type variable instead of direct definition * Change CogVLM CLIP encoder to use SWIGLU * Switch CogVLM CLIP to use merged QKV * Apply rebase edits and remove ggml_cont call that is now unnecessary * clean up --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-10-30 12:18:50 +01:00
Xuan-Son Nguyen	c55d53acec	model : add LightOnOCR-1B model (#16764 ) * model : add LightOnOCR-1B model * add test	2025-10-27 16:02:58 +01:00
Sigbjørn Skjæret	73a48c9790	convert : enable expert group selection for all models with it (#16691 )	2025-10-26 17:21:23 +01:00
Galunid	5d195f17bc	convert : handle mmproj filename/path properly (#16760 ) * convert: handle mmproj model output filename properly * remove redundant commits * Add model_type to gguf utility * Use mmproj- prefix instead of suffix * Apply CISC suggestion Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-25 20:41:36 +02:00
compilade	5cca2542ac	convert : avoid dequantizing mxfp4 for GPT-OSS (#16756 )	2025-10-24 20:52:00 -04:00
compilade	f8f071fadd	convert : handle pre-quantized models (#14810 ) * convert : begin handling pre-quantized models * convert : fix conversion from FP8 for Deepseek-V3.1-Base	2025-10-23 16:31:41 -04:00
Julien Denize	dd62dcfab9	convert : Make mistral-common dependency optional (#16738 ) * Make mistral-common dependency optional * Fix typing	2025-10-23 15:54:46 +02:00
Sigbjørn Skjæret	84bf3c6778	model : add BailingMoeV2 support (#16063 ) * add BailingMoeV2 support * update llm types * undo * undo * update llm types * add model collection link * update * almost working * correct group selection and rename n_group_exp * avoid large top_k and use argmax instead for now if we had something like argmax2 that would be equivalent, but this works fine until then * poke * skip group selection when there are no tokens * fix 1T conversion * hopefully fixed expert group selection third time's the charm? * make expert group selection generally available The new LLaDA2Moe model uses this method too, make it generally available regardless of architecture. * allow n_expert_groups to be 1 (Kimi K2) * address review suggestions	2025-10-20 21:38:20 +02:00

1 2 3 4 5 ...

272 Commits