llama.cpp

Commit Graph

Author	SHA1	Message	Date
Saba Fallah	51c3de6887	Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr # Conflicts: # gguf-py/gguf/constants.py # gguf-py/gguf/tensor_mapping.py # tools/mtmd/clip-impl.h # tools/mtmd/clip.cpp # tools/mtmd/models/models.h	2025-12-16 12:16:25 +01:00
Xuan-Son Nguyen	3d86c6c2b5	model: support GLM4V vision encoder (#18042) * convert ok * no deepstack * less new tensors * cgraph ok * add mrope for text model * faster patch merger * add GGML_ROPE_TYPE_MRNORM * add support for metal * move glm4v do dedicated graph * convert: add norm_embd * clip: add debugging fn * working correctly * fix style * use bicubic * fix mrope metal * improve cpu * convert to neox ordering on conversion * revert backend changes * force stop if using old weight * support moe variant * fix conversion * fix convert (2) * Update tools/mtmd/clip-graph.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * process mrope_section on TextModel base class * resolve conflict merge --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-16 11:25:26 +01:00
Saba Fallah	4a4f82968c	Merge branch 'ggml-org:master' into sf/deepseek-ocr	2025-12-16 09:09:52 +01:00
Daniel Bevenius	2995341730	llama : add support for NVIDIA Nemotron 3 Nano (#18058) * llama : add support for NVIDIA Nemotron Nano 3 This commit adds support for the NVIDIA Nemotron Nano 3 model, enabling the conversion and running of this model. Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-16 07:19:26 +01:00
Sigbjørn Skjæret	d6a1e18c65	convert : move rope_parameters to TextModel class (#18061) * make sure to search text_config for rope parameters * move rope_parameters to TextModel class	2025-12-15 22:03:16 +01:00
HelloKS	9d52f17ae3	model : add KORMo model (#18032) * vocab: add KORMo Tokenizer * model: add KORMoForCausalLM * vocab: change pretokenizer to qwen2 * lint: fix unintended line removal * model: make qwen2 bias tensor optional * model: use qwen2 architecture for KORMo	2025-12-15 18:51:43 +01:00
Saba Fallah	b3bf8cba05	Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr # Conflicts: # convert_hf_to_gguf.py	2025-12-15 10:19:50 +01:00
piDack	745fa0e78b	model : add glm-asr support (#17901) * [model] add glm-asr support * fix format for ci * fix convert format for ci * update glm_asr convert script & use build_ffn for glm_asr clip & use build_stack for padding and review * check root architecture for convert hf script * fix conficlt with upstream * fix convert script for glm asr & format clip-impl * format * restore hparams text * improved conversion --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-15 03:18:46 +01:00
Sigbjørn Skjæret	5c8a717128	convert : refactor rope scaling handling (#18013) * refactor rope scaling handling * ws-- * missed a couple * use find_hparam	2025-12-14 16:04:37 +01:00
Saba Fallah	e0e69fd3fb	Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr-merge_#17965 # Conflicts: # src/llama-kv-cache.cpp # tools/mtmd/clip.cpp	2025-12-13 10:59:46 +01:00
Georgi Gerganov	7bed317f53	models : fix the attn_factor for mistral3 graphs + improve consistency (#17945) * models : fix the attn_factor for mistral3 graphs * cont : rework attn_factor correction logic * cont : make deepseek2 consistent * cont : add TODO * cont : special-case DSv2 * cont : revert Mistral 3 Large changes * cont : fix DS2 to use the original attn_factor * cont : minor comments	2025-12-12 17:12:40 +02:00
Saba Fallah	33fabf0bd8	Merge branch 'master' into sf/deepseek-ocr-merge-test # Conflicts: # tools/mtmd/clip.cpp # tools/mtmd/mtmd-cli.cpp	2025-12-11 08:13:50 +01:00
Xuan-Son Nguyen	9e79b0116e	convert: allow using quantized Mistral weight (#17889) * convert: allow using quantized Mistral weight * data_torch.ndim * update dequant fn Co-authored-by: compilade <compilade@users.noreply.github.com> --------- Co-authored-by: compilade <compilade@users.noreply.github.com>	2025-12-10 10:26:22 +01:00
philip-essential	1d2a1ab73d	model : support Rnj-1 (#17811) * add support for rnj1 * refactor gemma3 to support rnj-1 * address review comments	2025-12-09 04:49:03 +01:00
bluebread	48c6cf2132	mtmd: convert model in FP16	2025-12-08 02:36:00 +00:00
Xuan-Son Nguyen	dbc15a7967	convert: support Mistral 3 Large MoE (#17730) * convert: support Mistral 3 Large MoE * filter out vision tensors, add missing keys * handle vocab * add temperature_length * fix mscale_all_dim * clean up * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * fix * Update gguf-py/gguf/tensor_mapping.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-06 10:49:33 +01:00
bluebread	2d918b3e21	mtmd: make sam hparams configurable	2025-12-06 06:55:53 +00:00
bluebread	15f2ada0ed	mtmd: simplify get_rel_pos	2025-12-06 06:32:41 +00:00
Saba Fallah	1c88647ec6	fixed flake8 lint issues	2025-12-05 12:24:10 +01:00
Saba Fallah	5f2ee1aecf	Merge branch 'ggml-org:master' into sf/deepseek-ocr	2025-12-05 11:56:06 +01:00
bluebread	2dd9924076	Merge branch 'sf/deepseek-ocr-cleanup' of github.com:sfallah/llama.cpp into sf/deepseek-ocr-cleanup	2025-12-04 16:52:00 +00:00
bluebread	c89171cf4d	mtmd: fixed bad ocr check in Deepseek2 (LM)	2025-12-04 16:50:05 +00:00
Saba Fallah	386ba479a2	clean up	2025-12-04 15:05:58 +01:00
SmartestWashingMachine	3659aa28e9	convert: use existing local chat_template if mistral-format model has one. (#17749) * conversion: use existing local chat_template.jinja file if mistral-format model has one. * fix --mistral-format mistakenly assuming some <=v7 chat template names are file paths and reading them. * Update convert_hf_to_gguf.py - change from exists() to is_file() Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-04 12:12:45 +01:00
Saba Fallah	66341666fb	Merge branch 'master' into sf/deepseek-ocr # Conflicts: # convert_hf_to_gguf.py # tools/mtmd/clip.h # tools/mtmd/mtmd.cpp	2025-12-02 21:02:13 +01:00
Xuan-Son Nguyen	2c453c6c77	convert: add error message for mistral3 quantized weight (#17686)	2025-12-02 11:48:31 +01:00
Xuan-Son Nguyen	cd3c118908	model: support Ministral3 (#17644) * conversion script * support ministral 3 * maybe this is better? * add TODO for rope_yarn_log_mul * better ppl (tested on 14B-Instruct) * Add Ministral3 support to Mistral format * improve arch handling * add sizes * Apply suggestions from code review Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * nits --------- Co-authored-by: Julien Denize <julien.denize@mistral.ai> Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-12-01 12:26:52 +01:00
bluebread	55430945ef	Merge branch 'sf/deepseek-ocr' of github.com:sfallah/llama.cpp into sf/deepseek-ocr	2025-11-30 08:55:29 +00:00
Saba Fallah	ed3b7f1056	Merge remote-tracking branch 'sfallah/master' into sf/deepseek-ocr # Conflicts: # convert_hf_to_gguf.py # src/llama-model.cpp # src/models/deepseek2.cpp	2025-11-30 08:29:09 +01:00
bluebread	a488b495f7	mtmd: SAM numerically works	2025-11-29 02:17:49 +00:00
Piotr Wilkin (ilintar)	ff55414c42	model : Qwen3 Next (#16095) * Qwen3 Next - cleaned up version * Whitespaces and stuff * Correct minor errors * Update src/llama-model.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Misc. fixes. * Clean up code, add missing hybrid qualifier * Did someone transpose the SOLVE_TRI result matrix? Perhaps... * Whitespace * Proper tensors for cb calls * Use llama-graph.h vertical alignment * BROKEN: chunking * Set new tensors as inputs. * Proper chunk logic * It's the circle of life... * More shenanigans for n_seq > 1 * Nail in the coffin? * Fix Windows build * Eh, one fails on Windows, the other fails on Mac... just use general capture. * quant : cleanup * model : cleanup * qwen3 : cleanup * cont : cleanup * cont : cleanup * ggml : revert change * qwen3 : cleanup * cont : cleanup * Readd cmath * qwen3 : fix typo * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Usual suspects * fix my bad suggestion --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-28 12:02:56 +01:00
Aleksei Nikiforov	05872ac885	convert : fix big-endian conversion (#17431) * Fix convert_hf_to_gguf.py script on s390x Assume converted model data is originally little-endian. Byteswap data on s390x after reading it to put values in correct presentation for any transformation needed, like calculating weight tensors. Then byteswap data to little-endian before passing it to GGUFWriter while GGUFWriter will byteswap data back to big endian if big endian output is requested. byteswap(inplace=True) calls don't work with lazy tensor and array wrappers. Use byteswap with copying data to workaround this behaviour. * Make GGUFWriter accept tensors in native endianness instead of little-endian With this change if no byteswapping is actually needed, 2 excessive byteswaps can be omitted on s390x * Fix byteswapping in convert_hf_to_gguf.py for remote models	2025-11-25 14:18:16 +01:00
Sigbjørn Skjæret	b61de2b2df	convert : allow quantizing lora again (#17453)	2025-11-24 15:50:55 +01:00
william pan	4902eebe33	models : Added support for RND1 Diffusion Language Model (#17433) * Converted RND1 model to GGUF weights * RND1 llama.cpp support v1 * RND1 llama.cpp support v2 non causal bug * RND1 llama.cpp support v3 doccumentation * RND1 llama.cpp support v4 clean code * linting issues * RND1 pr fixes v1 * RND1 pr fixes v2 Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Diffusion documentation edits --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-24 14:16:56 +08:00
Sigbjørn Skjæret	07b0e7a5ac	convert : use self.block_count everywhere instead of reading hparams (#17359)	2025-11-19 11:52:38 +01:00
Saba Fallah	1e08157134	clip-vit: model convert qkv_proj split	2025-11-17 21:19:51 +01:00
Saba Fallah	13dc6fb305	Merge branch 'sf/deepseek-ocr' into sf/deepseek-ocr	2025-11-17 11:25:16 +01:00
Saba Fallah	97e0907c5b	loading LM testing Vision model loading	2025-11-17 11:07:33 +01:00
bluebread	76305878d5	mtmd: successfully runs DeepSeek-OCR LM in llama-cli	2025-11-16 08:45:08 +00:00
Sigbjørn Skjæret	662192e1dc	convert : remove unnecessary chat template patching (#17289)	2025-11-15 20:58:59 +01:00
bluebread	eab28ed318	mtmd: add DeepSeek-OCR LM support with standard attention	2025-11-15 17:28:18 +00:00
Sigbjørn Skjæret	9a8860cf5d	convert : use all parts in safetensors index (#17286)	2025-11-15 14:12:39 +01:00
Sigbjørn Skjæret	9d3ef4809f	convert : set expert gating func in base class (#17279)	2025-11-15 14:06:24 +01:00
bluebread	85c7cda8eb	mtmd: fix vision model processing	2025-11-15 04:20:01 +00:00
Bartowski	e1fcf8b09b	model : add AfmoeForCausalLM support (#16477) * Add AFMOE model support * Update to vocab * Add model sizing * Undo Rope change for ARCEE model * Address review comments * Update modeling code is_sliding -> use_rope, replace hard-coded logic * Fix AFMOE tokenizer * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update AFMoE tokenizer class identification to be more unique --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-14 13:54:10 +01:00
Saba Fallah	43a130b4d0	mtmd: llama.cpp DeepSeekOCR support init commit	2025-11-14 12:40:20 +01:00
levkropp	2fc392ce35	convert : register UMT5Model architecture for T5 conversion (#17160) Register UMT5Model as a supported architecture variant for T5 model conversion. This allows the conversion to work for models downloaded with AutoModel.	2025-11-11 09:38:30 +01:00
compilade	802cef44bf	convert : parse safetensors directly (#15667) * convert : parse safetensors directly * gguf-py : order safetensors tensors by name Applies to both local and remote safetensors custom parsing. This matches the behavior of the official safetensors implementation. * convert : rename from_safetensors_meta to from_local_tensor For consistency with from_remote_tensor * convert : fix no-lazy dtypes from direct safetensors	2025-11-09 09:49:40 -05:00
compilade	1c07c0c68c	convert : handle compressed-tensors quant method (#17069) * convert : handle compressed-tensors quant method * convert : handle int-quantized models * convert : handle naive-quantized models * gguf-py : __pos__ is also unary * convert : fix flake8 lint * convert : use F32 for dequant of pack-quantized tensors	2025-11-09 09:45:50 -05:00
Li Pengzhan	9f052478c2	model : add openPangu-Embedded (#16941) * Model: add openPangu-Embedded * fixed according to reviewer's comments * fixed the chat template check condition * Apply suggestions from code review change the chat-template check condition and some formatting issue Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * whitespace cleanup --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-05 10:28:58 +01:00

283 Commits