llama.cpp

Commit Graph

Author	SHA1	Message	Date
suhyun-hwang	0bffc5c960	refactor: remove redundant int() cast in sliding_window_pattern	2026-01-15 21:08:02 +09:00
suhyun-hwang	bfc92e954b	refactor: remove rope_parameters from VaetkiModel	2026-01-15 21:03:05 +09:00
suhyun-hwang	75323b3e08	refactor: simplify sliding window pattern handling in VaetkiModel	2026-01-15 21:03:05 +09:00
suhyun-hwang	5d0870207a	refactor: clean up VaetkiModel class	2026-01-15 21:03:05 +09:00
suhyun-hwang	ad04d34047	chore: remove some missed traces	2026-01-15 21:03:05 +09:00
suhyun-hwang	56c89a1216	add: VAETKI tokenizer implementation	2026-01-15 21:03:05 +09:00
suhyun-hwang	ca85717886	revert: remove VAETKI tokenizer implementation	2026-01-15 21:03:05 +09:00
suhyun-hwang	487909ae0e	refactor: simplify VaetkiModel set_gguf_parameters	2026-01-15 20:59:11 +09:00
suhyun-hwang	ab233049dc	refactor: remove redundant deepseek2 compatibility code	2026-01-15 20:59:11 +09:00
Xuan Son Nguyen	d85a08830b	use min_pixels/max_pixels from preproc config	2026-01-15 20:59:11 +09:00
Xuan Son Nguyen	89db71702b	add min/max pixels gguf metadata	2026-01-15 20:59:11 +09:00
suhyun-hwang	8bbeab0616	fix: use tensor stride for fused QKV support in vaetki	2026-01-15 20:59:11 +09:00
suhyun-hwang	c947c74a4c	fix: restore QKV splitting for VAETKI (fused QKV not working)	2026-01-15 20:59:11 +09:00
suhyun-hwang	808642295b	fix: use mm_ffn_down_w for VAETKI projector embedding size	2026-01-15 20:59:11 +09:00
suhyun-hwang	c13747b93d	refactor: remove manual QKV splitting (handled by build_vit)	2026-01-15 20:59:11 +09:00
suhyun-hwang	566128ffb7	refactor: use standard tensor naming for VAETKI projector	2026-01-15 20:59:11 +09:00
suhyun-hwang	8657eceda5	fix: use num_patches instead of n_pos for position array size	2026-01-15 20:59:11 +09:00
suhyun-hwang	c9e44c7451	style: add whitespace around arithmetic operators	2026-01-15 20:59:11 +09:00
suhyun-hwang	025ce711b6	Add VaetkiVisionModel mmproj converter with Rice ViT support	2026-01-15 20:59:11 +09:00
suhyun-hwang	96294c6ad9	refactor: simplify partial RoPE with weight reordering	2026-01-15 20:59:11 +09:00
suhyun-hwang	db84faff3a	fix: correct VAETKI model type naming	2026-01-15 20:59:11 +09:00
suhyun-hwang	5d08f3e87b	feat: VAETKI dynamic image size support	2026-01-15 20:59:11 +09:00
suhyun-hwang	d61a3f817c	refactor: use build_vit for VAETKI vision encoder	2026-01-15 20:59:11 +09:00
suhyun-hwang	9d531ea9d5	refactor: move class_pos_emb to VAETKI case	2026-01-15 20:59:11 +09:00
suhyun-hwang	c5e9eac8c5	refactor: merge VAETKI positions case with QWEN2VL	2026-01-15 20:59:11 +09:00
suhyun-hwang	d8e8b77c44	fix: add VAETKI pre-tokenizer hash	2026-01-15 20:59:11 +09:00
suhyun-hwang	4358557fe7	fix: sliding_window_pattern type error	2026-01-15 20:58:38 +09:00
suhyun-hwang	b267aada03	mtmd : add VAETKI vision encoder support	2026-01-15 20:58:38 +09:00
suhyun-hwang	488cdee96f	model : add VAETKI architecture support	2026-01-15 20:58:04 +09:00
shalinib-ibm	8cc0ba957b	ggml-cpu: optimize ggml_vec_dot_bf16 for Power9 (#18837 )	2026-01-15 17:31:18 +08:00
Xuan-Son Nguyen	a7e6ddb8bd	lora: make sure model keep track of associated adapters (#18490 ) * lora: make sure model keep track of associated adapters * deprecate llama_adapter_lora_free * minor : std::unordered_set over std::set --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-01-15 10:24:28 +01:00
Sigbjørn Skjæret	2a13180100	model-loader : support bool array sliding window pattern (#18850 )	2026-01-15 10:12:46 +01:00
Adrien Gallouët	ec997b4f2b	tests : download models only when running ctest (#18843 ) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-15 09:47:29 +01:00
Max Krasnyansky	cff777f226	hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822 ) * hexagon: disable repack buffers if host buffers are disabled, improved handling of env vars * hexagon: add support for OP_CPY fp16/fp32 -> fp16/fp32 Factore out all hvx_copy functions into hvx-copy.h header and reduced code duplication. Update HTP ops infra to support OP_CPY * hexagon: cleanup and refactor hex/hvx/htp headers and helper libs hex is basically all scalar/core platform stuff (L2, DMA, basic utils) hvx is all hvx related utils, helpers, etc htp is higher level stuff like Ops, etc hvx-utils library got a nice round of cleanup and refactoring to reduce duplication use hvx_vec_store_a where possible * hexagon: refactor HVX sigmoid functions to hvx-sigmoid.h Moved sigmoid and tanh vector functions from hvx-utils.h to a new header hvx-sigmoid.h. Implemented aligned and unaligned variants for sigmoid array processing using a macro pattern similar to hvx-copy.h. Updated act-ops.c to use the new aligned variant hvx_sigmoid_f32_aa. Removed unused hvx-sigmoid.c. * hexagon: factor out hvx-sqrt.h * hexagon: mintor update to hvx-utils.h * hexagon: remove spurios log * hexagon: factor out and optimize hvx_add/sub/mul * hexagon: remove _opt variants of add/sub/mul as they simply fully aligned versions * hexagon: refactor reduction functions to hvx-reduce.h Moved `hvx_self_max_f32` and `hvx_self_sum_f32` from `hvx-utils.h`/`.c` to `hvx-reduce.h`. Renamed them to `hvx_reduce_max_f32` and `hvx_reduce_sum_f32`. Added aligned (`_a`) and unaligned (`_u`) variants and used macros to unify logic. Updated `softmax-ops.c` to use the new functions. * hexagon: refactor the rest of arithmetic functions to hvx-arith.h Moved `hvx_sum_of_squares_f32`, `hvx_min_scalar_f32`, and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` to use `dst, src, ..., n` argument order. Updated call sites in `act-ops.c`. Refactor Hexagon HVX arithmetic functions (min, clamp) to hvx-arith.h Moved `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated these functions to use `dst, src, ..., n` argument order and updated call sites in `act-ops.c`. `hvx_sum_of_squares_f32` remains in `hvx-utils.c` as requested. * hexagon: refactor hvx_sum_of_squares_f32 - Modify `hvx_sum_of_squares_f32` in `ggml/src/ggml-hexagon/htp/hvx-reduce.h` to use `dst, src` signature. - Implement `_a` (aligned) and `_u` (unaligned) variants for `hvx_sum_of_squares_f32`. - Update `hvx_reduce_loop_body` macro to support both returning and storing results via `finalize_op`. - Update existing reduction functions in `hvx-reduce.h` to use the updated macro. - Update `rms_norm_htp_f32` in `ggml/src/ggml-hexagon/htp/unary-ops.c` to match the new signature. * hexagon: use hvx_splat instead of memset * hexagon: consistent use of f32/f16 in all function names to match the rest of GGML * hexagon: fix hvx_copy_f16_f32 on v75 and older * hexagon: update readme to include GGML_HEXAGON_EXPERIMENTAL * scripts: update snapdragon/adb scripts to enable host param	2026-01-14 21:46:12 -08:00
Oliver Simons	36f0132464	CUDA: Factor out and re-use `block_reduce` function (#18785 ) * CUDA: Refactor and expose two_stage_warp_reduce_* function * Use `two_stage_warp_reduce` also in softmax kernel, move smem out of it Moving smem out of `__device__` function to `__global__` function allows for explicit smem reuse, as either compiler or cuda rt seem to not free it afterwards (`cudaFuncSetAttribute` fails when not accounting for it once for each call to two_stage_warp_reduce) * Update ggml/src/ggml-cuda/common.cuh Co-authored-by: Aman Gupta <amangupta052@gmail.com> * Use two_stage_warp_reduce in group_norm_f32 * Use two_stage_warp_reduce in rms_norm_f32 * Fix smem calculation which expects bytes * Make `two_stage_warp_reduce` accept all values warp_reduce accepts Also integrate it into norm_f32 function * Use two_stage_warp_reduce in l2_norm_f32 * Use type traits for block reduction for better legibility Also adresss other requests by @am17an such as variable renaming * Make norm tests cover all cuda paths * Mark columns % WARP_SIZE !=0 as supported for RMS_NORM_BACK Unit-tests passed locally, let's see if they pass in the CI as well * Use `enum class` for `block_reduce_method` This is more type-safe than plain enum * Rename variables as suggested in code review by @am17an * Rename two_stage_warp_reduce -> block_reduce * Fix trailing whitespace in common.cuh * Make condition of static_assert type-dependent This delays evaluation until the template is actually instantiated. Otherwise, some compilers may evaluate the assert when parsing the template, resulting in build errors as observed here: https://github.com/ggml-org/llama.cpp/actions/runs/20960323123/job/60235530068?pr=18785 * Inline definitions --------- Co-authored-by: Aman Gupta <amangupta052@gmail.com>	2026-01-15 10:44:54 +08:00
Piotr Wilkin (ilintar)	d98b548120	Restore clip's cb() to its rightful glory - extract common debugging elements in llama (#17914 ) * Extract common debugging functions; plug eval-callback and mtmd's MTMD_DEBUG_GRAPH with same functionality * Move to common * Remove unneeded header * Unlink from common * chore: update webui build output * Cleanup; properly pass params to mtmd without depending on common; factorize debug.cpp to use common debug code. * Revert change to webapp * Post-merge adjust * Apply suggestions from code review Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Apply code review changes * Remove changes to server-context * Remove mtmd.h include * Remove utility functions from header * Apply suggestions from code review Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Rename functions * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> * Update tools/mtmd/clip.cpp Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com> --------- Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>	2026-01-14 20:29:35 +01:00
Junwon Hwang	8fb7175576	model : clean up and fix EXAONE-MoE configuration (#18840 ) * Fix mismatch of EXAONE-MoE configuration * ensure gating func is set, cleanup --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-01-14 19:38:21 +01:00
Adrien Gallouët	516a4ca9b5	refactor : remove libcurl, use OpenSSL when available (#18828 )	2026-01-14 18:02:47 +01:00
Jeff Bolz	3e4bb29666	vulkan: Check maxStorageBufferRange in supports_op (#18709 ) * vulkan: Check maxStorageBufferRange in supports_op * skip maxStorageBufferRange check when shader64BitIndexing is enabled	2026-01-14 10:59:05 +01:00
Aman Gupta	47f9612492	llama-model: fix unfortunate typo (#18832 )	2026-01-14 17:55:15 +08:00
Daniel Bevenius	01cbdfd7eb	CUDA : fix typo in clang pragma comment [no ci] (#18830 )	2026-01-14 10:31:49 +01:00
Ruben Ortlam	635ef78ec5	vulkan: work around Intel fp16 bug in mmq (#18814 )	2026-01-14 09:41:23 +01:00
Perry Naseck	7d587e5544	ggml-metal: do not copy headers for embedded, use current binary dir for embedded (#18705 )	2026-01-14 09:22:25 +02:00
Daniel Benjaminsson	d34aa07193	mmap: add Haiku support by skipping RLIMIT_MEMLOCK check (#18819 ) Haiku OS does not support RLIMIT_MEMLOCK, similar to visionOS/tvOS. Skip the resource limit check on Haiku to allow mlock functionality to work without compile errors. Tested on Haiku with NVIDIA RTX 3080 Ti using Vulkan backend.	2026-01-14 09:11:05 +02:00
Adrien Gallouët	f709c7a33f	ci, tests : use cmake to download models and remove libcurl dependency (#18791 ) * ci, tests : use cmake to download models and remove libcurl dependency * llama_dl_model -> llama_download_model * use EXPECTED_HASH for robust model downloading * Move llama_download_model to cmake/common.cmake Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-01-14 07:46:27 +01:00
ddh0	6e36299b47	llama : print_info alignment fix (#18708 ) * fix text spacing in print_info * align all	2026-01-14 00:05:11 +01:00
Junwon Hwang	60591f01d4	model : add EXAONE MoE (#18543 ) * Add EXAONE MoE implementations Co-authored-by: Junwon Hwang <nuclear1221@gmail.com> * Address PR feedback * Address PR feedback * [WIP] Add MTP for EXAONE-MoE * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback * Address PR feedback --------- Co-authored-by: LG-AI-EXAONE <exaonemodels@lgresearch.ai>	2026-01-13 23:28:38 +01:00
Georgi Gerganov	e4832e3ae4	vocab : fix attribute overrides for harmony (#18806 ) * vocab : fix attribute overrides for harmony * cont : add warning log	2026-01-13 17:40:13 +02:00
Ruben Ortlam	960e5e3b46	llama-mmap: fix direct-io loading fallback EOF exception (#18801 )	2026-01-13 15:57:07 +01:00
Daniel Bevenius	20ca2e12c4	model-conversion : remove -c 0 from model card template [no ci] (#18807 ) This commit removes the `-c, --ctx-size N` from the llama-server command in the model card template for causal models. The motivation for this is that -c 0 is the default and specifying it is redundant.	2026-01-13 14:13:10 +01:00

1 2 3 4 5 ...

7773 Commits All Branches Search

7773 Commits

All Branches