Zijun Yu
15cbe21cd0
Merge 76775a5b8e into 0ccbfdef3e
2026-02-14 00:45:59 +00:00
TriDefender
313493de53
docs : update path in snapdragon README.md ( #19533 )
...
paths changed so original example didn't work
2026-02-12 08:13:51 +01:00
Yu, Zijun
7b3b65b04e
Merge branch 'master' into dev_backend_openvino
2026-02-11 10:26:58 +08:00
Sascha Rogmann
292f6908cd
spec : remove check rate ( #19377 )
...
* spec: remove parameter spec-ngram-check-rate
* spec : renamed statistics vars
* spec : add n_call_begin, n_call_accept
* spec : don't enable key-map-stats
2026-02-09 15:30:50 +02:00
Kevin Pouget
f5e7734ff2
ggml-virtgpu: add backend documentation ( #19354 )
...
* ggml-virtgpu: add backend documentation
Assisted-by-AI: Claude Code
* CODEOWNERS: add /docs/backend/GGML-VirtGPU/ -> kpouget
* README: add the link to docs/backend/GGML-VirtGPU/ggml-virt.md
* docs/ggml-virt: add link to testing + configuration
* Revert "CODEOWNERS: add /docs/backend/GGML-VirtGPU/ -> kpouget"
This reverts commit 8ece8e72e2 .
* drop the ggml- prefix
* s/ggerganov/ggml-org
* Relocate VirtGPU.md
* reorganize the text
* turn the ASCII diagram into a Mermaid diagram
* README.md: update the link to the main doc
2026-02-09 20:15:42 +08:00
Nechama Krashinski
537eadb1b9
sycl: add F16 support for GGML_OP_CEIL ( #19306 )
...
* Fix SYCL CEIL operator
* sycl: implement GGML_OP_CEIL
2026-02-06 23:13:44 +08:00
Gaurav Garg
41e3f02647
cuda : revert CUDA_SCALE_LAUNCH_QUEUES override until investigated ( #19227 )
...
Hangs were reported on Jetson Orin AGX if we set CUDA_SCALE_LAUNCH_QUEUES=4x. Reverting the previous PR (#19042 ) and updating the document to consider setting CUDA_SCALE_LAUNCH_QUEUES=4x for faster throughput on multi-GPU systems.
2026-02-03 08:41:02 +02:00
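As a rough illustration of the opt-in setting mentioned in the commit above, the following sketch prepares an environment for a child process with `CUDA_SCALE_LAUNCH_QUEUES=4x`, without overriding a value the user already set. The helper name `with_launch_queue_scaling` is hypothetical and not part of llama.cpp; only the variable name and the `4x` value come from the commit message. Given the hangs reported on Jetson Orin AGX, treat this as something to try per system, not a default.

```python
def with_launch_queue_scaling(base_env, value="4x"):
    """Return a copy of base_env with CUDA_SCALE_LAUNCH_QUEUES set.

    An existing value is left untouched, so an explicit user choice wins.
    (Hypothetical helper, for illustration only.)
    """
    env = dict(base_env)
    env.setdefault("CUDA_SCALE_LAUNCH_QUEUES", value)
    return env
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` when launching a multi-GPU workload.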
Neo Zhang
bf38346d13
Remove support for Nvidia & AMD GPUs, because the oneAPI plugins for Nvidia & AMD GPUs are unavailable: the download/installation channels no longer work. ( #19246 )
...
Users can no longer build the software for Nvidia & AMD GPUs.
Remove oneMath, since it is only used in the Nvidia and AMD code paths.
2026-02-02 21:06:21 +08:00
Tamar
4d5e972673
sycl: implement GGML_OP_TOP_K ( #19242 )
2026-02-02 21:05:51 +08:00
Christian Kastner
7a4ca3cbd9
docs : Minor cleanups ( #19252 )
...
* Update old URLs to github.com/ggml-org/
* Bump copyrights
2026-02-02 08:38:55 +02:00
Sascha Rogmann
b4d05a3d2f
spec : various improvements to ngram-map + docs ( #19253 )
...
* spec: ngram-map and reasoning chats
* spec: add t_begin and t_accept
* ngram-map : add internal hash map
* docs : update ngram-map, add ngram-mod
* docs : fix ngram-map-k
* docs : differences between implementations
2026-02-02 08:26:58 +02:00
Max Krasnyansky
3bc8d2cf23
Bump cmake max version (needed for Windows on Snapdragon builds) ( #19188 )
...
* Bump max cmake version (needed for Windows on Snapdragon builds)
* cmake: move max version setting into ggml/CMakeLists
2026-02-01 14:13:38 -08:00
Neo Zhang
2634ed207a
create test.sh to enhance the parameters for testing, update the guide, rm useless script ( #19243 )
2026-02-01 18:24:00 +08:00
s8322
1025fd2c09
sycl: implement GGML_UNARY_OP_SOFTPLUS ( #19114 )
...
* sycl: add softplus unary op implementation
* sycl: add softplus unary op implementation
* docs(ops): mark SYCL SOFTPLUS as supported
* docs: update SYCL status for SOFTPLUS
2026-01-30 12:01:38 +08:00
RachelMantel
c7358ddf64
sycl: implement GGML_OP_TRI ( #19089 )
...
* sycl: implement GGML_OP_TRI
* docs: update ops.md for SYCL TRI
* docs: regenerate ops.md
* docs: update SYCL support for GGML_OP_TRI
2026-01-30 12:00:49 +08:00
DDXDB
d284baf1b5
Fix typos in SYCL documentation ( #19162 )
...
* Fix typos in SYCL documentation
* Update SYCL.md
* Update SYCL.md
* Update SYCL.md
* Update docs/backend/SYCL.md
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
* Update SYCL.md
---------
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2026-01-30 09:46:57 +08:00
Todor Boinovski
ce38a4db47
hexagon: enable offloading to Hexagon on Windows on Snapdragon ( #19150 )
...
* hexagon: updates to enable offloading to HTP on WoS
* Update windows.md
* Update windows.md
* hexagon: enable -O3 optimizations
* hexagon: move all _WINDOWS conditional compilation to _WIN32
* hexagon: updates to enable offloading to HTP on WoS
* hexagon: use run-time vs load-time dynamic linking for cdsp driver interface
* refactor htp-drv
* hexagon: add run-bench.ps1 script
* hexagon: htdrv refactor
* hexagon: unify Android and Windows build readmes
* hexagon: update README.md
* hexagon: refactor htpdrv
* hexagon: drv refactor
* hexagon: more drv refactor
* hexagon: fixes for android builds
* hexagon: factor out dl into ggml-backend-dl
* hexagon: add run-tool.ps1 script
* hexagon: merge htp-utils in htp-drv and remove unused code
* wos: no need for getopt_custom.h
* wos: add missing CR in htpdrv
* hexagon: ndev enforcement applies only to the Android devices
* hexagon: add support for generating and signing .cat file
* hexagon: add .inf file
* hexagon: working auto-signing and improved windows builds
* hexagon: further improve skel build
* hexagon: add rough WoS guide
* hexagon: updated windows guide
* hexagon: improve cmake handling of certs and logging
* hexagon: improve windows setup/build doc
* hexagon: more windows readme updates
* hexagon: windows readme updates
* hexagon: windows readme updates
* hexagon: windows readme updates
* hexagon: windows readme updates
* Update windows.md
* Update windows.md
* snapdragon: rename docs/backend/hexagon to docs/backends/snapdragon
Also added a power shell script to simplify build env setup.
* hexagon: remove trailing whitespace and move cmake requirement to user-presets
* hexagon: fix CMakeUserPresets path in workflow yaml
* hexagon: introduce local version of libdl.h
* hexagon: fix src1 reuse logic
gpt-oss needs a bigger lookahead window.
The check for src[1] itself being quantized was wrong.
---------
Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
2026-01-29 12:33:21 -08:00
Neo Zhang
d4964a7c66
sycl: fix norm kernels (l2_norm, group_norm, rms_norm) by removing an assert to support more cases ( #19154 )
...
Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>
2026-01-29 09:20:22 +08:00
Sascha Rogmann
72d3b1898a
spec : add self‑speculative decoding (no draft model required) + refactor ( #18471 )
...
* server: introduce self-speculative decoding
* server: moved self-call into speculative.cpp
* can_speculate() includes self-speculation
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server: can_speculate() tests self-spec
* server: replace can_speculate() with slot.can_speculate()
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* common: use %zu format specifier for size_t in logging
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* server: can_speculate() requires a task instance
* common: ngram map, config self-speculative decoding
* common: add enum common_speculative_type
* common: add vector of speculative states
* common: add option --spec-draftless
* server: cleanup (remove slot.batch_spec, rename)
* common: moved self-spec impl to ngram-map
* common: cleanup (use common_speculative_state_draft)
* spec : refactor
* cont : naming
* spec: remove --spec-config
* doc: (draftless) speculative decoding
* common: print performance in spec decoding
* minor : cleanup
* common : better names
* minor : cleanup + fix build
* minor: comments
* CODEOWNERS: add common/ngram-map.* (#18471 )
* common : rename speculative.draftless_type -> speculative.type
* ngram-map : fix uninitialized values
* ngram-map : take into account the input can become shorter
* ngram-map : revert len check for now
* arg : change `--spec-draftless` -> `--spec-type`
* spec : add common_speculative_state::accept()
* spec : refactor + add common_speculative_begin()
* spec : fix begin() call with mtmd
* spec : additional refactor + remove common_speculative_params
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-28 19:42:42 +02:00
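To make the idea of draft-free (self-)speculative decoding in the commit above more concrete, here is a minimal sketch of n-gram based drafting: the last `n` tokens of the context are looked up in the earlier history, and the tokens that followed the most recent match are proposed as a draft for the model to verify. This is an assumption about the general technique only. The function name, parameters, and the linear scan are hypothetical and do not reflect the actual `ngram-map` implementation in `common/`.

```python
def draft_from_ngrams(tokens, n=3, max_draft=4):
    """Propose draft tokens from an earlier occurrence of the last n tokens.

    Hypothetical illustration: scans the history backwards for the most
    recent earlier position where the trailing n-gram also appeared, and
    returns up to max_draft tokens that followed it.
    """
    if len(tokens) <= n:
        return []
    key = tuple(tokens[-n:])
    # Start just before the trailing n-gram itself and walk backwards.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tuple(tokens[start:start + n]) == key:
            return list(tokens[start + n:start + n + max_draft])
    return []
```

The target model then verifies the drafted tokens in a single batch and keeps only the accepted prefix, so a wrong draft costs time but does not change the output.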
Ben Chen
0a95026da9
doc: add build instruction to use Vulkan backend on macos ( #19029 )
2026-01-28 12:30:16 +01:00
David Lima
68ac3acb43
docs: Remove duplicated word on CUDA build section ( #19136 )
2026-01-27 14:48:51 +01:00
Gaurav Garg
a83c73a18a
[CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full ( #19042 )
...
* [CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full
With pipeline parallelism, during prompt processing, the CPU-side CUDA command buffer gets full, stalling the CPU. Due to this, enough work doesn't get submitted to the GPU, causing bubbles in the GPU timeline.
Fix this by setting the CUDA environment variable CUDA_SCALE_LAUNCH_QUEUES to 4x to increase the command buffer size.
* Set the env variable in the CUDA backend registry allocation
* Add link to PR in code comment
* Remove warning logs and update documentation
2026-01-27 08:52:44 +02:00
Francisco Herrera
293a1565dc
docs: add linux to index ( #18907 )
2026-01-18 18:03:35 +08:00
Reese Levine
a89002f07b
ggml webgpu: support for backend sampling ( #18880 )
...
* ggml webgpu: add SOFTPLUS unary operator
Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32
precision for intermediate calculations to prevent f16 overflow.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
* Follow Vulkan backend numerical stability pattern
* ggml webgpu: add EXPM1 unary operator
Implements EXPM1 (exp(x) - 1) with f16/f32 support.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
* ggml webgpu: add FLOOR unary operator
Implements FLOOR (rounds down to nearest integer) with f16/f32 support.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
* ggml webgpu: add CEIL unary operator
Implements CEIL (rounds up to nearest integer) with f16/f32 support.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
* ggml webgpu: add ROUND unary operator
Implements ROUND (rounds to nearest integer) with f16/f32 support.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
* ggml webgpu: add TRUNC unary operator
Implements TRUNC (truncates towards zero) with f16/f32 support.
* Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
* Register pipelines and device support
* docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS)
* Updates to webgpu get_memory
* Add argmax
* Add argmax,cumsum,sum,sum_rows
* Add necessary CPY/GET_ROWS operators
* Support for argsort using multi-pass strategy
* Update set_rows for i32 indices, move to pre-wgsl
* Port unary operators to pre-wgsl and support FILL
* Implement PAD
* Add support for top-k
* clean up, scope pipeline init mutex
* fix newline
* Add support for log
* Update LOG for better precision, and ops doc
---------
Co-authored-by: Abhijit Ramesh <abhijitramesh2k@gmail.com>
2026-01-16 16:12:43 -08:00
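The SOFTPLUS entry in the commit above gives the formula log(1 + exp(x)) and notes that intermediates are kept in f32 to avoid f16 overflow. The sketch below shows one common way to evaluate the same function without overflow for large inputs, using the identity softplus(x) = max(x, 0) + log1p(exp(-|x|)). It is an illustration in Python floats, assumed for clarity, and is not the WGSL shader or the Vulkan pattern referenced in the commit.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x)).

    exp() is only ever called on a non-positive argument, so it cannot
    overflow, whatever the magnitude of x.
    """
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

For large positive `x` the result approaches `x`, and for large negative `x` it approaches 0, which matches the behaviour of the naive formula wherever that formula is representable.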
hipudding
6ba6a3c76f
docs : update ops.md for CANN backend ( #18654 )
2026-01-16 13:32:17 +01:00
Xuan-Son Nguyen
c15395f73c
common : implement new jinja template engine ( #18462 )
...
* jinja vm
* lexer
* add vm types
* demo
* clean up
* parser ok
* binary_expression::execute
* shadow naming
* bin ops works!
* fix map object
* add string builtins
* add more builtins
* wip
* use mk_val
* eval with is_user_input
* render gemma tmpl ok
* track input string even after transformations
* support bound functions
* keyword arguments and slicing array
* use shared_ptr for values
* add mk_stmt
* allow print source on exception
* fix negate test
* testing more templates
* mostly works
* add filter_statement
* allow func to access ctx
* add jinja-value.cpp
* impl global_from_json
* a lot of fixes
* more tests
* more fix, more tests
* more fixes
* rm workarounds
* demo: type inference
* add placeholder for tojson
* improve function args handling
* rm type inference
* no more std::regex
* trailing spaces
* make testing more flexible
* make output a bit cleaner
* (wip) redirect minja calls
* test: add --output
* fix crash on macro kwargs
* add minimal caps system
* add some workarounds
* rm caps_apply_workarounds
* get rid of preprocessing
* more fixes
* fix test-chat-template
* move test-chat-jinja into test-chat-template
* rm test-chat-jinja from cmake
* test-chat-template: use common
* fix build
* fix build (2)
* rename vm --> interpreter
* improve error reporting
* correct lstrip behavior
* add tojson
* more fixes
* disable tests for COMMON_CHAT_FORMAT_GENERIC
* make sure tojson output correct order
* add object.length
* fully functional selectattr / rejectattr
* improve error reporting
* more builtins added, more fixes
* create jinja rendering tests
* fix testing.h path
* adjust whitespace rules
* more fixes
* temporarily disable test for ibm-granite
* r/lstrip behavior matched with hf.js
* minimax, glm4.5 ok
* add append and pop
* kimi-k2 ok
* test-chat passed
* fix lstrip_block
* add more jinja tests
* cast to unsigned char
* allow dict key to be numeric
* nemotron: rm windows newline
* tests ok
* fix test
* rename interpreter --> runtime
* fix build
* add more checks
* bring back generic format support
* fix Apertus
* [json.exception.out_of_range.403] key 'content' not found
* rm generic test
* refactor input marking
* add docs
* fix windows build
* clarify error message
* improved tests
* split/rsplit with maxsplit
* non-inverse maxsplit
forgot to change after simplifying
* implement separators for tojson and fix indent
* i like to move it move it
* rename null --> none
* token::eof
* some nits + comments
* add exception classes for lexer and parser
* null -> none
* rename global -> env
* rm minja
* update docs
* docs: add input marking caveats
* implement missing jinja-tests functions
* oops
* support trim filter with args, remove bogus to_json reference
* numerous argument fixes
* updated tests
* implement optional strip chars parameter
* use new chars parameter
* float filter also has default
* always leave at least one decimal in float string
* jinja : static analysis + header cleanup + minor fixes
* add fuzz test
* add string.cpp
* fix chat_template_kwargs
* nits
* fix build
* revert
* unrevert
sorry :)
* add fuzz func_args, refactor to be safer
* fix array.map()
* loosen ensure_vals max count condition, add not impl for map(int)
* hopefully fix windows
* check if empty first
* normalize newlines
---------
Co-authored-by: Alde Rojas <hello@alde.dev>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2026-01-16 11:22:06 +01:00
Yamini Nimmagadda
d3649c11cb
Update OPENVINO.md
2026-01-15 11:39:08 -08:00
Yamini Nimmagadda
e9ed5c4cb6
Update OPENVINO.md
2026-01-15 11:39:08 -08:00
Yamini Nimmagadda
f44c60e995
Update OPENVINO.md
2026-01-15 11:39:08 -08:00
Yamini Nimmagadda
63eed0d9f3
Update build.md
2026-01-15 11:39:08 -08:00
Yamini Nimmagadda
61552e4450
Update OPENVINO.md
2026-01-15 11:39:08 -08:00
Yamini Nimmagadda
9ba324726a
Update OPENVINO.md
2026-01-15 11:39:08 -08:00
Yamini Nimmagadda
25e652569b
Update OPENVINO.md
2026-01-15 11:39:08 -08:00
Yamini Nimmagadda
416556a87d
Create OPENVINO.md in llama.cpp backend docs
2026-01-15 11:39:08 -08:00
Yu, Zijun
f5c71e3cf4
Update build.md
2026-01-15 11:39:08 -08:00
Yu, Zijun
52a44012c0
Update build.md to include OpenCL
2026-01-15 11:39:08 -08:00
Arshath
ae5336386f
Update build.md for Windows
2026-01-15 11:39:08 -08:00
Ravi Panchumarthy
3a1129e073
Update OV dockerfile to use OV2025.3 and update build docs
2026-01-15 11:27:30 -08:00
Ravi Panchumarthy
841d673bd0
Update to OV-2025.3 and CMakeLists.txt
2026-01-15 11:26:00 -08:00
Yu, Zijun
6ab76ed10a
Fix accuracy: disable cpu_repack
2026-01-15 11:19:15 -08:00
Yu, Zijun
a7b611bc93
Minor updates for raising PR
2026-01-15 11:19:15 -08:00
Ravi Panchumarthy
2f99135ccc
Update build.md
2026-01-15 10:26:28 -08:00
ravi9
ea75772e48
Added OpenVINO CI/CD. Updated docs
2026-01-15 10:26:25 -08:00
Ravi Panchumarthy
3051d5ae07
Update openvino build instructions
2026-01-15 10:20:18 -08:00
Yu, Zijun
fd324366d0
Update build doc
2026-01-15 10:20:18 -08:00
Yu, Zijun
0d009fe61a
FEAT: Add all conversion code from ov side
2026-01-15 10:10:00 -08:00
Viraj Wadhwa
ffabe95e2a
Rebase - Bring up to date and fix build process
2026-01-15 10:09:23 -08:00
Max Krasnyansky
cff777f226
hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations ( #18822 )
...
* hexagon: disable repack buffers if host buffers are disabled, improved handling of env vars
* hexagon: add support for OP_CPY fp16/fp32 -> fp16/fp32
Factored out all hvx_copy functions into hvx-copy.h header and reduced code duplication.
Update HTP ops infra to support OP_CPY
* hexagon: cleanup and refactor hex/hvx/htp headers and helper libs
hex is basically all scalar/core platform stuff (L2, DMA, basic utils)
hvx is all hvx related utils, helpers, etc
htp is higher level stuff like Ops, etc
hvx-utils library got a nice round of cleanup and refactoring to reduce duplication
use hvx_vec_store_a where possible
* hexagon: refactor HVX sigmoid functions to hvx-sigmoid.h
Moved sigmoid and tanh vector functions from hvx-utils.h to a new header
hvx-sigmoid.h. Implemented aligned and unaligned variants for sigmoid
array processing using a macro pattern similar to hvx-copy.h. Updated
act-ops.c to use the new aligned variant hvx_sigmoid_f32_aa. Removed
unused hvx-sigmoid.c.
* hexagon: factor out hvx-sqrt.h
* hexagon: minor update to hvx-utils.h
* hexagon: remove spurious log
* hexagon: factor out and optimize hvx_add/sub/mul
* hexagon: remove _opt variants of add/sub/mul as they are simply fully aligned versions
* hexagon: refactor reduction functions to hvx-reduce.h
Moved `hvx_self_max_f32` and `hvx_self_sum_f32` from `hvx-utils.h`/`.c` to `hvx-reduce.h`.
Renamed them to `hvx_reduce_max_f32` and `hvx_reduce_sum_f32`.
Added aligned (`_a`) and unaligned (`_u`) variants and used macros to unify logic.
Updated `softmax-ops.c` to use the new functions.
* hexagon: refactor the rest of arithmetic functions to hvx-arith.h
Moved `hvx_sum_of_squares_f32`, `hvx_min_scalar_f32`, and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` to use `dst, src, ..., n` argument order. Updated call sites in `act-ops.c`.
Refactor Hexagon HVX arithmetic functions (min, clamp) to hvx-arith.h
Moved `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated these functions to use `dst, src, ..., n` argument order and updated call sites in `act-ops.c`. `hvx_sum_of_squares_f32` remains in `hvx-utils.c` as requested.
* hexagon: refactor hvx_sum_of_squares_f32
- Modify `hvx_sum_of_squares_f32` in `ggml/src/ggml-hexagon/htp/hvx-reduce.h` to use `dst, src` signature.
- Implement `_a` (aligned) and `_u` (unaligned) variants for `hvx_sum_of_squares_f32`.
- Update `hvx_reduce_loop_body` macro to support both returning and storing results via `finalize_op`.
- Update existing reduction functions in `hvx-reduce.h` to use the updated macro.
- Update `rms_norm_htp_f32` in `ggml/src/ggml-hexagon/htp/unary-ops.c` to match the new signature.
* hexagon: use hvx_splat instead of memset
* hexagon: consistent use of f32/f16 in all function names to match the rest of GGML
* hexagon: fix hvx_copy_f16_f32 on v75 and older
* hexagon: update readme to include GGML_HEXAGON_EXPERIMENTAL
* scripts: update snapdragon/adb scripts to enable host param
2026-01-14 21:46:12 -08:00
Piotr Wilkin (ilintar)
d98b548120
Restore clip's cb() to its rightful glory - extract common debugging elements in llama ( #17914 )
...
* Extract common debugging functions; plug eval-callback and mtmd's MTMD_DEBUG_GRAPH with same functionality
* Move to common
* Remove unneeded header
* Unlink from common
* chore: update webui build output
* Cleanup; properly pass params to mtmd without depending on common; factorize debug.cpp to use common debug code.
* Revert change to webapp
* Post-merge adjust
* Apply suggestions from code review
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Apply code review changes
* Remove changes to server-context
* Remove mtmd.h include
* Remove utility functions from header
* Apply suggestions from code review
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Rename functions
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
* Update tools/mtmd/clip.cpp
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2026-01-14 20:29:35 +01:00
Adrien Gallouët
516a4ca9b5
refactor : remove libcurl, use OpenSSL when available ( #18828 )
2026-01-14 18:02:47 +01:00