* ggml-cpu: add ggml_thread_cpu_relax with Zihintpause support
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
* cmake: enable RISC-V zihintpause extension for Spacemit builds
* readme : add ZIHINTPAUSE support for RISC-V
---------
Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>
* common : implement parser combinators to simplify chat parsing
* add virtual destructor to parser_base
* fix memory leak from circular references of rules
* implement gbnf grammar building
* remove unused private variable
* create a base visitor and implement id assignment as a visitor
* fix const ref for grammar builder
* clean up types, friend classes, and class declarations
* remove builder usage from until_parser
* Use a counter class to help assign rule ids
* cache everything
* add short description for each parser
* create a type for the root parser
* implement repetition parser
* Make optional, one_or_more, and zero_or_more subclasses of repetition
* improve context constructor
* improve until parsing and add benchmarks
* remove cached() pattern, cache in parser_base with specialized parsing functions for each parser
* improve json parsing performance to better match legacy parsing
* fix const auto * it for windows
* move id assignment to classes instead of using a visitor
* create named rules in the command r7b example
* use '.' for any in GBNF
* fix parens around choices in gbnf grammar
* add convenience operators to turn strings to literals
* add free-form operators for const char * to simplify defining literals
* simplify test case parser
* implement semantic actions
* remove groups in favor of actions and a scratchpad
* add built in actions for common operations
* add actions to command r7b example
* use std::default_searcher for platforms that don't have bm
* improve parser_type handling and add cast helper
* add partial result type to better control when to run actions
* fix bug in until()
* run actions on partial results by default
* use common_chat_msg for result
* add qwen3 example wip
* trash partial idea and simplify
* move action arguments to a struct
* implement aho-corasick matcher for until_parser and to build exclusion grammars
* use std::string for input, since std::string_view is incompatible with std::regex
* Refactor tests
* improve qwen3 example
* implement sax-style parsing and refactor
* fix json string in test
* rename classes to use common_chat_ prefix
* remove is_ suffix from functions
* rename from id_counter to just counter
* Final refactored tests
* Fix executable name and editorconfig-checker
* Third time's the charm...
* add trigger parser to begin lazy grammar rule generation
* working lazy grammar
* refactor json rules now that we check for reachability
* reduce pointer usage
* print out grammars in example
* rename to chat-peg-parser* and common_chat_peg_parser*
* Revert unrelated changes
* New macros for CMakeLists to enable multi-file compilations
* starting unicode support
* add unicode support to char_parser
* use unparsed args as additional sources
* Refactor tests to new harness
* Fix CMakeLists
* fix rate calculation
* add unicode tests
* fix trailing whitespace and line endings
skip-checks: true
* Helpers + rewrite qwen3 with helpers
* Fix whitespace
* extract unicode functions to separate file
* refactor parse unicode function
* fix compiler error
* improve construction of sequence/choice parsers
* be less clever
* add make_parser helper function
* expand usage of make_parser, alias common_chat_msg_peg_parser_builder to builder in source
* lower bench iterations
* add unicode support to until_parser
* add unicode support to json_string_parser
* clean up unicode tests
* reduce unicode details to match src/unicode.cpp
* simplify even further
* remove unused functions
* fix type
* reformat char class parsing
* clean up json string parser
* clean up + fix diagnostics
* reorder includes
* compact builder functions
* replace action_parser with capture_parser, rename env to semantics
* rename env to semantics
* clean up common_chat_parse_context
* move type() to below constant
* use default constructor for common_chat_peg_parser
* make all operators functions for consistency
* fix compilation errors in test-optional.cpp
* simplify result values
* rename json_string_unquoted to json_string_content
* Move helper to separate class, add separate explicit and helper classes
* Whitespace
* Change + to append()
* Reformat
* Add extra helpers, tests and Minimax example
* Add some extra optional debugging prints + real example of how to use them
* fix bug in repetitions when min_count = 0 reports failures
* dump rule in debug
* fix token accumulation and assert parsing never fails
* indent debug by depth
* use LOG_* in tests so logs sync up with test logs
* - Add selective testing
- Refactor all messaging to use LOG_ERR
- Fix lack of argument / tool name capturing
- Temporary fix for double event capture
* refactor rule() and introduce ref()
* clean up visitor
* clean up indirection in root parser w.r.t rules
* store shared ptr directly in parser classes
* replace aho-corasick automation with a simple trie
* Reset prev for qwen3 helper example variant
* refactor to use value semantics with std::variant/std::visit
* simplify trie_matcher result
* fix linting issues
* add annotations to rules
* revert test workaround
* implement serializing the parser
* remove redundant parsers
* remove tests
* gbnf generation fixes
* remove LOG_* use in tests
* update gbnf tests to test entire grammar
* clean up gbnf generation and fix a few bugs
* fix typo in test output
* remove implicit conversion rules
* improve test output
* rename trie_matcher to trie
* simplify trie to just know if a node is the end of a word
* remove common_chat_ prefix and ensure a common_peg_ prefix to all types
* rename chat-peg-parser -> peg-parser
* promote chat-peg-parser-helper to chat-peg-parser
* checkpoint
* use a static_assert to ensure we handle every branch
* inline trivial peg parser builders
* use json strings for now
* implement basic and native chat peg parser builders/extractors
* resolve refs to their rules
* remove packrat caching (for now)
* update tests
* compare parsers with incremental input
* benchmark both complete and incremental parsing
* add raw string generation from json schema
* add support for string schemas in gbnf generation
* fix qwen example to include \n
* tidy up example
* rename extractor to mapper
* rename ast_arena to ast
* place basic tests into one
* use gbnf_format_literal from json-schema-to-grammar
* integrate parser with common/chat and server
* clean up schema and serialization
* add json-schema raw string tests
* clean up json creation and remove capture parser
* trim spaces from reasoning and content
* clean up redundant rules and comments
* rename input_is_complete to is_partial to match rest of project
* simplify json rules
* remove extraneous file
* remove comment
* implement += and |= operators
* add comments to qwen3 implementation
* reorder arguments to common_chat_peg_parse
* remove commented outdated tests
* add explicit copy constructor
* fix operators and constness
* wip: update test-chat for qwen3-coder
* bring json parser closer to json-schema-to-grammar rules
* trim trailing space for most things
* fix qwen3 coder rules w.r.t. trailing spaces
* group rules
* do not trim trailing space from string args
* tweak spacing of qwen3 grammar
* update qwen3-coder tests
* qwen3-coder small fixes
* place parser in common_chat_syntax to simplify invocation
* use std::set to collect rules to keep order predictable for tests
* initialize parser to make certain platforms happy
* revert back to std::unordered_set, sort rule names at the end instead
* uncomment rest of chat tests
* define explicit default constructor
* improve arena init and server integration
* fix chat test
* add json_member()
* add a comprehensive native example
* clean up example qwen test and add response_format example to native test
* make build_peg_parser accept std::function instead of template
* change peg parser parameters into const ref
* push tool call on tool open for constructed parser
* add parsing documentation
* clean up some comments
* add json schema support to qwen3-coder
* add id initializer in tests
* remove grammar debug line from qwen3-coder
* refactor qwen3-coder to use sequence over operators
* only call common_chat_peg_parse if appropriate format
* simplify qwen3-coder space handling
* revert qwen3-coder implementation
* revert json-schema-to-grammar changes
* remove unnecessary forward declaration
* small adjustment to until_parser
* rename C/C++ files to use dashes
* codeowners : add aldehir to peg-parser and related files
---------
Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>
* vulkan: add LOG operation support for F32 and F16
Part of #14909.
* vulkan: Fix LOG operation types
* docs: Update operation support documentation for Vulkan LOG operation
* vulkan: fix log_f16 shader
* docs: restore missing LOG test cases and regenerate ops.md
* SYCL: add generic unary op implementation for multiple ops (ABS/SGN/…); unify non-contiguous access
* SYCL: update documentation and sycl.csv to reflect new unary op support
* update ops.md after syncing SYCL.csv changes
* Fix SYCL.csv merge conflict
* Update ops.md after fixing SYCL.csv conflicts
* Fix SYCL.csv tail after merge conflict and regenerate ops.md
* Fix line endings and final newline in SYCL.csv
* Remove TOPK_MOE entries from SYCL.csv as requested
* Update ops.md after removing TOPK_MOE from SYCL.csv
* Regenerated SYCL.csv and synced ops.md with upstream
* Update ops.md using create_ops_docs.py
* model: add support for extra bufs for all devices
* hexagon: add experimental ggml-hexagon backend for the Hexagon NPU
This commit introduces a new experimental backend `ggml-hexagon` with support for the Hexagon NPU.
Highlights:
- Supports Hexagon versions: v73, v75, v79, and v81
- Targets Android devices based on Snapdragon SoCs: Gen3, 8-Elite, and 8-Elite Gen5
- Supports Q4_0, Q8_0, MXFP4, and FP32 data types
- Implements core LLM ops: MUL_MAT/MUL_MAT_ID, ADD/SUB/MUL/ADD_ID, RMS_NORM, ROPE, GLU/SWIGLU, SOFTMAX
**Note:** This backend is experimental and may exhibit instability or limited performance across supported devices.
It is intended for early testing and feedback from llama.cpp/ggml developer and user community.
Co-Authored-By: Rajdeep Ganguly <rganguly@qti.qualcomm.com>
Co-Authored-By: Todor Boinovski <todorb@qti.qualcomm.com>
* hexagon: fix format checker errors
* hexagon: update readme and cmake presets
* ci: add android-ndk-build jobs that build plain ARM64 and Snapdragon versions
* hexagon: add simple graph optimizer for stacking MUL_MAT ops with the same input
* hexagon: move ADB helper scripts into scripts/snapdragon/adb
* hexagon: replace all f/printfs with GGML_LOG_...
* readme: add hexagon to the list supported backends
* hexagon: stack malmuts with quantized inputs only
* hexagon: add TODO for fixing issues in hexagon_graph_optimize
* hexagon: update to hex-sdk 6.4.0 and add scripts for running on QDC
* scripts: fix lint errors
* scripts: update qdc pytest script to make linter happy
* hexagon: add reduce sum in fp32
* hexagon: reduce number of vector stores in matmul output
* hexagon: remove the need for vdelta in reduce-multiply-x8
* hexagon: consistent use of reduce_sum_fp32 for row_sums
* hexagon: some more matmul optimizations and comments
Optimize cases where tensor dims are not multiple of 1024 (e.g in Qwen models).
We've handled those cases already but at a higher overhead.
* hexagon: update cmake presets
* hexagon: add OPMASK support for run-bench.sh wrapper
* hexagon: update to use GGML_BACKEND_API
* hexagon: remove unused logic for setting tensor flags for the views
* hexagon: add asserts to set/get_tensor to make sure we handle complete tensors
Same asserts as the CPU backend.
* hexagon: use cpy_tensor slow path for non-host buffers
* hexagon: error checks in the buffer allocator
* cmake: move include(extProj) under ggml-hexagon
* hexagon: don't forget to delete the backend on free
* hexagon: set/get_tensor size assert apply only to quantized tensors
* hexagon: reintroduce HEX_VERBOSE wrapper for GGML_LOG_DEBUG for now
GGML_LOG_DEBUG is always enabled for test-backend-ops and the output gets in the way.
Ideally we need a bit more finer log levels.
* docs: typos in hexagon developer docs (libggm-...)
* hexagon: overhaul error handling in the session/device allocation
this should handle all failure paths in the session allocation.
* hexagon: update cmake presets to enable fp16 vectors
* hexagon: remove unused time_usec function
* hexagon: don't forget to release buffer contexts
* hexagon: fixed indents in hvx-utils (missed clang-format auto-format failure)
* hexagon: remove custom can_repeat function and use ggml_can_repeat
---------
Co-authored-by: Rajdeep Ganguly <rganguly@qti.qualcomm.com>
Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>
* SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators
Clean up unrelated changes from previous commit
* Chore: remove empty lines and fix indentation
* Clean up: remove leftover blank lines and fix spacing
* chore: fix trailing whitespace and ensure final newline
* Cleanup: remove redundant declarations already defined in header
* Sync docs/ops.md with updated backend operation support
* docs: update ops.md after rebase
* docs: update ops.md - Vulkan supports SSM_CONV and SSM_SCAN
* vulkan: implement SSM scan operation
Add State Space Model scan operation to the Vulkan backend.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* vulkan: implement SSM conv operation
Add State Space Model conv operation to the Vulkan backend.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
---------
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
* CPU: Add support for FLOOR,CEIL,ROUND and TRUNC unary operators
- Added the operators to unary op enum
- Implemented API functions
- Implemented forward and unary-op logic in CPU backend
- Updated ggml_get_n_tasks
- Updated operators names array and static_assert
- Updated docs and enabled automatic tests
* docs: add documentation for ggml_trunc and ggml_trunc_inplace in ggml.h
* chore: remove trailing whitespace from ggml.h
* Remove unresolved merge markers
* Apply review suggestions: cleanup formatting, enum order and leftover artifacts
* Regenerate ops.md using create_ops_docs.py
* fix/refactor OP argsort, pad
* fix count-equal op
* update SYCL OP list
* fix format issue
---------
Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com>
* update oneapi to 2025.2, use deep-learning-essentials to replace base-tool
* update to 2025.2 use deeplearn essi to replace base toolkit
* add missed dll
* add deep learning essentials
* add sycl-ls
---------
Co-authored-by: Zhang Jianyu <zhang.jianyu@outlook.com>
Since the prefill length is not fixed, graphs constructed for the
prefill stage cannot be reused. For this reason, ACL graph
execution is disabled by default during prefill.