Commit Graph

8 Commits

Author SHA1 Message Date
shouyud cf3a65fb73 Revert "chore: reformat code with clang-formatter to pass cli test"
This reverts commit 952877ec24.
2025-12-16 14:28:34 -05:00
shouyud 952877ec24 chore: reformat code with clang-formatter to pass cli test 2025-12-16 08:41:54 -05:00
shouyud 05693357c8 refactor: cleanup commented unused code 2025-12-16 08:31:16 -05:00
shouyud fc2289dc96 Feat: optiized unaligned sigmoid_f32 2025-12-12 11:58:45 -05:00
M. Mediouni fcb013847c
ggml-hexagon: Initial Hexagon v68/v69 support (#17394)
* ggml-hexagon: fix build error with GCC

Add stdexcept include to fix GCC build errors

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>

* ggml-hexagon: check VTCM acquire failures

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>

* ggml-hexagon: disable destination bypass on older than v73

v68 errors out if having bypass enabled when the VTCM is the destination.

At least on v68 this made things actually work... not a proper fix though, so to look at later...

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>

* ggml-hexagon: add initial v68/v69 support

v68 is the Hexagon revision notably used on the Snapdragon 8cx
Gen 3 and the QCM6490.

Also add support for v69.

8MB isn't a supported page size, so relax asked for page size constraint
for HAP_compute_res_attr_set_vtcm_param_v2 to optimal.

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>

---------

Signed-off-by: Mohamed Mediouni <mohamed@unpredictable.fr>
2025-11-23 16:54:49 -08:00
nullname d5bc1ad110
ggml-hexagon: add `hex_supported_buffer` for better buffer supported check (#17212)
* hexagon: add buffer support checks for hexagon sessions

* refactor: simplify buffer support checks in hexagon operations

* hexagon: update buffer support checks to use tensor structure

* refactor: streamline buffer initialization for DSP queue in hexagon operations

* refactor: simplify buffer initialization in DSP queue for hexagon operations

* refactor: optimize hex_supported_buffer function by fold expression

* wip

* refactor: simplify dspqueue_buffers_init function and its usage in hexagon operations

* fix: improve nan handling at hvx_vec_fast_sigmoid_fp32_guard

* refactor: optimize hvx_vec_inverse_fp32_guard for better nan handling

* refactor: update hvx_vec_fast_sigmoid_fp32_guard to use adjusted exponent limits

* refactor: modify hvx_vec_fast_sigmoid_fp32_guard to accept parameters for improved flexibility

* refactor: update hvx_vec_exp_fp32_guard to accept max_exp and inf parameters to save some instructions

* refactor: move hvx_vec_inverse_fp32_guard implementation to hvx-inverse.c for better perf
2025-11-23 14:26:36 -08:00
nullname 21d31e0810
ggml-hexagon: fix swiglu failure at `test-backend-ops` (#17344)
* refactor: use hvx_vec_exp_fp32_guard_inf for overflow handling in hvx_exp_f32

* feat: add fast sigmoid function with overflow guard for fp32

* refactor: replace hvx_vec_inverse_fp32 with hvx_vec_inverse_fp32_guard_inf for improved overflow handling

* feat: enhance hvx_add_scalar_f32 with overflow handling using infinity guard

* wip

* add HVX_Vector_Alias

wip

* wip

* fix: improve handling of src1 tensor in glu_swiglu_fp32_per_thread function

* fix nc

* wip

* wip

* handle nan at inverse

* wip

* fix neg

* wip

* rename

* fix hvx_vec_inverse_fp32_guard_inf to handle infinity and NaN cases correctly

* wip

* fix hvx_vec_inverse_fp32_guard_inf to handle NaN cases correctly

* wip

* wip

* wip

* fix output sign
2025-11-20 15:45:05 -08:00
Max Krasnyansky 63d2fc46e1
Add experimental ggml-hexagon backend for the Hexagon NPU (#16547)
* model: add support for extra bufs for all devices

* hexagon: add experimental ggml-hexagon backend for the Hexagon NPU

This commit introduces a new experimental backend `ggml-hexagon` with support for the Hexagon NPU.

Highlights:
- Supports Hexagon versions: v73, v75, v79, and v81
- Targets Android devices based on Snapdragon SoCs: Gen3, 8-Elite, and 8-Elite Gen5
- Supports Q4_0, Q8_0, MXFP4, and FP32 data types
- Implements core LLM ops: MUL_MAT/MUL_MAT_ID, ADD/SUB/MUL/ADD_ID, RMS_NORM, ROPE, GLU/SWIGLU, SOFTMAX

**Note:** This backend is experimental and may exhibit instability or limited performance across supported devices.
It is intended for early testing and feedback from llama.cpp/ggml developer and user community.

Co-Authored-By: Rajdeep Ganguly <rganguly@qti.qualcomm.com>
Co-Authored-By: Todor Boinovski <todorb@qti.qualcomm.com>

* hexagon: fix format checker errors

* hexagon: update readme and cmake presets

* ci: add android-ndk-build jobs that build plain ARM64 and Snapdragon versions

* hexagon: add simple graph optimizer for stacking MUL_MAT ops with the same input

* hexagon: move ADB helper scripts into scripts/snapdragon/adb

* hexagon: replace all f/printfs with GGML_LOG_...

* readme: add hexagon to the list supported backends

* hexagon: stack malmuts with quantized inputs only

* hexagon: add TODO for fixing issues in hexagon_graph_optimize

* hexagon: update to hex-sdk 6.4.0 and add scripts for running on QDC

* scripts: fix lint errors

* scripts: update qdc pytest script to make linter happy

* hexagon: add reduce sum in fp32

* hexagon: reduce number of vector stores in matmul output

* hexagon: remove the need for vdelta in reduce-multiply-x8

* hexagon: consistent use of reduce_sum_fp32 for row_sums

* hexagon: some more matmul optimizations and comments

Optimize cases where tensor dims are not multiple of 1024 (e.g in Qwen models).
We've handled those cases already but at a higher overhead.

* hexagon: update cmake presets

* hexagon: add OPMASK support for run-bench.sh wrapper

* hexagon: update to use GGML_BACKEND_API

* hexagon: remove unused logic for setting tensor flags for the views

* hexagon: add asserts to set/get_tensor to make sure we handle complete tensors

Same asserts as the CPU backend.

* hexagon: use cpy_tensor slow path for non-host buffers

* hexagon: error checks in the buffer allocator

* cmake: move include(extProj) under ggml-hexagon

* hexagon: don't forget to delete the backend on free

* hexagon: set/get_tensor size assert apply only to quantized tensors

* hexagon: reintroduce HEX_VERBOSE wrapper for GGML_LOG_DEBUG for now

GGML_LOG_DEBUG is always enabled for test-backend-ops and the output gets in the way.
Ideally we need a bit more finer log levels.

* docs: typos in hexagon developer docs (libggm-...)

* hexagon: overhaul error handling in the session/device allocation

this should handle all failure paths in the session allocation.

* hexagon: update cmake presets to enable fp16 vectors

* hexagon: remove unused time_usec function

* hexagon: don't forget to release buffer contexts

* hexagon: fixed indents in hvx-utils (missed clang-format auto-format failure)

* hexagon: remove custom can_repeat function and use ggml_can_repeat

---------

Co-authored-by: Rajdeep Ganguly <rganguly@qti.qualcomm.com>
Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>
2025-10-22 13:47:09 -07:00