Commit Graph

6871 Commits

Author SHA1 Message Date
Han Yin 2a41c0e354 vm: replace token metrics stubs with actual implementation 2025-10-28 11:39:16 -07:00
Han Yin e47e3b77ee UI: locks user in alert dialog when model is unloading 2025-10-28 11:39:16 -07:00
Han Yin 6b341b0fbe bugfix: handle user quitting on model loading 2025-10-28 11:39:16 -07:00
Han Yin e8b84c6ebf UI: code polish 2025-10-28 11:39:16 -07:00
Han Yin fddf060d92 data: code polish 2025-10-28 11:39:16 -07:00
Han Yin 3b499ac7e4 UI: polish conversation screen 2025-10-28 11:39:16 -07:00
Han Yin 64ebdc67a6 UI: update app name to be more Arm 2025-10-28 11:39:16 -07:00
Han Yin 55681847e9 UI: rename `ModeSelection` to `ModelLoading` for better clarity 2025-10-28 11:39:16 -07:00
Han Yin 75c986afc5 bugfix: properly handle user's quitting conversation screen while tokens in generation 2025-10-28 11:39:16 -07:00
Han Yin 4848bf93d0 data: introduce repo for System Prompt; flow data from Room to VM 2025-10-28 11:39:16 -07:00
Han Yin 5596d5203b DB: setup Room database 2025-10-28 11:39:16 -07:00
Han Yin 4046cd16fd Deps: bump Kotlin plugin; introduce KSP; apply in :app subproject 2025-10-28 11:39:16 -07:00
Han Yin 5868eaa66b UI: polish system prompt setup UI 2025-10-28 11:39:16 -07:00
Han Yin a7ee3d305f UI: split a nested parent settings screen into separate child settings screens 2025-10-28 11:39:16 -07:00
Han Yin 65c09b2b32 UI: allow drawer's gesture control only on Home and Settings screens; enable alert dialog on back navigation inside conversation and benchmark 2025-10-28 11:39:16 -07:00
Han Yin 648b97818e UI: disable triggering drawer via gesture; enable alert dialog on back navigation inside conversation and benchmark 2025-10-28 11:39:16 -07:00
Han Yin a7ae8b7ce0 [WIP] DI: implement simple local vm factory provider 2025-10-28 11:39:16 -07:00
Han Yin ca2b7772ce UI: add a new MainActivity; update manifest 2025-10-28 11:39:16 -07:00
Han Yin 7e5c80cee9 UI: implement core flow's screens 2025-10-28 11:39:16 -07:00
Han Yin 5ad65919e9 util: implement user preferences utility 2025-10-28 11:39:16 -07:00
Han Yin 46bd638c5f util: implement performance monitor; wrap it with a viewmodel 2025-10-28 11:39:16 -07:00
Han Yin 4dd755e25b UI: implement basic UI components 2025-10-28 11:39:16 -07:00
Han Yin 32608fb225 UI: app navigation 2025-10-28 11:39:16 -07:00
Han Yin 3f913ce440 LLM: stub a local inference engine for faster iteration 2025-10-28 11:39:16 -07:00
Han Yin 3787fbddb0 data: define data models for LLM and system prompts 2025-10-28 11:39:16 -07:00
Han Yin 697d778db7 UI: define theme, color palette, typography and shape 2025-10-28 11:39:16 -07:00
Han Yin cbe7133742 UI: introduce new dependencies, update versions & references 2025-10-28 11:39:16 -07:00
Han Yin 44a522dbc8 UI: move existing UI src files into `legacy` package 2025-10-28 11:39:16 -07:00
Han Yin 37f3e1c415 Feature: use local llama_context for benchmarking; support context init with custom context size 2025-10-28 11:39:16 -07:00
Han Yin 6d2279e9cd REWRITE JNI bridge; Update viewmodel 2025-10-28 11:39:16 -07:00
Han Yin e1bc87610e Perf: allocate `llama_batch` on stack with `llama_batch_init` 2025-10-28 11:39:16 -07:00
Han Yin 2b52563737 Polish: better logging & documentation 2025-10-28 11:39:16 -07:00
Han Yin ec502cfde9 Feature: implement infinite conversation via context shifting 2025-10-28 11:39:16 -07:00
Han Yin 4e515727b4 Abort on system prompt too long; Truncate user prompt if too long. 2025-10-28 11:39:16 -07:00
Han Yin 4809112ec5 Polish: adopt common naming; init modularization; 2025-10-28 11:39:16 -07:00
Han Yin 8bf2f4d412 Feature: chat template auto formatting 2025-10-28 11:39:16 -07:00
Han Yin 1b0754c0f5 Perf: optimize performance with ARM features 2025-10-28 11:39:16 -07:00
Han Yin bb5b824208 Polish: populate backend names in `benchModel` 2025-10-28 11:39:16 -07:00
Han Yin c14c11dcbd Feature: decode system and user prompt in batches 2025-10-28 11:39:16 -07:00
Han Yin 02465137ca Bug fix: null system prompt state update; Safeguard empty user prompt 2025-10-28 11:39:16 -07:00
Han Yin 7bbb53aaf8 Clang-tidy linting: make functions & global variables static 2025-10-28 11:39:16 -07:00
Han Yin f44882aeeb Enforce centralized dependency management; bump Gradle & deps versions 2025-10-28 11:39:16 -07:00
Han Yin 0ade7fb4d7 Polish binding: Remove verbose setup JNI APIs; Update state machine states. 2025-10-28 11:39:16 -07:00
Han Yin 7dc9968f82 Restructure `LLamaAndroid.kt` 2025-10-28 11:39:16 -07:00
Han Yin 44720859d6 Rewrite llama-android JNI implementation 2025-10-28 11:39:15 -07:00
Han Yin d4ab3832cf Use common sampler 2025-10-28 11:39:15 -07:00
Han Yin 1f255d4bca Tidy & clean LLamaAndroid binding 2025-10-28 11:39:15 -07:00
matteo 8cf6b42d46
server : send partial stop string when <EOG> is reached (#15007) 2025-10-23 12:32:24 +03:00
Matthew Michel 9de9672adb
sycl: use async memory allocation to fix crashes during graph recording (#16644)
* sycl: use async memory allocation to fix graph recording failures

GGML_SYCL_DISABLE_GRAPHS=0 causes crashes because:
  - Host waits are currently unsupported in graph recording mode.
  - SYCL malloc / free calls are unsupported in graph recording mode.

The following changes are made to fix SYCL graph functionality:
  - When graphs are enabled, use the SYCL async memory extension for temp
    buffers which is supported with SYCL graphs.
  - For compiler versions that do not support this extension, skip
    graphs with the affected op.
  - Switch from USM shared to device memory as the async extension
    currently just supports device allocations.

* Address reviewer feedback

* Use global async variable to decide path in sycl_ext_[malloc_device|free]
2025-10-23 09:05:15 +08:00
Max Krasnyansky 63d2fc46e1
Add experimental ggml-hexagon backend for the Hexagon NPU (#16547)
* model: add support for extra bufs for all devices

* hexagon: add experimental ggml-hexagon backend for the Hexagon NPU

This commit introduces a new experimental backend `ggml-hexagon` with support for the Hexagon NPU.

Highlights:
- Supports Hexagon versions: v73, v75, v79, and v81
- Targets Android devices based on Snapdragon SoCs: Gen3, 8-Elite, and 8-Elite Gen5
- Supports Q4_0, Q8_0, MXFP4, and FP32 data types
- Implements core LLM ops: MUL_MAT/MUL_MAT_ID, ADD/SUB/MUL/ADD_ID, RMS_NORM, ROPE, GLU/SWIGLU, SOFTMAX

**Note:** This backend is experimental and may exhibit instability or limited performance across supported devices.
It is intended for early testing and feedback from llama.cpp/ggml developer and user community.

Co-Authored-By: Rajdeep Ganguly <rganguly@qti.qualcomm.com>
Co-Authored-By: Todor Boinovski <todorb@qti.qualcomm.com>

* hexagon: fix format checker errors

* hexagon: update readme and cmake presets

* ci: add android-ndk-build jobs that build plain ARM64 and Snapdragon versions

* hexagon: add simple graph optimizer for stacking MUL_MAT ops with the same input

* hexagon: move ADB helper scripts into scripts/snapdragon/adb

* hexagon: replace all f/printfs with GGML_LOG_...

* readme: add hexagon to the list supported backends

* hexagon: stack malmuts with quantized inputs only

* hexagon: add TODO for fixing issues in hexagon_graph_optimize

* hexagon: update to hex-sdk 6.4.0 and add scripts for running on QDC

* scripts: fix lint errors

* scripts: update qdc pytest script to make linter happy

* hexagon: add reduce sum in fp32

* hexagon: reduce number of vector stores in matmul output

* hexagon: remove the need for vdelta in reduce-multiply-x8

* hexagon: consistent use of reduce_sum_fp32 for row_sums

* hexagon: some more matmul optimizations and comments

Optimize cases where tensor dims are not multiple of 1024 (e.g in Qwen models).
We've handled those cases already but at a higher overhead.

* hexagon: update cmake presets

* hexagon: add OPMASK support for run-bench.sh wrapper

* hexagon: update to use GGML_BACKEND_API

* hexagon: remove unused logic for setting tensor flags for the views

* hexagon: add asserts to set/get_tensor to make sure we handle complete tensors

Same asserts as the CPU backend.

* hexagon: use cpy_tensor slow path for non-host buffers

* hexagon: error checks in the buffer allocator

* cmake: move include(extProj) under ggml-hexagon

* hexagon: don't forget to delete the backend on free

* hexagon: set/get_tensor size assert apply only to quantized tensors

* hexagon: reintroduce HEX_VERBOSE wrapper for GGML_LOG_DEBUG for now

GGML_LOG_DEBUG is always enabled for test-backend-ops and the output gets in the way.
Ideally we need a bit more finer log levels.

* docs: typos in hexagon developer docs (libggm-...)

* hexagon: overhaul error handling in the session/device allocation

this should handle all failure paths in the session allocation.

* hexagon: update cmake presets to enable fp16 vectors

* hexagon: remove unused time_usec function

* hexagon: don't forget to release buffer contexts

* hexagon: fixed indents in hvx-utils (missed clang-format auto-format failure)

* hexagon: remove custom can_repeat function and use ggml_can_repeat

---------

Co-authored-by: Rajdeep Ganguly <rganguly@qti.qualcomm.com>
Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>
2025-10-22 13:47:09 -07:00