Commit Graph

62 Commits

Author SHA1 Message Date
Han Yin 21e61281fa lib: expose Arm features 2025-10-28 11:39:17 -07:00
Han Yin c5058366dc lib: hide the internal implementations, only expose a facade and interfaces 2025-10-28 11:39:17 -07:00
Han Yin 57c3a9dda7 lib: replace the naive & plain SharedPreferences with DataStore implementation 2025-10-28 11:39:17 -07:00
Han Yin 130cba9aa6 lib: expose GgufMetadataReader as interface only 2025-10-28 11:39:17 -07:00
Han Yin 6a5bc94ff1 [WIP] lib: move GgufMetadata into the lib submodule 2025-10-28 11:39:17 -07:00
Han Yin 4b3f6ef8d7 misc: rename LlamaAndroid related class to InferenceEngine prefixes 2025-10-28 11:39:17 -07:00
Han Yin b59c59e5c3 core: add back OpenMP due to huge perf loss on TG128 2025-10-28 11:39:17 -07:00
Han Yin 53ac8af67a core: swap out hardcoded LlamaAndroid library loading 2025-10-28 11:39:17 -07:00
Han Yin 1b79db877d core: implement cpu_detector native lib 2025-10-28 11:39:17 -07:00
Han Yin 98c8f5e59e [WIP] llama: enable KleidiAI and disable tier 4 due to `+sve+sve2` bug caused by `ggml_add_cpu_backend_variant_impl` as explained below
```cmake
if (NOT SME_ENABLED MATCHES -1)
...
    set(PRIVATE_ARCH_FLAGS "-fno-tree-vectorize;${PRIVATE_ARCH_FLAGS}+sve+sve2")
...
```
2025-10-28 11:39:17 -07:00
Han Yin ead41ff655 [WIP] llama: disable OpenMP in ABI split since most SoCs are big.LITTLE 2025-10-28 11:39:17 -07:00
Han Yin 3884bbcb86 [WIP] llama: ABI split where five tiers are built sequentially. 2025-10-28 11:39:17 -07:00
Han Yin 75d1abe24a [WIP] llama: ABI split builds five .so artifacts.
However, all .so artifacts perform at the SVE level
2025-10-28 11:39:17 -07:00
Han Yin eab502a735 llama: migrate C/CXX flags into CMakeLists 2025-10-28 11:39:17 -07:00
Han Yin a4c66c4baf nit: print current pp & tg in llama-bench 2025-10-28 11:39:17 -07:00
Han Yin d1b018e375 UI: show a Snackbar to warn the user that the system prompt is not always supported 2025-10-28 11:39:17 -07:00
Han Yin e1c77c6bbd LLama: add a new Initializing state; add two extension properties; rename LibraryLoaded state to Initialized 2025-10-28 11:39:17 -07:00
Han Yin c08d02d233 LLama: add ModelUnloadingState to engine State; add missing state checks in stub engine; fix instrumentation engine's error messages 2025-10-28 11:39:17 -07:00
Han Yin 65d4a57a8b LLama: refactor loadModel by splitting the system prompt setting into a separate method 2025-10-28 11:39:16 -07:00
Han Yin 46859c10f0 LLama: update engine state after handling the cancellation of sendUserPrompt 2025-10-28 11:39:16 -07:00
Han Yin d70b8fe323 core: swap in LLamaAndroid and mark stub engine for testing only 2025-10-28 11:39:16 -07:00
Han Yin cbe7133742 UI: introduce new dependencies, update versions & references 2025-10-28 11:39:16 -07:00
Han Yin 37f3e1c415 Feature: use local llama_context for benchmarking; support context init with custom context size 2025-10-28 11:39:16 -07:00
Han Yin 6d2279e9cd REWRITE JNI bridge; Update viewmodel 2025-10-28 11:39:16 -07:00
Han Yin e1bc87610e Perf: allocate `llama_batch` on stack with `llama_batch_init` 2025-10-28 11:39:16 -07:00
Han Yin 2b52563737 Polish: better logging & documentation 2025-10-28 11:39:16 -07:00
Han Yin ec502cfde9 Feature: implement infinite conversation via context shifting 2025-10-28 11:39:16 -07:00
Han Yin 4e515727b4 Abort if the system prompt is too long; truncate the user prompt if too long. 2025-10-28 11:39:16 -07:00
Han Yin 4809112ec5 Polish: adopt common naming; init modularization; 2025-10-28 11:39:16 -07:00
Han Yin 8bf2f4d412 Feature: chat template auto formatting 2025-10-28 11:39:16 -07:00
Han Yin 1b0754c0f5 Perf: optimize performance with ARM features 2025-10-28 11:39:16 -07:00
Han Yin bb5b824208 Polish: populate backend names in `benchModel` 2025-10-28 11:39:16 -07:00
Han Yin c14c11dcbd Feature: decode system and user prompt in batches 2025-10-28 11:39:16 -07:00
Han Yin 02465137ca Bug fix: null system prompt state update; Safeguard empty user prompt 2025-10-28 11:39:16 -07:00
Han Yin 7bbb53aaf8 Clang-tidy linting: make functions & global variables static 2025-10-28 11:39:16 -07:00
Han Yin f44882aeeb Enforce centralized dependency management; bump Gradle & deps versions 2025-10-28 11:39:16 -07:00
Han Yin 0ade7fb4d7 Polish binding: Remove verbose setup JNI APIs; Update state machine states. 2025-10-28 11:39:16 -07:00
Han Yin 7dc9968f82 Restructure `LLamaAndroid.kt` 2025-10-28 11:39:16 -07:00
Han Yin 44720859d6 Rewrite llama-android JNI implementation 2025-10-28 11:39:15 -07:00
Han Yin d4ab3832cf Use common sampler 2025-10-28 11:39:15 -07:00
Han Yin 1f255d4bca Tidy & clean LLamaAndroid binding 2025-10-28 11:39:15 -07:00
Georgi Gerganov 745aa5319b
llama : deprecate llama_kv_self_ API (#14030)
* llama : deprecate llama_kv_self_ API

ggml-ci

* llama : allow llama_memory_(nullptr)

ggml-ci

* memory : add flag for optional data clear in llama_memory_clear

ggml-ci
2025-06-06 14:11:15 +03:00
Xuan-Son Nguyen bd3f59f812
cmake : enable curl by default (#12761)
* cmake : enable curl by default

* no curl if no examples

* fix build

* fix build-linux-cross

* add windows-setup-curl

* fix

* shell

* fix path

* fix windows-latest-cmake*

* run: include_directories

* LLAMA_RUN_EXTRA_LIBS

* sycl: no llama_curl

* no test-arg-parser on windows

* clarification

* try riscv64 / arm64

* windows: include libcurl inside release binary

* add msg

* fix mac / ios / android build

* will this fix xcode?

* try clearing the cache

* add bunch of licenses

* revert clear cache

* fix xcode

* fix xcode (2)

* fix typo
2025-04-07 13:35:19 +02:00
Georgi Gerganov e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181)
* llama : refactor llama_context, llama_kv_cache, llm_build_context

ggml-ci

* graph : don't mutate the KV cache during defrag

ggml-ci

* context : reduce virtuals + remove test function

ggml-ci

* context : move interface implementation to source file + factory

ggml-ci

* graph : move KV cache build functions to llama_context impl

ggml-ci

* graph : remove model reference from build_pooling

ggml-ci

* graph : remove llama_model reference

ggml-ci

* kv_cache : provide rope factors

ggml-ci

* graph : rework inputs to use only unique_ptr, remove attn input abstraction

ggml-ci

* context : remove llama_context_i abstraction

ggml-ci

* context : clean-up

ggml-ci

* graph : clean-up

ggml-ci

* llama : remove redundant keywords (struct, enum)

ggml-ci

* model : adapt gemma3

ggml-ci

* graph : restore same attention ops as on master

ggml-ci

* llama : remove TODO + fix indent

ggml-ci
2025-03-13 12:35:44 +02:00
Han Yin 57b6abf85a
android : fix KV cache log message condition (#12212) 2025-03-06 08:22:49 +02:00
Georgi Gerganov 68ff663a04
repo : update links to new url (#11886)
* repo : update links to new url

ggml-ci

* cont : more urls

ggml-ci
2025-02-15 16:40:57 +02:00
codezjx 3edfa7d375
llama.android: add field formatChat to control whether to parse special tokens when sending messages (#11270) 2025-01-17 14:57:56 +02:00
Georgi Gerganov afa8a9ec9b
llama : add `llama_vocab`, functions -> methods, naming (#11110)
* llama : functions -> methods (#11110)

* llama : add struct llama_vocab to the API (#11156)

ggml-ci

* hparams : move vocab params to llama_vocab (#11159)

ggml-ci

* vocab : more pimpl (#11165)

ggml-ci

* vocab : minor tokenization optimizations (#11160)

ggml-ci

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* lora : update API names (#11167)

ggml-ci

* llama : update API names to use correct prefix (#11174)

* llama : update API names to use correct prefix

ggml-ci

* cont

ggml-ci

* cont

ggml-ci

* minor [no ci]

* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)

ggml-ci

* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)

ggml-ci

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-12 11:32:42 +02:00
ag2s20150909 c250ecb315
android : fix llama_batch free (#11014) 2024-12-30 14:35:13 +02:00
Diego Devesa 9177484f58
ggml : fix arm build (#10890)
* ggml: GGML_NATIVE uses -mcpu=native on ARM

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* ggml: Show detected features with GGML_NATIVE

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

* remove msvc support, add GGML_CPU_ARM_ARCH option

* disable llamafile in android example

* march -> mcpu, skip adding feature macros

ggml-ci

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Co-authored-by: Adrien Gallouët <angt@huggingface.co>
2024-12-18 23:21:42 +01:00