Han Yin
4ff924b273
lib: optimize engine loader; always perform a fresh detection when cache is null
2025-10-28 11:39:17 -07:00
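A minimal Kotlin sketch of the cache-or-detect behavior described above; `DeviceTier`, `EngineLoader`, and `detect` are hypothetical stand-ins for illustration, not the actual loader API.

```kotlin
// Hypothetical sketch: reuse a cached tier when present, otherwise
// run a fresh hardware detection and memoize the result.
enum class DeviceTier { NONE, TIER_1, TIER_2, TIER_3, TIER_4 }

class EngineLoader(private val detect: () -> DeviceTier) {
    @Volatile
    private var cachedTier: DeviceTier? = null

    // Cached value wins; a null cache always triggers detection.
    fun resolveTier(): DeviceTier =
        cachedTier ?: detect().also { cachedTier = it }
}

fun main() {
    val loader = EngineLoader(detect = { DeviceTier.TIER_2 }) // stubbed detection
    println(loader.resolveTier()) // fresh detection: TIER_2
    println(loader.resolveTier()) // served from cache
}
```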
Han Yin
e6413dd05d
UI: support `NONE` Llama Tier in general settings
2025-10-28 11:39:17 -07:00
Han Yin
1f41ae2315
lib: refactor InferenceEngineLoader; add a `NONE` Llama Tier
2025-10-28 11:39:17 -07:00
Han Yin
21e61281fa
lib: expose Arm features
2025-10-28 11:39:17 -07:00
Han Yin
c5058366dc
lib: hide the internal implementations; expose only a facade and interfaces
2025-10-28 11:39:17 -07:00
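The facade arrangement this commit describes could look roughly like the following; `InferenceEngine`, `DefaultInferenceEngine`, and `createInferenceEngine` are illustrative names only, not the module's real API.

```kotlin
// Public surface: an interface plus a factory function.
interface InferenceEngine {
    suspend fun loadModel(path: String)
    suspend fun sendUserPrompt(prompt: String): String
}

// Internal implementation: invisible outside the lib module.
internal class DefaultInferenceEngine : InferenceEngine {
    override suspend fun loadModel(path: String) { /* native calls here */ }
    override suspend fun sendUserPrompt(prompt: String): String = "..."
}

// Facade factory: callers depend only on the interface.
fun createInferenceEngine(): InferenceEngine = DefaultInferenceEngine()
```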
Han Yin
57c3a9dda7
lib: replace the naive, plain SharedPreferences usage with a DataStore implementation
2025-10-28 11:39:17 -07:00
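For context, a minimal Preferences DataStore setup of the kind this migration implies; the store name and key are assumptions, but the `androidx.datastore` calls are the real API.

```kotlin
import android.content.Context
import androidx.datastore.preferences.core.edit
import androidx.datastore.preferences.core.stringPreferencesKey
import androidx.datastore.preferences.preferencesDataStore
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.map

// One DataStore instance per process, scoped to the Context.
private val Context.settingsStore by preferencesDataStore(name = "engine_settings")

// Hypothetical key; the actual stored fields are defined by the lib.
private val KEY_LLAMA_TIER = stringPreferencesKey("llama_tier")

class SettingsRepository(private val context: Context) {
    // Reads are a cold Flow instead of synchronous SharedPreferences getters.
    val llamaTier: Flow<String?> =
        context.settingsStore.data.map { prefs -> prefs[KEY_LLAMA_TIER] }

    // Writes are suspending and transactional, unlike apply()/commit().
    suspend fun setLlamaTier(tier: String) {
        context.settingsStore.edit { prefs -> prefs[KEY_LLAMA_TIER] = tier }
    }
}
```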
Han Yin
130cba9aa6
lib: expose GgufMetadataReader as interface only
2025-10-28 11:39:17 -07:00
Han Yin
6a5bc94ff1
[WIP] lib: move GgufMetadata into the lib submodule
2025-10-28 11:39:17 -07:00
Han Yin
4b3f6ef8d7
misc: rename LlamaAndroid-related classes to use InferenceEngine prefixes
2025-10-28 11:39:17 -07:00
Han Yin
b59c59e5c3
core: add back OpenMP due to huge perf loss on TG128
2025-10-28 11:39:17 -07:00
Han Yin
53ac8af67a
core: swap out hardcoded LlamaAndroid library loading
2025-10-28 11:39:17 -07:00
Han Yin
1b79db877d
core: implement cpu_detector native lib
2025-10-28 11:39:17 -07:00
Han Yin
98c8f5e59e
[WIP] llama: enable KleidiAI and disable tier 4 due to a `+sve+sve2` bug caused by `ggml_add_cpu_backend_variant_impl`, as explained below
...
```CMake
if (NOT SME_ENABLED MATCHES -1)
...
set(PRIVATE_ARCH_FLAGS "-fno-tree-vectorize;${PRIVATE_ARCH_FLAGS}+sve+sve2")
...
```
2025-10-28 11:39:17 -07:00
Han Yin
ead41ff655
[WIP] llama: disable OpenMP in ABI split since most SoCs are big.LITTLE
2025-10-28 11:39:17 -07:00
Han Yin
3884bbcb86
[WIP] llama: ABI split where five tiers are built sequentially.
2025-10-28 11:39:17 -07:00
Han Yin
75d1abe24a
[WIP] llama: ABI split builds five .so artifacts.
...
However, all .so files perform at the SVE level
2025-10-28 11:39:17 -07:00
Han Yin
eab502a735
llama: migrate C/CXX flags into CMakeLists
2025-10-28 11:39:17 -07:00
Han Yin
a4c66c4baf
nit: print current pp & tg in llama-bench
2025-10-28 11:39:17 -07:00
Han Yin
d1b018e375
UI: show a Snackbar to warn the user that the system prompt is not always supported
2025-10-28 11:39:17 -07:00
Han Yin
e1c77c6bbd
LLama: add a new Initializing state; add two extension properties; rename LibraryLoaded state to Initialized
2025-10-28 11:39:17 -07:00
Han Yin
c08d02d233
LLama: add ModelUnloadingState to engine State; add missing state checks in stub engine; fix instrumentation engine's error messages
2025-10-28 11:39:17 -07:00
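Taken together, the state commits above suggest an engine lifecycle roughly like the sealed hierarchy below; this is a hedged reconstruction with invented names where the messages don't spell them out, not the actual source.

```kotlin
// Hypothetical engine lifecycle as a sealed hierarchy: each member
// mirrors a state named in the commits above.
sealed interface EngineState {
    data object Uninitialized : EngineState
    data object Initializing : EngineState   // native libs being loaded
    data object Initialized : EngineState    // formerly LibraryLoaded
    data object ModelLoading : EngineState
    data class ModelLoaded(val modelPath: String) : EngineState
    data object ModelUnloading : EngineState // added alongside the stub-engine checks
    data class Error(val message: String) : EngineState
}

// Extension properties of the kind the commit mentions: convenience
// checks so call sites don't pattern-match on every state.
val EngineState.isBusy: Boolean
    get() = this is EngineState.Initializing ||
            this is EngineState.ModelLoading ||
            this is EngineState.ModelUnloading

val EngineState.canAcceptPrompt: Boolean
    get() = this is EngineState.ModelLoaded
```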
Han Yin
65d4a57a8b
LLama: refactor loadModel by splitting the system prompt setting into a separate method
2025-10-28 11:39:16 -07:00
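The split plausibly turns one monolithic call into two; the interface below is a guess at the shape, not the real signatures.

```kotlin
// Hypothetical after-state of the refactor: system-prompt handling
// moves out of loadModel into its own suspending method.
interface ModelSession {
    suspend fun loadModel(path: String, contextSize: Int = 4096)
    suspend fun setSystemPrompt(prompt: String)
}

// Callers now opt into a system prompt explicitly:
suspend fun startSession(session: ModelSession, path: String, systemPrompt: String?) {
    session.loadModel(path)
    systemPrompt?.let { session.setSystemPrompt(it) } // skipped when null
}
```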
Han Yin
46859c10f0
LLama: update engine state after handling the cancellation of sendUserPrompt
2025-10-28 11:39:16 -07:00
Han Yin
d70b8fe323
core: swap in LLamaAndroid and mark stub engine for testing only
2025-10-28 11:39:16 -07:00
Han Yin
cbe7133742
UI: introduce new dependencies, update versions & references
2025-10-28 11:39:16 -07:00
Han Yin
37f3e1c415
Feature: use local llama_context for benchmarking; support context init with custom context size
2025-10-28 11:39:16 -07:00
Han Yin
6d2279e9cd
Rewrite JNI bridge; update ViewModel
2025-10-28 11:39:16 -07:00
Han Yin
e1bc87610e
Perf: allocate `llama_batch` on stack with `llama_batch_init`
2025-10-28 11:39:16 -07:00
Han Yin
2b52563737
Polish: better logging & documentation
2025-10-28 11:39:16 -07:00
Han Yin
ec502cfde9
Feature: implement infinite conversation via context shifting
2025-10-28 11:39:16 -07:00
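Context shifting here presumably follows the usual llama.cpp recipe: when the window fills, keep the system prompt, drop the oldest chunk of conversation, and slide the rest down so generation can continue. A schematic Kotlin sketch over a plain token list follows; the real work happens on the native KV cache, so every name below is illustrative.

```kotlin
// Schematic context shift over a token window. In the real engine the
// same bookkeeping is applied to the native KV cache, not a List.
fun shiftContext(
    tokens: MutableList<Int>,
    keepPrefix: Int,   // tokens to always keep (e.g. the system prompt)
    contextSize: Int
) {
    if (tokens.size < contextSize) return // still room, nothing to do

    // Discard the older half of the shiftable region...
    val shiftable = tokens.size - keepPrefix
    val discard = shiftable / 2

    // ...and slide the newer half down next to the kept prefix.
    val tail = tokens.subList(keepPrefix + discard, tokens.size).toList()
    while (tokens.size > keepPrefix) tokens.removeAt(tokens.size - 1)
    tokens.addAll(tail)
}
```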
Han Yin
4e515727b4
Abort when the system prompt is too long; truncate the user prompt if too long.
2025-10-28 11:39:16 -07:00
Han Yin
4809112ec5
Polish: adopt common naming; start modularization
2025-10-28 11:39:16 -07:00
Han Yin
8bf2f4d412
Feature: chat template auto formatting
2025-10-28 11:39:16 -07:00
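Auto-formatting generally means reading the model's chat template from its GGUF metadata and wrapping each turn accordingly (upstream llama.cpp exposes this as `llama_chat_apply_template`). Below is a hedged Kotlin sketch of a ChatML-style fallback formatter; the types and tokens are illustrative, not this repo's implementation.

```kotlin
data class ChatMessage(val role: String, val content: String)

// Hypothetical fallback formatter in ChatML style, used when no
// template can be read from the model's metadata.
fun formatChatMl(messages: List<ChatMessage>): String = buildString {
    for (m in messages) {
        append("<|im_start|>").append(m.role).append('\n')
        append(m.content).append("<|im_end|>\n")
    }
    append("<|im_start|>assistant\n") // cue the model to respond
}
```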
Han Yin
1b0754c0f5
Perf: optimize performance with ARM features
2025-10-28 11:39:16 -07:00
Han Yin
bb5b824208
Polish: populate backend names in `benchModel`
2025-10-28 11:39:16 -07:00
Han Yin
c14c11dcbd
Feature: decode system and user prompt in batches
2025-10-28 11:39:16 -07:00
Han Yin
02465137ca
Bug fix: null system prompt state update; safeguard empty user prompt
2025-10-28 11:39:16 -07:00
Han Yin
7bbb53aaf8
Clang-tidy linting: make functions & global variables static
2025-10-28 11:39:16 -07:00
Han Yin
f44882aeeb
Enforce centralized dependency management; bump Gradle & deps versions
2025-10-28 11:39:16 -07:00
Han Yin
0ade7fb4d7
Polish binding: remove verbose setup JNI APIs; update state machine states.
2025-10-28 11:39:16 -07:00
Han Yin
7dc9968f82
Restructure `LLamaAndroid.kt`
2025-10-28 11:39:16 -07:00
Han Yin
44720859d6
Rewrite llama-android JNI implementation
2025-10-28 11:39:15 -07:00
Han Yin
d4ab3832cf
Use common sampler
2025-10-28 11:39:15 -07:00
Han Yin
1f255d4bca
Tidy & clean LLamaAndroid binding
2025-10-28 11:39:15 -07:00
Georgi Gerganov
745aa5319b
llama : deprecate llama_kv_self_ API ( #14030 )
...
* llama : deprecate llama_kv_self_ API
ggml-ci
* llama : allow llama_memory_(nullptr)
ggml-ci
* memory : add flag for optional data clear in llama_memory_clear
ggml-ci
2025-06-06 14:11:15 +03:00
Xuan-Son Nguyen
bd3f59f812
cmake : enable curl by default ( #12761 )
...
* cmake : enable curl by default
* no curl if no examples
* fix build
* fix build-linux-cross
* add windows-setup-curl
* fix
* shell
* fix path
* fix windows-latest-cmake*
* run: include_directories
* LLAMA_RUN_EXTRA_LIBS
* sycl: no llama_curl
* no test-arg-parser on windows
* clarification
* try riscv64 / arm64
* windows: include libcurl inside release binary
* add msg
* fix mac / ios / android build
* will this fix xcode?
* try clearing the cache
* add bunch of licenses
* revert clear cache
* fix xcode
* fix xcode (2)
* fix typo
2025-04-07 13:35:19 +02:00
Georgi Gerganov
e0dbec0bc6
llama : refactor llama_context, llama_kv_cache, llm_build_context ( #12181 )
...
* llama : refactor llama_context, llama_kv_cache, llm_build_context
ggml-ci
* graph : don't mutate the KV cache during defrag
ggml-ci
* context : reduce virtuals + remove test function
ggml-ci
* context : move interface implementation to source file + factory
ggml-ci
* graph : move KV cache build functions to llama_context impl
ggml-ci
* graph : remove model reference from build_pooling
ggml-ci
* graph : remove llama_model reference
ggml-ci
* kv_cache : provide rope factors
ggml-ci
* graph : rework inputs to use only unique_ptr, remove attn input abstraction
ggml-ci
* context : remove llama_context_i abstraction
ggml-ci
* context : clean-up
ggml-ci
* graph : clean-up
ggml-ci
* llama : remove redundant keywords (struct, enum)
ggml-ci
* model : adapt gemma3
ggml-ci
* graph : restore same attention ops as on master
ggml-ci
* llama : remove TODO + fix indent
ggml-ci
2025-03-13 12:35:44 +02:00
Han Yin
57b6abf85a
android : fix KV cache log message condition ( #12212 )
2025-03-06 08:22:49 +02:00
Georgi Gerganov
68ff663a04
repo : update links to new url ( #11886 )
...
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-15 16:40:57 +02:00
codezjx
3edfa7d375
llama.android: add field formatChat to control whether to parse special tokens when sending a message ( #11270 )
2025-01-17 14:57:56 +02:00