Commit Graph

1682 Commits

Author SHA1 Message Date
Han Yin 63fc56d603 UI: centralize the AppScaffold and modularize its configs 2025-10-28 11:39:17 -07:00
Han Yin 72e97b93c5 feature: check for available space before copying local model 2025-10-28 11:39:16 -07:00
Han Yin 65d4a57a8b LLama: refactor loadModel by splitting the system prompt setting into a separate method 2025-10-28 11:39:16 -07:00
Han Yin 9f77155535 VM: handle the cancellation of ongoing token generation 2025-10-28 11:39:16 -07:00
Han Yin 46859c10f0 LLama: update engine state after handling the cancellation of sendUserPrompt 2025-10-28 11:39:16 -07:00
Han Yin 06448a60a5 UI: update UI for ongoing model import's cancellation 2025-10-28 11:39:16 -07:00
Han Yin 9ba74a9d3d data: allow canceling the ongoing model import 2025-10-28 11:39:16 -07:00
Han Yin d70b8fe323 core: swap in LLamaAndroid and mark stub engine for testing only 2025-10-28 11:39:16 -07:00
Han Yin c2426a42e5 UI: unify Model Card components 2025-10-28 11:39:16 -07:00
Han Yin 434933f5b3 UI: show model card in Conversation screen 2025-10-28 11:39:16 -07:00
Han Yin 9769467723 UI: show model card in Model Loading screen 2025-10-28 11:39:16 -07:00
Han Yin 9cfa74f754 core: break down InferenceManager due to Interface Segregation Principle 2025-10-28 11:39:16 -07:00
Han Yin 286ed05f13 vm: merge SystemPromptViewModel into ModelLoadingViewModel 2025-10-28 11:39:16 -07:00
Han Yin 23d411d86e vm: split mono MainViewModel into separate individual ViewModels 2025-10-28 11:39:16 -07:00
Han Yin 32d778bb8e core: extract conversation and benchmark logic into InferenceManager; add logs and missing state updates in stub InferenceEngine 2025-10-28 11:39:16 -07:00
Han Yin 51b120f464 data: pass through getModelById from ModelDao into ModelRepository 2025-10-28 11:39:16 -07:00
Han Yin 59f5caa699 Util: split FileUtils from ModelUtils; extract copy methods into FileUtils 2025-10-28 11:39:16 -07:00
Han Yin 4913ad0dae nit: tidy SystemPromptViewModel 2025-10-28 11:39:16 -07:00
Han Yin 2614f91226 UI: replace model selection screen's data stubbing; add empty view 2025-10-28 11:39:16 -07:00
Han Yin 6b48f7473f UI: extract a shared ModelCard component 2025-10-28 11:39:16 -07:00
Han Yin 0d41e75ca5 UI: add a confirmation step when user picks a file; refactor model import overlay into AlertDialog 2025-10-28 11:39:16 -07:00
Han Yin 1bebd1bb07 util: extract file size formatting into ModelUtils 2025-10-28 11:39:16 -07:00
Han Yin 561fe0222f UI: handle back navigation when user is in multi-selection mode 2025-10-28 11:39:16 -07:00
Han Yin 2d6b8856f6 UI: implement multiple models deletion; update Models Management screen 2025-10-28 11:39:16 -07:00
Han Yin 025e3d2417 UI: enrich ModelManagementState; extract filename to show correct importing UI 2025-10-28 11:39:16 -07:00
Han Yin adfbfe3ffb data: add a util file for extracting file name & size and model metadata 2025-10-28 11:39:16 -07:00
Han Yin 290a6bfebe bugfix: use List instead of Collection for ModelDao's deletion 2025-10-28 11:39:16 -07:00
Han Yin 5de0b5d6d0 data: import local model with file picker 2025-10-28 11:39:16 -07:00
Han Yin a3ebdac58f UI: polish sort order menu 2025-10-28 11:39:16 -07:00
Han Yin 760d66c97d UI: replace Models Management screen's stubbing with instrumentation 2025-10-28 11:39:16 -07:00
Han Yin bc93c384a7 data: introduce Model entity and DAO; update DI module 2025-10-28 11:39:16 -07:00
Han Yin f5e2edda87 data: [WIP] prepare for ModelRepository refactor & impl 2025-10-28 11:39:16 -07:00
Han Yin b6cc8f0c01 DI: abstract the protocol of SystemPromptRepository; update AppModule 2025-10-28 11:39:16 -07:00
Han Yin eebc05b559 UI: polish UI for ModelsManagementScreen; inject ModelsManagementViewModel 2025-10-28 11:39:16 -07:00
Han Yin 6e82bb37d3 Feature: Introduce ModelRepository and ModelsManagementViewModel; update AppModule 2025-10-28 11:39:16 -07:00
Han Yin aedf442632 DI: Optimize AppModule 2025-10-28 11:39:16 -07:00
Han Yin d60bba9b8f UI: navigation with more natural animated transitions 2025-10-28 11:39:16 -07:00
Han Yin 511df35704 bugfix: wait for model to load before navigating to benchmark screen; use NavigationActions instead of raw navController 2025-10-28 11:39:16 -07:00
Han Yin ea11ee3c94 UI: optimize AppContent's composing 2025-10-28 11:39:16 -07:00
Han Yin 0afd087f35 DI: replace manual DI with Hilt DI 2025-10-28 11:39:16 -07:00
Han Yin a1f6e7e476 DI: make viewmodels Hilt injectable 2025-10-28 11:39:16 -07:00
Han Yin 564b095427 DI: make app Hilt injectable 2025-10-28 11:39:16 -07:00
Han Yin 65741a7e64 DI: introduce Hilt plugin + processor + lib dependencies 2025-10-28 11:39:16 -07:00
Han Yin af0d68d611 nit: combine temperatureMetrics and useFahrenheit 2025-10-28 11:39:16 -07:00
Han Yin 5e4972e93e UI: refactor top app bars 2025-10-28 11:39:16 -07:00
Han Yin 2a41c0e354 vm: replace token metrics stubs with actual implementation 2025-10-28 11:39:16 -07:00
Han Yin e47e3b77ee UI: locks user in alert dialog when model is unloading 2025-10-28 11:39:16 -07:00
Han Yin 6b341b0fbe bugfix: handle user quitting on model loading 2025-10-28 11:39:16 -07:00
Han Yin e8b84c6ebf UI: code polish 2025-10-28 11:39:16 -07:00
Han Yin fddf060d92 data: code polish 2025-10-28 11:39:16 -07:00
Han Yin 3b499ac7e4 UI: polish conversation screen 2025-10-28 11:39:16 -07:00
Han Yin 64ebdc67a6 UI: update app name to be more Arm 2025-10-28 11:39:16 -07:00
Han Yin 55681847e9 UI: rename `ModeSelection` to `ModelLoading` for better clarity 2025-10-28 11:39:16 -07:00
Han Yin 75c986afc5 bugfix: properly handle user's quitting conversation screen while tokens in generation 2025-10-28 11:39:16 -07:00
Han Yin 4848bf93d0 data: introduce repo for System Prompt; flow data from Room to VM 2025-10-28 11:39:16 -07:00
Han Yin 5596d5203b DB: setup Room database 2025-10-28 11:39:16 -07:00
Han Yin 4046cd16fd Deps: bump Kotlin plugin; introduce KSP; apply in :app subproject 2025-10-28 11:39:16 -07:00
Han Yin 5868eaa66b UI: polish system prompt setup UI 2025-10-28 11:39:16 -07:00
Han Yin a7ee3d305f UI: split a nested parent settings screen into separate child settings screens 2025-10-28 11:39:16 -07:00
Han Yin 65c09b2b32 UI: allow drawer's gesture control only on Home and Settings screens; enable alert dialog on back navigation inside conversation and benchmark 2025-10-28 11:39:16 -07:00
Han Yin 648b97818e UI: disable triggering drawer via gesture; enable alert dialog on back navigation inside conversation and benchmark 2025-10-28 11:39:16 -07:00
Han Yin a7ae8b7ce0 [WIP] DI: implement simple local vm factory provider 2025-10-28 11:39:16 -07:00
Han Yin ca2b7772ce UI: add a new MainActivity; update manifest 2025-10-28 11:39:16 -07:00
Han Yin 7e5c80cee9 UI: implement core flow's screens 2025-10-28 11:39:16 -07:00
Han Yin 5ad65919e9 util: implement user preferences utility 2025-10-28 11:39:16 -07:00
Han Yin 46bd638c5f util: implement performance monitor; wrap it with a viewmodel 2025-10-28 11:39:16 -07:00
Han Yin 4dd755e25b UI: implement basic UI components 2025-10-28 11:39:16 -07:00
Han Yin 32608fb225 UI: app navigation 2025-10-28 11:39:16 -07:00
Han Yin 3f913ce440 LLM: stub a local inference engine for faster iteration 2025-10-28 11:39:16 -07:00
Han Yin 3787fbddb0 data: define data models for LLM and system prompts 2025-10-28 11:39:16 -07:00
Han Yin 697d778db7 UI: define theme, color palette, typography and shape 2025-10-28 11:39:16 -07:00
Han Yin cbe7133742 UI: introduce new dependencies, update versions & references 2025-10-28 11:39:16 -07:00
Han Yin 44a522dbc8 UI: move existing UI src files into `legacy` package 2025-10-28 11:39:16 -07:00
Han Yin 37f3e1c415 Feature: use local llama_context for benchmarking; support context init with custom context size 2025-10-28 11:39:16 -07:00
Han Yin 6d2279e9cd REWRITE JNI bridge; Update viewmodel 2025-10-28 11:39:16 -07:00
Han Yin e1bc87610e Perf: allocate `llama_batch` on stack with `llama_batch_init` 2025-10-28 11:39:16 -07:00
Han Yin 2b52563737 Polish: better logging & documentation 2025-10-28 11:39:16 -07:00
Han Yin ec502cfde9 Feature: implement infinite conversation via context shifting 2025-10-28 11:39:16 -07:00
Han Yin 4e515727b4 Abort on system prompt too long; Truncate user prompt if too long. 2025-10-28 11:39:16 -07:00
Han Yin 4809112ec5 Polish: adopt common naming; init modularization; 2025-10-28 11:39:16 -07:00
Han Yin 8bf2f4d412 Feature: chat template auto formatting 2025-10-28 11:39:16 -07:00
Han Yin 1b0754c0f5 Perf: optimize performance with ARM features 2025-10-28 11:39:16 -07:00
Han Yin bb5b824208 Polish: populate backend names in `benchModel` 2025-10-28 11:39:16 -07:00
Han Yin c14c11dcbd Feature: decode system and user prompt in batches 2025-10-28 11:39:16 -07:00
Han Yin 02465137ca Bug fix: null system prompt state update; Safeguard empty user prompt 2025-10-28 11:39:16 -07:00
Han Yin 7bbb53aaf8 Clang-tidy linting: make functions & global variables static 2025-10-28 11:39:16 -07:00
Han Yin f44882aeeb Enforce centralized dependency management; bump Gradle & deps versions 2025-10-28 11:39:16 -07:00
Han Yin 0ade7fb4d7 Polish binding: Remove verbose setup JNI APIs; Update state machine states. 2025-10-28 11:39:16 -07:00
Han Yin 7dc9968f82 Restructure `LLamaAndroid.kt` 2025-10-28 11:39:16 -07:00
Han Yin 44720859d6 Rewrite llama-android JNI implementation 2025-10-28 11:39:15 -07:00
Han Yin d4ab3832cf Use common sampler 2025-10-28 11:39:15 -07:00
Han Yin 1f255d4bca Tidy & clean LLamaAndroid binding 2025-10-28 11:39:15 -07:00
Daniel Bevenius 56b4795842
model-conversion : add support for SentenceTransformers (#16387)
* model-conversion : add support for SentenceTransformers

This commit adds support for models that use SentenceTransformer layers.

The motivation for this is that if the converted model includes any of the
numbered layers specified in the original model's repository, these
changes enable those models to be used and verified. Currently the
model-conversion example only supports the base model output, without any
of the additional transformation layers.

Usage:
Convert the model that also includes the SentenceTransformer layers:
```console
(venv) $ export EMBEDDING_MODEL_PATH="$HOME/google/embeddinggemma-300M"
(venv) make embedding-convert-model
```

Verify the produced embeddings from the converted model against the
original model embeddings:
```console
(venv) make embedding-verify-logits-st
```

The original model can be run using SentenceTransformer:
```console
(venv) make embedding-run-original-model-st
```

Run the converted model using "SentenceTransformer" layers, which
enables pooling and normalization:
```console
(venv) make embedding-run-converted-model-st
```

* add model-conversion example requirements

* add support for -st flag in embedding model conversion

This commit adds support for the -st flag in the embedding model
conversion script. This will enable models to be converted using
SentenceTransformers dense layers.
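
The verify targets above compare embeddings produced by the converted model against the original SentenceTransformer output. A minimal pure-Python sketch of such a comparison (the function names and the 0.99 threshold are illustrative assumptions, not taken from the actual verification scripts):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); embeddings are plain lists of floats
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def embeddings_match(original, converted, threshold=0.99):
    # the converted model should reproduce the original embedding
    # up to numerical noise introduced by conversion
    return cosine_similarity(original, converted) >= threshold
```

In practice the real scripts compare whole logit/embedding tensors; cosine similarity per row is one common way to express "close enough".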
2025-10-09 14:35:22 +02:00
Aaron Teo 624207e676
devops: add s390x & ppc64le CI (#15925)
* devops: move s390x and ppc64le ci build

we have access to ubuntu-24.04-s390x and ppc64le images now

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: disable ppc64le for now since they have compiler errors

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: stop warnings as errors

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: switch to non-macro flag

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: going the llama macro route

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add big-endian gguf test models

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: disable ppc64le to test s390x, check test build

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: dup .gguf.inp files for big-endian tests

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: dup .gguf.out files for big-endian too

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add python setup and endian byteswap

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: poor thing does not have s390x python3

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add missing rust compiler for s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: try rust actions runner

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Revert "devops: try rust actions runner"

This reverts commit 3f8db04356033d6c1d7eccc75ca396bc5298250c.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: try a different path for rust

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: dump home directory and user info

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: install gguf-py only

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: missed relative path

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: remove big-endian files since local swapping is working

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: revert test-tokenizer-0 cmakelists

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fix unicode flags conversion from and to uint16_t

Bitfields are allocated in different order on s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Simplify byteswap command

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Add byteswapping and git-lfs for test-tokenizers-ggml-vocabs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fix endianness detection in vocab loader

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Disable test-thread-safety on s390x

In this test a model is downloaded,
then immediately loaded to check if more downloads are needed,
and then used for the test.

There is no clean way to separate all those steps
to add byteswapping between them, so just skip this test.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fix q8_0 test in test-quantize-fns

vec_signed uses an unexpected rounding mode.
Explicitly use a different rounding function.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
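
The q8_0 failure above comes down to rounding mode: a truncating float-to-int conversion rounds toward zero, while the reference quantization expects round-to-nearest. A toy sketch of the difference (pure Python; `nearest_int` here is an illustrative stand-in, not the actual ggml function):

```python
import math

def truncate_toward_zero(v):
    # what a truncating float->int conversion does
    return math.trunc(v)

def nearest_int(v):
    # explicit round-to-nearest, half away from zero
    return math.floor(v + 0.5) if v >= 0.0 else math.ceil(v - 0.5)
```

For values like 2.9 the two disagree by a full quantization step, which is enough to push a quantize/dequantize round-trip outside the test tolerance.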

* devops: add big-endian stories260K

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add s390x test-eval-callback

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix test does not exist

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix model not found llama-eval-callback

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fix q3_K dot product error in test-quantize-fns on s390x

Array q8bytes had only 4 elements allocated, but 8 elements were accessed.
This led to an out-of-bounds write, a later out-of-bounds read of the
overwritten values, and an incorrect result.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: re-enable ppc64le for testing

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: activate test-thread-safety for s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: disable ppc64le tests

for some reason it keeps failing the test-thread-safety tests and I do not
have a machine that is able to replicate the failures.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: LLAMA_FATAL_WARNINGS=ON

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Correct repository URL for s390x for test-thread-safety model

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fix fs_get_cache_directory

Ensure it works even if both XDG_CACHE_HOME and HOME are unset.
This might happen in containers.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
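
A Python sketch of the fallback order described above (the real implementation is C++; the environment is passed explicitly here only to keep the sketch self-contained, and the `llama.cpp` subdirectory name is an assumption):

```python
import os
import tempfile

def get_cache_directory(env, app="llama.cpp"):
    # Prefer XDG_CACHE_HOME, then fall back to HOME/.cache; if both are
    # unset (e.g. in a minimal container), fall back to the system temp dir.
    xdg = env.get("XDG_CACHE_HOME")
    if xdg:
        return os.path.join(xdg, app)
    home = env.get("HOME")
    if home:
        return os.path.join(home, ".cache", app)
    return os.path.join(tempfile.gettempdir(), app)
```

The key point is the final branch: without it, an unset HOME in a container turns the cache path into garbage instead of a usable directory.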

* Re-enable CI for ppc64le

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fortify ggml_rope_impl

Only memcpy data from sections argument if it's non-NULL.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Add TODO in struct unicode_cpt_flags to reimplement it in endian-independent way
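
The TODO above exists because C bitfield allocation order is implementation-defined, so the same struct serializes differently on s390x. The endian-independent alternative is to pack flags with explicit shifts, so the integer value is identical on every host. A sketch (the flag names are illustrative, not the actual `unicode_cpt_flags` fields):

```python
# Explicit shift-based packing: the resulting integer is the same
# regardless of host endianness or compiler bitfield ordering.
FLAG_LETTER     = 1 << 0
FLAG_DIGIT      = 1 << 1
FLAG_WHITESPACE = 1 << 2

def pack_flags(letter=False, digit=False, whitespace=False):
    value = 0
    if letter:     value |= FLAG_LETTER
    if digit:      value |= FLAG_DIGIT
    if whitespace: value |= FLAG_WHITESPACE
    return value

def unpack_flags(value):
    return {
        "letter":     bool(value & FLAG_LETTER),
        "digit":      bool(value & FLAG_DIGIT),
        "whitespace": bool(value & FLAG_WHITESPACE),
    }
```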

* Update URL for big-endian model

* Update .github/workflows/build.yml

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update remaining mentions of BE models to ggml-org/models repo

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com>
Co-authored-by: Aleksei Nikiforov <103434461+AlekseiNikiforovIBM@users.noreply.github.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-27 02:03:33 +08:00
Daniel Bevenius aa3ee0eb0b
model-conversion : add embedding prompt file support (#15871)
This commit adds support for passing a prompt file to the model
conversion targets/scripts. It also updates logits.cpp to print out
embedding information in the same format as when running the original
embedding model.

The motivation for this is that it allows us to pass files of different
sizes when running the converted models and validating the logits.

This can be particularly important when testing the sliding window
functionality of models where the sequence length needs to exceed a
certain number of tokens to trigger the sliding window logic.
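
On the sliding-window point: with window size w, any sequence of length n <= w produces exactly the same attention pattern as full causal attention, so the window logic is only exercised when n > w, hence the need for longer prompt files. A toy illustration (a deliberate simplification; real models apply the mask per layer):

```python
def sliding_window_mask(n_tokens, window):
    # token i may attend to token j iff 0 <= i - j < window
    return [[1 if 0 <= i - j < window else 0 for j in range(n_tokens)]
            for i in range(n_tokens)]

def full_causal_mask(n_tokens):
    # token i may attend to every token j <= i
    return [[1 if j <= i else 0 for j in range(n_tokens)]
            for i in range(n_tokens)]
```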
2025-09-25 12:02:36 +02:00
Douglas Hanley b5bd037832
llama : add support for qwen3 reranker (#15824) 2025-09-25 11:53:09 +03:00
Jie Fu (傅杰) 63b54c81a6
model-conversion : make causal-verify-logits work with model names containing "." (#16215)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-24 10:25:26 +02:00
Jie Fu (傅杰) 7735706b93
model-conversion : run-org-model.py fails to run on mac m1 (#16213)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-24 08:46:52 +02:00
Jie Fu (傅杰) 8ba548dae2
model-conversion : fix the make targets in the README.md (#16209)
Fix two incorrect make targets in the readme.

Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-24 06:19:23 +02:00
Georgi Gerganov 432cf4304c
codeowners : update + cleanup (#16174)
---------

Co-authored-by: slaren <slarengh@gmail.com>
2025-09-22 18:20:21 +03:00