llama.cpp

Commit Graph

Author	SHA1	Message	Date
Han Yin	32d778bb8e	core: extract conversation and benchmark logics into InferenceManager; add logs and missing state updates in stub InferenceEngine	2025-10-28 11:39:16 -07:00
Han Yin	51b120f464	data: pass through getModelById from ModelDao into ModelRepository	2025-10-28 11:39:16 -07:00
Han Yin	59f5caa699	Util: split FileUtils from ModelUtils; extract copy methods into FileUtils	2025-10-28 11:39:16 -07:00
Han Yin	4913ad0dae	nit: tidy SystemPromptViewModel	2025-10-28 11:39:16 -07:00
Han Yin	2614f91226	UI: replace model selection screen's data stubbing; add empty view	2025-10-28 11:39:16 -07:00
Han Yin	6b48f7473f	UI: extract a shared ModelCard component	2025-10-28 11:39:16 -07:00
Han Yin	0d41e75ca5	UI: add a confirmation step when user picks a file; refactor model import overlay into AlertDialog	2025-10-28 11:39:16 -07:00
Han Yin	1bebd1bb07	util: extract file size formatting into ModelUtils	2025-10-28 11:39:16 -07:00
Han Yin	561fe0222f	UI: handle back navigation when user is in multi-selection mode	2025-10-28 11:39:16 -07:00
Han Yin	2d6b8856f6	UI: implement multiple models deletion; update Models Management screen	2025-10-28 11:39:16 -07:00
Han Yin	025e3d2417	UI: enrich ModelManagementState; extract filename to show correct importing UI	2025-10-28 11:39:16 -07:00
Han Yin	adfbfe3ffb	data: add a util file for extracting file name & size and model metadata	2025-10-28 11:39:16 -07:00
Han Yin	290a6bfebe	bugfix: use List instead of Collection for ModelDao's deletion	2025-10-28 11:39:16 -07:00
Han Yin	5de0b5d6d0	data: import local model with file picker	2025-10-28 11:39:16 -07:00
Han Yin	a3ebdac58f	UI: polish sort order menu	2025-10-28 11:39:16 -07:00
Han Yin	760d66c97d	UI: replace Models Management screen's stubbing with instrumentation	2025-10-28 11:39:16 -07:00
Han Yin	bc93c384a7	data: introduce Model entity and DAO; update DI module	2025-10-28 11:39:16 -07:00
Han Yin	f5e2edda87	data: [WIP] prepare for ModelRepository refactor & impl	2025-10-28 11:39:16 -07:00
Han Yin	b6cc8f0c01	DI: abstract the protocol of SystemPromptRepository; update AppModule	2025-10-28 11:39:16 -07:00
Han Yin	eebc05b559	UI: polish UI for ModelsManagementScreen; inject ModelsManagementVieModel	2025-10-28 11:39:16 -07:00
Han Yin	6e82bb37d3	Feature: Introduce ModelRepository and ModelsManagementViewModel; update AppModule	2025-10-28 11:39:16 -07:00
Han Yin	aedf442632	DI: Optimize AppModule	2025-10-28 11:39:16 -07:00
Han Yin	d60bba9b8f	UI: navigation with more natural animated transitions	2025-10-28 11:39:16 -07:00
Han Yin	511df35704	bugfix: wait for model to load before navigating to benchmark screen; use NavigationActions instead of raw navController	2025-10-28 11:39:16 -07:00
Han Yin	ea11ee3c94	UI: optimize AppContent's composing	2025-10-28 11:39:16 -07:00
Han Yin	0afd087f35	DI: replace manual DI with Hilt DI	2025-10-28 11:39:16 -07:00
Han Yin	a1f6e7e476	DI: make viewmodels Hilt injectable	2025-10-28 11:39:16 -07:00
Han Yin	564b095427	DI: make app Hilt injectable	2025-10-28 11:39:16 -07:00
Han Yin	65741a7e64	DI: introduce Hilt plugin + processor + lib dependencies	2025-10-28 11:39:16 -07:00
Han Yin	af0d68d611	nit: combine temperatureMetrics and useFahrenheit	2025-10-28 11:39:16 -07:00
Han Yin	5e4972e93e	UI: refactor top app bars	2025-10-28 11:39:16 -07:00
Han Yin	2a41c0e354	vm: replace token metrics stubs with actual implementation	2025-10-28 11:39:16 -07:00
Han Yin	e47e3b77ee	UI: locks user in alert dialog when model is unloading	2025-10-28 11:39:16 -07:00
Han Yin	6b341b0fbe	bugfix: handle user quitting on model loading	2025-10-28 11:39:16 -07:00
Han Yin	e8b84c6ebf	UI: code polish	2025-10-28 11:39:16 -07:00
Han Yin	fddf060d92	data: code polish	2025-10-28 11:39:16 -07:00
Han Yin	3b499ac7e4	UI: polish conversation screen	2025-10-28 11:39:16 -07:00
Han Yin	64ebdc67a6	UI: update app name to be more Arm	2025-10-28 11:39:16 -07:00
Han Yin	55681847e9	UI: rename `ModeSelection` to `ModelLoading` for better clarity	2025-10-28 11:39:16 -07:00
Han Yin	75c986afc5	bugfix: properly handle user's quitting conversation screen while tokens in generation	2025-10-28 11:39:16 -07:00
Han Yin	4848bf93d0	data: introduce repo for System Prompt; flow data from Room to VM	2025-10-28 11:39:16 -07:00
Han Yin	5596d5203b	DB: setup Room database	2025-10-28 11:39:16 -07:00
Han Yin	4046cd16fd	Deps: bump Kotlin plugin; introduce KSP; apply in :app subproject	2025-10-28 11:39:16 -07:00
Han Yin	5868eaa66b	UI: polish system prompt setup UI	2025-10-28 11:39:16 -07:00
Han Yin	a7ee3d305f	UI: split a nested parent settings screen into separate child settings screens	2025-10-28 11:39:16 -07:00
Han Yin	65c09b2b32	UI: allow drawer's gesture control only on Home and Settings screens; enable alert dialog on back navigation inside conversation and benchmark	2025-10-28 11:39:16 -07:00
Han Yin	648b97818e	UI: disable triggering drawer via gesture; enable alert dialog on back navigation inside conversation and benchmark	2025-10-28 11:39:16 -07:00
Han Yin	a7ae8b7ce0	[WIP] DI: implement simple local vm factory provider	2025-10-28 11:39:16 -07:00
Han Yin	ca2b7772ce	UI: add a new MainActivity; update manifest	2025-10-28 11:39:16 -07:00
Han Yin	7e5c80cee9	UI: implement core flow's screens	2025-10-28 11:39:16 -07:00
Han Yin	5ad65919e9	util: implement user preferences utility	2025-10-28 11:39:16 -07:00
Han Yin	46bd638c5f	util: implement performance monitor; wrap it with a viewmodel	2025-10-28 11:39:16 -07:00
Han Yin	4dd755e25b	UI: implement basic UI components	2025-10-28 11:39:16 -07:00
Han Yin	32608fb225	UI: app navigation	2025-10-28 11:39:16 -07:00
Han Yin	3f913ce440	LLM: stub a local inference engine for faster iteration	2025-10-28 11:39:16 -07:00
Han Yin	3787fbddb0	data: define data models for LLM and system prompts	2025-10-28 11:39:16 -07:00
Han Yin	697d778db7	UI: define theme, color palette, typography and shape	2025-10-28 11:39:16 -07:00
Han Yin	cbe7133742	UI: introduce new dependencies, update versions & references	2025-10-28 11:39:16 -07:00
Han Yin	44a522dbc8	UI: move existing UI src files into `legacy` package	2025-10-28 11:39:16 -07:00
Han Yin	37f3e1c415	Feature: use local llama_context for benchmarking; support context init with custom context size	2025-10-28 11:39:16 -07:00
Han Yin	6d2279e9cd	REWRITE JNI bridge; Update viewmodel	2025-10-28 11:39:16 -07:00
Han Yin	e1bc87610e	Perf: allocate `llama_batch` on stack with `llama_batch_init`	2025-10-28 11:39:16 -07:00
Han Yin	2b52563737	Polish: better logging & documentation	2025-10-28 11:39:16 -07:00
Han Yin	ec502cfde9	Feature: implement infinite conversation via context shifting	2025-10-28 11:39:16 -07:00
Han Yin	4e515727b4	Abort on system prompt too long; Truncate user prompt if too long.	2025-10-28 11:39:16 -07:00
Han Yin	4809112ec5	Polish: adopt common naming; init modularization;	2025-10-28 11:39:16 -07:00
Han Yin	8bf2f4d412	Feature: chat template auto formatting	2025-10-28 11:39:16 -07:00
Han Yin	1b0754c0f5	Perf: optimize performance with ARM features	2025-10-28 11:39:16 -07:00
Han Yin	bb5b824208	Polish: populate backend names in `benchModel`	2025-10-28 11:39:16 -07:00
Han Yin	c14c11dcbd	Feature: decode system and user prompt in batches	2025-10-28 11:39:16 -07:00
Han Yin	02465137ca	Bug fix: null system prompt state update; Safeguard empty user prompt	2025-10-28 11:39:16 -07:00
Han Yin	7bbb53aaf8	Clang-tidy linting: make functions & global variables static	2025-10-28 11:39:16 -07:00
Han Yin	f44882aeeb	Enforce centralized dependency management; bump Gradle & deps versions	2025-10-28 11:39:16 -07:00
Han Yin	0ade7fb4d7	Polish binding: Remove verbose setup JNI APIs; Update state machine states.	2025-10-28 11:39:16 -07:00
Han Yin	7dc9968f82	Restructure `LLamaAndroid.kt`	2025-10-28 11:39:16 -07:00
Han Yin	44720859d6	Rewrite llama-android JNI implementation	2025-10-28 11:39:15 -07:00
Han Yin	d4ab3832cf	Use common sampler	2025-10-28 11:39:15 -07:00
Han Yin	1f255d4bca	Tidy & clean LLamaAndroid binding	2025-10-28 11:39:15 -07:00
Georgi Gerganov	745aa5319b	llama : deprecate llama_kv_self_ API (#14030 ) * llama : deprecate llama_kv_self_ API ggml-ci * llama : allow llama_memory_(nullptr) ggml-ci * memory : add flag for optional data clear in llama_memory_clear ggml-ci	2025-06-06 14:11:15 +03:00
Xuan-Son Nguyen	bd3f59f812	cmake : enable curl by default (#12761 ) * cmake : enable curl by default * no curl if no examples * fix build * fix build-linux-cross * add windows-setup-curl * fix * shell * fix path * fix windows-latest-cmake* * run: include_directories * LLAMA_RUN_EXTRA_LIBS * sycl: no llama_curl * no test-arg-parser on windows * clarification * try riscv64 / arm64 * windows: include libcurl inside release binary * add msg * fix mac / ios / android build * will this fix xcode? * try clearing the cache * add bunch of licenses * revert clear cache * fix xcode * fix xcode (2) * fix typo	2025-04-07 13:35:19 +02:00
Georgi Gerganov	e0dbec0bc6	llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181 ) * llama : refactor llama_context, llama_kv_cache, llm_build_context ggml-ci * graph : don't mutate the KV cache during defrag ggml-ci * context : reduce virtuals + remove test function ggml-ci * context : move interface implementation to source file + factory ggml-ci * graph : move KV cache build functions to llama_context impl ggml-ci * graph : remove model reference from build_pooling ggml-ci * graph : remove llama_model reference ggml-ci * kv_cache : provide rope factors ggml-ci * graph : rework inputs to use only unique_ptr, remove attn input abstraction ggml-ci * context : remove llama_context_i abstraction ggml-ci * context : clean-up ggml-ci * graph : clean-up ggml-ci * llama : remove redundant keywords (struct, enum) ggml-ci * model : adapt gemma3 ggml-ci * graph : restore same attention ops as on master ggml-ci * llama : remove TODO + fix indent ggml-ci	2025-03-13 12:35:44 +02:00
Han Yin	57b6abf85a	android : fix KV cache log message condition (#12212 )	2025-03-06 08:22:49 +02:00
Georgi Gerganov	68ff663a04	repo : update links to new url (#11886 ) * repo : update links to new url ggml-ci * cont : more urls ggml-ci	2025-02-15 16:40:57 +02:00
codezjx	3edfa7d375	llama.android: add field formatChat to control whether to parse special tokens when send message (#11270 )	2025-01-17 14:57:56 +02:00
Georgi Gerganov	afa8a9ec9b	llama : add `llama_vocab`, functions -> methods, naming (#11110 ) * llama : functions -> methods (#11110) * llama : add struct llama_vocab to the API (#11156) ggml-ci * hparams : move vocab params to llama_vocab (#11159) ggml-ci * vocab : more pimpl (#11165) ggml-ci * vocab : minor tokenization optimizations (#11160) ggml-ci Co-authored-by: Diego Devesa <slarengh@gmail.com> * lora : update API names (#11167) ggml-ci * llama : update API names to use correct prefix (#11174) * llama : update API names to use correct prefix ggml-ci * cont ggml-ci * cont ggml-ci * minor [no ci] * vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174) ggml-ci * vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174) ggml-ci --------- Co-authored-by: Diego Devesa <slarengh@gmail.com>	2025-01-12 11:32:42 +02:00
ag2s20150909	c250ecb315	android : fix llama_batch free (#11014 )	2024-12-30 14:35:13 +02:00
Diego Devesa	9177484f58	ggml : fix arm build (#10890 ) * ggml: GGML_NATIVE uses -mcpu=native on ARM Signed-off-by: Adrien Gallouët <angt@huggingface.co> * ggml: Show detected features with GGML_NATIVE Signed-off-by: Adrien Gallouët <angt@huggingface.co> * remove msvc support, add GGML_CPU_ARM_ARCH option * disable llamafile in android example * march -> mcpu, skip adding feature macros ggml-ci --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co> Co-authored-by: Adrien Gallouët <angt@huggingface.co>	2024-12-18 23:21:42 +01:00
Xuan Son Nguyen	cda0e4b648	llama : remove all_pos_0, all_pos_1, all_seq_id from llama_batch (#9745 ) * refactor llama_batch_get_one * adapt all examples * fix simple.cpp * fix llama_bench * fix * fix context shifting * free batch before return * use common_batch_add, reuse llama_batch in loop * null terminated seq_id list * fix save-load-state example * fix perplexity * correct token pos in llama_batch_allocr	2024-10-18 23:18:01 +02:00
Diego Devesa	7eee341bee	common : use common_ prefix for common library functions (#9805 ) * common : use common_ prefix for common library functions --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-10-10 22:57:42 +02:00
Diego Devesa	c81f3bbb05	cmake : do not build common library by default when standalone (#9804 )	2024-10-09 18:49:52 +02:00
slaren	5fb5e24811	llama : minor sampling refactor (2) (#9386 )	2024-09-09 17:10:46 +02:00
Georgi Gerganov	a5b5d9a101	llama.android : fix build (#9350 )	2024-09-08 00:33:50 +03:00
Georgi Gerganov	df270ef745	llama : refactor sampling v2 (#9294 ) - Add `struct llama_sampler` and `struct llama_sampler_i` - Add `llama_sampler_` API - Add `llama_sampler_chain_` API for chaining multiple samplers - Remove `LLAMA_API_INTERNAL` - Add `llama_perf_` API and remove old `llama_print_timings` and `llama_reset_timings`	2024-09-07 15:16:19 +03:00
devojony	b7c11d36e6	examples: fix android example cannot be generated continuously (#8621 ) When generation ends `completion_loop()` should return a NULL, not the empty string	2024-07-22 09:54:42 +03:00
Raj Hammeer Singh Hada	387952651a	Delete examples/llama.android/llama/CMakeLists.txt (#8165 ) * Delete examples/llama.android/llama/CMakeLists.txt https://github.com/ggerganov/llama.cpp/pull/8145#issuecomment-2194534244 This file is not being used for building on Android. `llama.cpp/examples/llama.android/llama/src/main/cpp/CMakeLists.txt` is being used instead. * Update CMakeLists.txt Pick local llama.cpp files instead of fetching content from git	2024-06-27 16:39:29 +02:00
Raj Hammeer Singh Hada	ac146628e4	Fix llama-android.cpp for error - "common/common.h not found" (#8145 ) - Path seems to be wrong for the common.h header file in llama-android.cpp file. Fixing the path so the Android Build doesn't fail with the error "There is no file common/common.h"	2024-06-27 03:57:57 +02:00
Elton Kola	9791f40258	android : module (#7502 ) * move ndk code to a new library * add gradle file	2024-05-25 11:11:33 +03:00
Georgi Gerganov	854d365aba	cmake : update android comments (#7341 )	2024-05-19 11:01:01 +03:00
Georgi Gerganov	511182eabb	android : use "ci-android" branch for CI (#7341 ) * android : use "ci-android" branch for CI * ggml : disable SIMD exp and silu for 32-bit ARM ggml-ci * android : do not fetch, use add_subdirectory instead * cmake : provide binary dir	2024-05-18 20:40:39 +10:00
Brian	1265c670fd	Revert "move ndk code to a new library (#6951 )" (#7282 ) This reverts commit `efc8f767c8`.	2024-05-14 16:10:39 +03:00
Elton Kola	efc8f767c8	move ndk code to a new library (#6951 )	2024-05-14 17:30:30 +10:00
Pedro Cuenca	b97bc3966e	llama : support Llama 3 HF conversion (#6745 ) * Support Llama 3 conversion The tokenizer is BPE. * style * Accept suggestion Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com> * llama : add llama_token_is_eog() ggml-ci * llama : auto-detect more EOT tokens when missing in KV data * convert : replacing EOS token is a hack * llama : fix codegemma EOT token + add TODOs * llama : fix model type string for 8B model --------- Co-authored-by: Sourab Mangrulkar <13534540+pacman100@users.noreply.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-04-21 14:50:41 +03:00
Dean	7ab7b733bb	android : fix utf8 decoding error (#5935 ) * examples: fix utf8 decoding error some models have a tokenizer that decodes an id into an incomplete utf8 sequence, need to validate and wait for next token one example would be: https://huggingface.co/Qwen/Qwen1.5-1.8B-Chat-GGUF/resolve/main/qwen1_5-1_8b-chat-q4_0.gguf and and an example of the token is 18137 * android : minor --------- Co-authored-by: zhangfuwen <zhangfuwen@foxmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-03-10 22:03:17 +02:00
Radosław Gryta	abbabc5e51	ggml-quants : provide ggml_vqtbl1q_u8 for 64bit compatibility (#5711 ) * [ggml-quants] Provide ggml_vqtbl1q_u8 for 64bit compatibility vqtbl1q_u8 is not part of arm v7 neon library * [android-example] Remove abi filter after arm v7a fix * [github-workflows] Do not skip Android armeabi-v7a build	2024-02-25 20:43:00 +02:00
bmwl	f486f6e1e5	ggml : add numa options (#5377 ) * Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h * Reverted Makefile * Fixed include * Removed sched.h from ggml.h, moved ggml_get_numa_affinity into ggml.c, removed trailing whitespace and fixed up a few inconsistent variables * removed trailing whitespace * Added numa options to allow finer grained control as well as plumbing for a new mirror mode that will require numa.h * Reverting Makefile * Fixed a number of issues with the move from BOOL to ggml_numa_strategies. Added a note about mirror mode note being implemented yet * Removing MIRROR_MODE code for this PR * Removing last bit of MIRROR_MODE code for this PR * Removing unneeded branch in server.cpp example and moving get_numa_affinity and making it static * Fixed lingering init_llama_backend() bool calls in tests and examples * Remote enum llama_numa_strategies * Revert bad merge with dynatemp flags * add missing enum ggml_numa_strategies declaration and revert sync problem with master * add missing enum ggml_numa_strategies declaration * fixed ggml_init_numa variable * Update ggml.h Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * Update READMEs with info about numa flags, change INTERLEAVE strategy name to DISTRIBUTE everywhere, implement the improved distribution strategy from @rankaiyx, fix a spelling mistake and un-merge some bad merges * split numa init out from llama_backend_init and created llama_numa_init. Updated all code paths and samples * Fix up some boolean vs enum comparisons * Added #ifdefs for non-Linux OS that don't have cpu_set_t datatype * Update ggml.h Align enum values Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml.c Remove whitespace Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml.c align paremeters Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update examples/server/server.cpp remove whitespace and align brace Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update common/common.cpp Remove whitespace and align brace Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * unified ggml_numa_strategy enum and fixed text alignment in server.cpp example * Update ggml.c simplified return for platforms without NUMA support Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> * removed redundant else from cli argument processing of --numa * whitespace --------- Co-authored-by: root <root@nenya.lothlorien.ca> Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Jared Van Bortel <jared@nomic.ai>	2024-02-16 11:31:07 +02:00
Valentin Konovalov	256d1bb0dd	android : use release cmake build type by default (#5123 )	2024-01-25 19:05:51 +02:00
Neuman Vong	862f5e41ab	android : introduce starter project example (#4926 ) * Introduce starter project for Android Based on examples/llama.swiftui. * Add github workflow * Set NDK version * Only build arm64-v8a in CI * Sync bench code * Rename CI prop to skip-armeabi-v7a * Remove unused tests	2024-01-16 15:47:34 +02:00

207 Commits