Commit Graph

1765 Commits

Author SHA1 Message Date
Han Yin a8dc825aef UI: handle back press on Model Selection screen 2025-10-28 11:39:17 -07:00
Han Yin d1b018e375 UI: show a Snack bar to warn user that system prompt is not always supported 2025-10-28 11:39:17 -07:00
Han Yin 56a7272858 UI: polish model cards on Benchmark and Conversation screens to show model loading metrics 2025-10-28 11:39:17 -07:00
Han Yin 10ca2fa834 util: extract formatting helper functions from FileUtils into a new FormatUtils 2025-10-28 11:39:17 -07:00
Han Yin d7afcc41d5 UI: polish ModelLoading screen 2025-10-28 11:39:17 -07:00
Han Yin 57b5001f5c nit: revert accidental committing of debug code 2025-10-28 11:39:17 -07:00
Han Yin ec47fa3d14 nit: allow deselect model on Model Selection screen 2025-10-28 11:39:17 -07:00
Han Yin 6b74c49e6b UI: polish model card 2025-10-28 11:39:17 -07:00
Han Yin c12ef7a779 UI: update ModelSelectionScreen with a preselect mechanism 2025-10-28 11:39:17 -07:00
Han Yin b81a0c6e6d UI: refactor ModelCard UI to show GGUF metadata 2025-10-28 11:39:17 -07:00
Han Yin 9056f27a91 nit: rename lastUsed field to dateLastUsed; add dateAdded field 2025-10-28 11:39:17 -07:00
Han Yin 7540c2a8b9 nit: refactor data.local package structure 2025-10-28 11:39:17 -07:00
Han Yin 7ed79319e5 GGUF: make GgufMetadata serializable in order to be compatible with Room 2025-10-28 11:39:17 -07:00
Han Yin 8ae0c3d2fa DB: introduce Kotlin serialization extension's library and plugin; add Room runtime library 2025-10-28 11:39:17 -07:00
Han Yin 67499727ef gguf: add GGUF metadata data holder and its corresponding extractor implementation 2025-10-28 11:39:17 -07:00
Han Yin a9466c0370 navigation: sink model loading state management from AppContent down into ModelLoadingScreen; pass ModelLoadingMetrics to Benchmark and Conversation screens 2025-10-28 11:39:17 -07:00
Han Yin 8a682ff85d core: throw Exception instead of returning null if model fails to load 2025-10-28 11:39:17 -07:00
Han Yin f313362ced nit: polish ModelLoadingScreen UI 2025-10-28 11:39:17 -07:00
Han Yin 1d508f367e UI: update AppContent to pass in correct navigation callbacks 2025-10-28 11:39:17 -07:00
Han Yin 0d65c4b06b nit: extract app name into a constant value; remove unused onBackPressed callbacks 2025-10-28 11:39:17 -07:00
Han Yin 9f1d26ac95 UI: migrate ConversationViewModel onto ModelLoadingViewModel; update & refine ConversationScreen 2025-10-28 11:39:17 -07:00
Han Yin cb508be782 UI: migrate ModelLoadingScreen onto ModelLoadingViewModel; update & refine ModelLoadingScreen 2025-10-28 11:39:17 -07:00
Han Yin f61c512223 UI: expose a single facade ModelUnloadDialogHandler; move UnloadModelState into ModelUnloadingViewModel.kt 2025-10-28 11:39:17 -07:00
Han Yin c5a3ac7eb1 UI: Introduce an abstract ViewModel to handle additional model unloading logics 2025-10-28 11:39:17 -07:00
Han Yin e1c77c6bbd LLama: add a new Initializing state; add two extension properties; rename LibraryLoaded state to Initialized 2025-10-28 11:39:17 -07:00
Han Yin ba40d689a1 UI: implement BenchmarkScreen's individual back handling 2025-10-28 11:39:17 -07:00
Han Yin 8203ddb97a UI: refactor back handling by removing centralized BackHandlerSetup and UnloadModelConfirmationDialog from AppContent 2025-10-28 11:39:17 -07:00
Han Yin c08d02d233 LLama: add ModelUnloadingState to engine State; add missing state checks in stub engine; fix instrumentation engine's error messages 2025-10-28 11:39:17 -07:00
Han Yin 481ba6e9d3 UI: remove code duplication in sort menu 2025-10-28 11:39:17 -07:00
Han Yin 41615be5ae UI: fix the typo `totalGb` in `StorageMetrics` 2025-10-28 11:39:17 -07:00
Han Yin 69f2bd62f9 UI: replace ugly optional as casts in AppScaffold with extension functions 2025-10-28 11:39:17 -07:00
Han Yin e269da655f UI: combine TopBarConfig and BottomBarConfig into each route's ScaffoldConfig 2025-10-28 11:39:17 -07:00
Han Yin 225c5435c5 UI: refactor BottomBarConfig.ModelsManagement APIs 2025-10-28 11:39:17 -07:00
Han Yin 63fc56d603 UI: centralize the AppScaffold and modularize its configs 2025-10-28 11:39:17 -07:00
Han Yin 72e97b93c5 feature: check for available space before copying local model 2025-10-28 11:39:16 -07:00
Han Yin 65d4a57a8b LLama: refactor loadModel by splitting the system prompt setting into a separate method 2025-10-28 11:39:16 -07:00
Han Yin 9f77155535 VM: handle the cancellation of ongoing token generation 2025-10-28 11:39:16 -07:00
Han Yin 46859c10f0 LLama: update engine state after handling the cancellation of sendUserPrompt 2025-10-28 11:39:16 -07:00
Han Yin 06448a60a5 UI: update UI for ongoing model import's cancellation 2025-10-28 11:39:16 -07:00
Han Yin 9ba74a9d3d data: allow canceling the ongoing model import 2025-10-28 11:39:16 -07:00
Han Yin d70b8fe323 core: swap in LLamaAndroid and mark stub engine for testing only 2025-10-28 11:39:16 -07:00
Han Yin c2426a42e5 UI: unify Model Card components 2025-10-28 11:39:16 -07:00
Han Yin 434933f5b3 UI: show model card in Conversation screen 2025-10-28 11:39:16 -07:00
Han Yin 9769467723 UI: show model card in Model Loading screen 2025-10-28 11:39:16 -07:00
Han Yin 9cfa74f754 core: break down InferenceManager due to Interface Segregation Principle 2025-10-28 11:39:16 -07:00
Han Yin 286ed05f13 vm: merge SystemPromptViewModel into ModelLoadingViewModel 2025-10-28 11:39:16 -07:00
Han Yin 23d411d86e vm: split mono MainViewModel into separate individual ViewModels 2025-10-28 11:39:16 -07:00
Han Yin 32d778bb8e core: extract conversation and benchmark logics into InferenceManager; add logs and missing state updates in stub InferenceEngine 2025-10-28 11:39:16 -07:00
Han Yin 51b120f464 data: pass through getModelById from ModelDao into ModelRepository 2025-10-28 11:39:16 -07:00
Han Yin 59f5caa699 Util: split FileUtils from ModelUtils; extract copy methods into FileUtils 2025-10-28 11:39:16 -07:00
Han Yin 4913ad0dae nit: tidy SystemPromptViewModel 2025-10-28 11:39:16 -07:00
Han Yin 2614f91226 UI: replace model selection screen's data stubbing; add empty view 2025-10-28 11:39:16 -07:00
Han Yin 6b48f7473f UI: extract a shared ModelCard component 2025-10-28 11:39:16 -07:00
Han Yin 0d41e75ca5 UI: add a confirmation step when user picks a file; refactor model import overlay into AlertDialog 2025-10-28 11:39:16 -07:00
Han Yin 1bebd1bb07 util: extract file size formatting into ModelUtils 2025-10-28 11:39:16 -07:00
Han Yin 561fe0222f UI: handle back navigation when user is in multi-selection mode 2025-10-28 11:39:16 -07:00
Han Yin 2d6b8856f6 UI: implement multiple models deletion; update Models Management screen 2025-10-28 11:39:16 -07:00
Han Yin 025e3d2417 UI: enrich ModelManagementState; extract filename to show correct importing UI 2025-10-28 11:39:16 -07:00
Han Yin adfbfe3ffb data: add a util file for extracting file name & size and model metadata 2025-10-28 11:39:16 -07:00
Han Yin 290a6bfebe bugfix: use List instead of Collection for ModelDao's deletion 2025-10-28 11:39:16 -07:00
Han Yin 5de0b5d6d0 data: import local model with file picker 2025-10-28 11:39:16 -07:00
Han Yin a3ebdac58f UI: polish sort order menu 2025-10-28 11:39:16 -07:00
Han Yin 760d66c97d UI: replace Models Management screen's stubbing with instrumentation 2025-10-28 11:39:16 -07:00
Han Yin bc93c384a7 data: introduce Model entity and DAO; update DI module 2025-10-28 11:39:16 -07:00
Han Yin f5e2edda87 data: [WIP] prepare for ModelRepository refactor & impl 2025-10-28 11:39:16 -07:00
Han Yin b6cc8f0c01 DI: abstract the protocol of SystemPromptRepository; update AppModule 2025-10-28 11:39:16 -07:00
Han Yin eebc05b559 UI: polish UI for ModelsManagementScreen; inject ModelsManagementViewModel 2025-10-28 11:39:16 -07:00
Han Yin 6e82bb37d3 Feature: Introduce ModelRepository and ModelsManagementViewModel; update AppModule 2025-10-28 11:39:16 -07:00
Han Yin aedf442632 DI: Optimize AppModule 2025-10-28 11:39:16 -07:00
Han Yin d60bba9b8f UI: navigation with more natural animated transitions 2025-10-28 11:39:16 -07:00
Han Yin 511df35704 bugfix: wait for model to load before navigating to benchmark screen; use NavigationActions instead of raw navController 2025-10-28 11:39:16 -07:00
Han Yin ea11ee3c94 UI: optimize AppContent's composing 2025-10-28 11:39:16 -07:00
Han Yin 0afd087f35 DI: replace manual DI with Hilt DI 2025-10-28 11:39:16 -07:00
Han Yin a1f6e7e476 DI: make viewmodels Hilt injectable 2025-10-28 11:39:16 -07:00
Han Yin 564b095427 DI: make app Hilt injectable 2025-10-28 11:39:16 -07:00
Han Yin 65741a7e64 DI: introduce Hilt plugin + processor + lib dependencies 2025-10-28 11:39:16 -07:00
Han Yin af0d68d611 nit: combine temperatureMetrics and useFahrenheit 2025-10-28 11:39:16 -07:00
Han Yin 5e4972e93e UI: refactor top app bars 2025-10-28 11:39:16 -07:00
Han Yin 2a41c0e354 vm: replace token metrics stubs with actual implementation 2025-10-28 11:39:16 -07:00
Han Yin e47e3b77ee UI: locks user in alert dialog when model is unloading 2025-10-28 11:39:16 -07:00
Han Yin 6b341b0fbe bugfix: handle user quitting on model loading 2025-10-28 11:39:16 -07:00
Han Yin e8b84c6ebf UI: code polish 2025-10-28 11:39:16 -07:00
Han Yin fddf060d92 data: code polish 2025-10-28 11:39:16 -07:00
Han Yin 3b499ac7e4 UI: polish conversation screen 2025-10-28 11:39:16 -07:00
Han Yin 64ebdc67a6 UI: update app name to be more Arm 2025-10-28 11:39:16 -07:00
Han Yin 55681847e9 UI: rename `ModeSelection` to `ModelLoading` for better clarity 2025-10-28 11:39:16 -07:00
Han Yin 75c986afc5 bugfix: properly handle user quitting the conversation screen while tokens are being generated 2025-10-28 11:39:16 -07:00
Han Yin 4848bf93d0 data: introduce repo for System Prompt; flow data from Room to VM 2025-10-28 11:39:16 -07:00
Han Yin 5596d5203b DB: setup Room database 2025-10-28 11:39:16 -07:00
Han Yin 4046cd16fd Deps: bump Kotlin plugin; introduce KSP; apply in :app subproject 2025-10-28 11:39:16 -07:00
Han Yin 5868eaa66b UI: polish system prompt setup UI 2025-10-28 11:39:16 -07:00
Han Yin a7ee3d305f UI: split a nested parent settings screen into separate child settings screens 2025-10-28 11:39:16 -07:00
Han Yin 65c09b2b32 UI: allow drawer's gesture control only on Home and Settings screens; enable alert dialog on back navigation inside conversation and benchmark 2025-10-28 11:39:16 -07:00
Han Yin 648b97818e UI: disable triggering drawer via gesture; enable alert dialog on back navigation inside conversation and benchmark 2025-10-28 11:39:16 -07:00
Han Yin a7ae8b7ce0 [WIP] DI: implement simple local vm factory provider 2025-10-28 11:39:16 -07:00
Han Yin ca2b7772ce UI: add a new MainActivity; update manifest 2025-10-28 11:39:16 -07:00
Han Yin 7e5c80cee9 UI: implement core flow's screens 2025-10-28 11:39:16 -07:00
Han Yin 5ad65919e9 util: implement user preferences utility 2025-10-28 11:39:16 -07:00
Han Yin 46bd638c5f util: implement performance monitor; wrap it with a viewmodel 2025-10-28 11:39:16 -07:00
Han Yin 4dd755e25b UI: implement basic UI components 2025-10-28 11:39:16 -07:00
Han Yin 32608fb225 UI: app navigation 2025-10-28 11:39:16 -07:00
Han Yin 3f913ce440 LLM: stub a local inference engine for faster iteration 2025-10-28 11:39:16 -07:00
Han Yin 3787fbddb0 data: define data models for LLM and system prompts 2025-10-28 11:39:16 -07:00
Han Yin 697d778db7 UI: define theme, color palette, typography and shape 2025-10-28 11:39:16 -07:00
Han Yin cbe7133742 UI: introduce new dependencies, update versions & references 2025-10-28 11:39:16 -07:00
Han Yin 44a522dbc8 UI: move existing UI src files into `legacy` package 2025-10-28 11:39:16 -07:00
Han Yin 37f3e1c415 Feature: use local llama_context for benchmarking; support context init with custom context size 2025-10-28 11:39:16 -07:00
Han Yin 6d2279e9cd REWRITE JNI bridge; Update viewmodel 2025-10-28 11:39:16 -07:00
Han Yin e1bc87610e Perf: allocate `llama_batch` on stack with `llama_batch_init` 2025-10-28 11:39:16 -07:00
Han Yin 2b52563737 Polish: better logging & documentation 2025-10-28 11:39:16 -07:00
Han Yin ec502cfde9 Feature: implement infinite conversation via context shifting 2025-10-28 11:39:16 -07:00
Han Yin 4e515727b4 Abort on system prompt too long; Truncate user prompt if too long. 2025-10-28 11:39:16 -07:00
Han Yin 4809112ec5 Polish: adopt common naming; init modularization; 2025-10-28 11:39:16 -07:00
Han Yin 8bf2f4d412 Feature: chat template auto formatting 2025-10-28 11:39:16 -07:00
Han Yin 1b0754c0f5 Perf: optimize performance with ARM features 2025-10-28 11:39:16 -07:00
Han Yin bb5b824208 Polish: populate backend names in `benchModel` 2025-10-28 11:39:16 -07:00
Han Yin c14c11dcbd Feature: decode system and user prompt in batches 2025-10-28 11:39:16 -07:00
Han Yin 02465137ca Bug fix: null system prompt state update; Safeguard empty user prompt 2025-10-28 11:39:16 -07:00
Han Yin 7bbb53aaf8 Clang-tidy linting: make functions & global variables static 2025-10-28 11:39:16 -07:00
Han Yin f44882aeeb Enforce centralized dependency management; bump Gradle & deps versions 2025-10-28 11:39:16 -07:00
Han Yin 0ade7fb4d7 Polish binding: Remove verbose setup JNI APIs; Update state machine states. 2025-10-28 11:39:16 -07:00
Han Yin 7dc9968f82 Restructure `LLamaAndroid.kt` 2025-10-28 11:39:16 -07:00
Han Yin 44720859d6 Rewrite llama-android JNI implementation 2025-10-28 11:39:15 -07:00
Han Yin d4ab3832cf Use common sampler 2025-10-28 11:39:15 -07:00
Han Yin 1f255d4bca Tidy & clean LLamaAndroid binding 2025-10-28 11:39:15 -07:00
Daniel Bevenius 56b4795842
model-conversion : add support for SentenceTransformers (#16387)
* model-conversion : add support for SentenceTransformers

This commit adds support for models that use SentenceTransformer layers.

The motivation for this is that if a converted model includes any of the
numbered layers specified in the original model's repository, these
changes enable such models to be used and verified. Previously, the
model-conversion only supported the base model output, without any of
the additional transformation layers.

Usage:
Convert the model that also includes the SentenceTransformer layers:
```console
(venv) $ export EMBEDDING_MODEL_PATH="~/google/embeddinggemma-300M"
(venv) make embedding-convert-model
```

Verify the produced embeddings from the converted model against the
original model embeddings:
```console
(venv) make embedding-verify-logits-st
```

The original model can be run using SentenceTransformer:
```console
(venv) make embedding-run-original-model-st
```

Run the converted model using "SentenceTransformer" layers, which
enable pooling and normalization:
```console
(venv) make embedding-run-converted-model-st
```

* add model-conversion example requirements

* add support for -st flag in embedding model conversion

This commit adds support for the -st flag in the embedding model
conversion script. This will enable models to be converted using
SentenceTransformers dense layers.
2025-10-09 14:35:22 +02:00
Aaron Teo 624207e676
devops: add s390x & ppc64le CI (#15925)
* devops: move s390x and ppc64le ci build

we have access to ubuntu-24.04-s390x and ppc64le images now

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: disable ppc64le for now since they have compiler errors

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: stop warnings as errors

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: switch to non-macro flag

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: going the llama macro route

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add big-endian gguf test models

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: disable ppc64le to test s390x, check test build

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: dup .gguf.inp files for big-endian tests

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: dup .gguf.out files for big-endian too

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add python setup and endian byteswap

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: poor thing does not have s390x python3

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add missing rust compiler for s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: try rust actions runner

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Revert "devops: try rust actions runner"

This reverts commit 3f8db04356033d6c1d7eccc75ca396bc5298250c.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: try a different path for rust

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: dump home directory and user info

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: install gguf-py only

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: missed relative path

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: remove big-endian files since local swapping is working

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: revert test-tokenizer-0 cmakelists

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fix unicode flags conversion from and to uint16_t

Bitfields are allocated in different order on s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
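The underlying issue is that C bitfield allocation order is ABI-specific, so the same flags struct can serialize to a different uint16_t on s390x than on x86. A minimal Python sketch (hypothetical flag names, not the actual `unicode_cpt_flags` fields) of the endian-independent alternative: define bit positions with explicit shifts and fix the byte order on serialization.

```python
import struct

def pack_flags(is_letter: bool, is_digit: bool, is_whitespace: bool) -> int:
    # Explicit shifts define the bit positions ourselves, instead of
    # letting the compiler's bitfield allocation order decide them.
    v = 0
    v |= 1 << 0 if is_letter else 0
    v |= 1 << 1 if is_digit else 0
    v |= 1 << 2 if is_whitespace else 0
    return v

def to_bytes(flags: int) -> bytes:
    # Byte order still matters once the value hits a file; pinning it
    # (little-endian here) keeps the serialized form portable.
    return struct.pack("<H", flags)
```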

* Simplify byteswap command

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Add byteswapping and git-lfs for test-tokenizers-ggml-vocabs

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fix endianness detection in vocab loader

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Disable test-thread-safety on s390x

In this test a model is downloaded, then immediately loaded to check if
more downloads are needed, and then used for the test.

There is no clean way to separate these steps
to add byteswapping between them, so just skip this test.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fix q8_0 test in test-quantize-fns

vec_signed uses unexpected rounding mode.
Explicitly use different rounding function.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
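The distinction here is truncation versus round-to-nearest. A toy scalar sketch in Python (purely illustrative; the real fix concerns s390x vector intrinsics, not this code) of how the two rounding modes diverge during quantization:

```python
import math

def quantize_trunc(x: float, scale: float) -> int:
    # A plain float->int conversion truncates toward zero, which is
    # what a rounding-mode-sensitive intrinsic may effectively do.
    return int(x / scale)

def quantize_nearest(x: float, scale: float) -> int:
    # Round to nearest, ties away from zero: the behavior the
    # reference quantization expects.
    q = x / scale
    return int(math.floor(q + 0.5)) if q >= 0 else int(math.ceil(q - 0.5))
```

For a value like 0.9 with scale 1.0, truncation yields 0 while nearest rounding yields 1, which is enough to fail a bit-exact quantization test.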

* devops: add big-endian stories260K

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: add s390x test-eval-callback

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix test does not exist

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: fix model not found llama-eval-callback

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fix q3_K dot product error in test-quantize-fns on s390x

Array q8bytes had only 4 elements allocated, but 8 elements were accessed.
This led to out-of-bounds writes, followed by out-of-bounds reads of the
overwritten values, producing incorrect results.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: re-enable ppc64le for testing

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: activate test-thread-safety for s390x

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: disable ppc64le tests

for some reason it keeps failing the test-thread-safety tests, and I do
not have a machine able to replicate the failures.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* devops: LLAMA_FATAL_WARNINGS=ON

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Correct repository URL for s390x for test-thread-safety model

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fix fs_get_cache_directory

Ensure it works even if both XDG_CACHE_HOME and HOME are unset.
This might happen in containers.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
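The fallback order described can be sketched in Python (illustrative only; the actual `fs_get_cache_directory` is C++ in llama.cpp's common code, and the directory name here is an assumption):

```python
import os

def get_cache_directory(app: str = "llama.cpp") -> str:
    # Prefer XDG_CACHE_HOME; fall back to $HOME/.cache; if neither is
    # set (as can happen in minimal containers), use a relative path
    # rather than failing.
    base = os.environ.get("XDG_CACHE_HOME")
    if not base:
        home = os.environ.get("HOME")
        base = os.path.join(home, ".cache") if home else ".cache"
    return os.path.join(base, app)
```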

* Re-enable CI for ppc64le

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Fortify ggml_rope_impl

Only memcpy data from sections argument if it's non-NULL.

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

* Add TODO in struct unicode_cpt_flags to reimplement it in endian-independent way

* Update URL for big-endian model

* Update .github/workflows/build.yml

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Update remaining mentions of BE models to ggml-org/models repo

---------

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com>
Co-authored-by: Aleksei Nikiforov <103434461+AlekseiNikiforovIBM@users.noreply.github.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-27 02:03:33 +08:00
Daniel Bevenius aa3ee0eb0b
model-conversion : add embedding prompt file support (#15871)
This commit adds support for passing a prompt file to the model
conversion targets/scripts. It also updates the logits.cpp to print out
embedding information in the same format as when running the original
embedding model.

The motivation for this is that it allows us to pass files of different
sizes when running the converted models and validating the logits.

This can be particularly important when testing the sliding window
functionality of models where the sequence length needs to exceed a
certain number of tokens to trigger the sliding window logic.
2025-09-25 12:02:36 +02:00
Douglas Hanley b5bd037832
llama : add support for qwen3 reranker (#15824) 2025-09-25 11:53:09 +03:00
Jie Fu (傅杰) 63b54c81a6
model-conversion : make causal-verify-logits work with model names containing "." (#16215)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-24 10:25:26 +02:00
Jie Fu (傅杰) 7735706b93
model-conversion : run-org-model.py fails to run on mac m1 (#16213)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-24 08:46:52 +02:00
Jie Fu (傅杰) 8ba548dae2
model-conversion : fix the make targets in the README.md (#16209)
Fix two incorrect make targets in the readme.

Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-24 06:19:23 +02:00
Georgi Gerganov 432cf4304c
codeowners : update + cleanup (#16174)
---------

Co-authored-by: slaren <slarengh@gmail.com>
2025-09-22 18:20:21 +03:00
GideonSerf c6db9a1027
embedding : fix typos in README (#16171) 2025-09-22 11:49:58 +03:00
Jie Fu (傅杰) 1cbd80f8cf
examples : support encoder-decoder models in the simple example (#16002)
Signed-off-by: Jie Fu <jiefu@tencent.com>
2025-09-17 10:29:00 +03:00
Aman Gupta 6d758839ff
Add LLaDA-7b-MoE diffusion model (#16003) 2025-09-16 10:38:28 +08:00
Piotr Wilkin (ilintar) acc1b008cf
model-conversion : add extra debugging support for model conversion (#15877)
* feat: Extra debugging support for model conversion - added BF16 support for llama-callback-eval and support for dumping intermediate steps in run-org-model.py
2025-09-09 06:05:55 +02:00
Aldehir Rojas 7057faf64b
json : support `enum` values within `allOf` (#15830) 2025-09-08 16:14:32 -05:00
Erik Scholz a81283820a
gguf: gguf_writer refactor (#15691)
* gguf: split gguf writer into base and buf impl
* gguf: templated gguf write out
* gguf: file based writer (avoid writing everything to memory first!)
* examples(llama2c): fix log not being the same level and compiler nits
2025-09-05 11:34:28 +02:00
Daniel Bevenius 5d6688de08
model-conversion : add --embeddings flag to modelcard.template [no ci] (#15801)
This commit updates the modelcard.template file used in the model
conversion scripts for embedding models to include the llama-server
--embeddings flag in the recommended command to run the model.

The motivation for this change was that when using the model-conversion
"tool" to upload the EmbeddingGemma models to Hugging Face, this flag was
missing and the embedding endpoint was therefore not available when
copy-and-pasting the command.
2025-09-05 04:36:23 +02:00
Daniel Bevenius 407c23786d
model-conversion : fix pyright errors (#15770)
This commit addresses type errors reported by pyright in the model
conversion scripts.
2025-09-03 18:28:36 +02:00
Daniel Bevenius 40a751ea9a
model-conversion : remove hardcoded /bin/bash shebangs [no ci] (#15765)
* model-conversion : remove hardcoded /bin/bash shebangs [no ci]

This commit updates the bash scripts to use env instead of a
hardcoded /bin/bash in the shebang line.

The motivation for this is that some systems may have bash installed
in a different location, and using /usr/bin/env bash ensures that
the script will use the first bash interpreter found in the user's
PATH, making the scripts more portable across different environments.
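The change amounts to the first line of each script; a minimal sketch:

```shell
#!/usr/bin/env bash
# env searches PATH for bash, so this works even when bash is not
# installed at /bin/bash (e.g. on NixOS or some BSDs).
set -euo pipefail
echo "found bash at: $(command -v bash)"
```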

* model-conversion : rename script to .py [no ci]

This commit renames run-casual-gen-embeddings-org.sh to
run-casual-gen-embeddings-org.py to reflect its Python nature.
2025-09-03 12:50:47 +02:00
Daniel Bevenius 8c3fdf44ec
model-conversion : add missing curl script [no ci] (#15761)
This commit adds a curl script to the model-conversion examples
which is currently missing. This script is required for the running the
embedding server targets to test llama-server embeddings functionality.
2025-09-03 09:48:35 +02:00
Georgi Gerganov e92d53b29e
sampling : optimize samplers by reusing bucket sort (#15665)
* sampling : optimize sorting using bucket sort in more places

ggml-ci

* sampling : do not sort in dist sampler

ggml-ci

* sampling : avoid heap allocations for sort buffers

ggml-ci

* common : add option to sort sampling candidates by probability

ggml-ci

* sampling : revert the change for preserving sort buffers

* sampling : use std::copy instead of memcpy

* sampling : clarify purpose of partial sort helpers

ggml-ci

* cont : remove wrong comment [no ci]

* common : update comment

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

---------

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-08-31 20:41:02 +03:00
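The idea behind the bucket-sort optimization is that when only the top-k candidates are needed, most of the candidate array never has to be sorted: values are binned by range, buckets are consumed from the top, and only the collected prefix gets a real sort. A simplified Python sketch (illustrative only; the actual implementation operates on llama.cpp's sampler candidate arrays):

```python
def partial_top_k(probs, k, n_buckets=64):
    # Bin each probability into a bucket by value range, then walk the
    # buckets from highest to lowest until at least k candidates are
    # collected; only those few need an exact sort.
    lo, hi = min(probs), max(probs)
    width = (hi - lo) / n_buckets or 1.0  # guard the all-equal case
    buckets = [[] for _ in range(n_buckets)]
    for i, p in enumerate(probs):
        b = min(int((p - lo) / width), n_buckets - 1)
        buckets[b].append((p, i))
    out = []
    for bucket in reversed(buckets):
        out.extend(bucket)
        if len(out) >= k:
            break
    out.sort(key=lambda t: -t[0])  # exact sort of the small prefix only
    return out[:k]
```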
Johannes Gäßler e81b8e4b7f
llama: use FA + max. GPU layers by default (#15434)
* llama: use max. GPU layers by default, auto -fa

* ggml-backend: abort instead of segfault
2025-08-30 16:32:10 +02:00
Gabe Goodhart a8bca68f72
fix: Compute the full sum in llama-eval-callback, not just the sum of printed values (#15637)
This makes it much easier to compare between llama.cpp and transformers!

https://github.com/ggml-org/llama.cpp/issues/nemotron-nano-15409
Branch: gabe-l-hart/nvidia-nemotron-nano-15409

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-08-28 15:27:36 -05:00
Daniel Bevenius 46d9caa27a
model-conversion : add mmproj conversion target (#15628)
This commit adds a new target to the Makefile for converting models that
are multimodal. This target will convert the original model and in
addition also create the mmproj GGUF model.

The motivation for this change is that for models that are multimodal,
for example those that contain a vision encoder, we will often want to
upload both the quantized model and the vision encoder model to
HuggingFace.

Example usage:
```console
$ make causal-convert-mm-model MODEL_PATH=~/work/ai/models/gemma-3-4b-it-qat-q4_0-unquantized/
...
The environment variable CONVERTED_MODEL can be set to this path using:
export CONVERTED_MODEL=/home/danbev/work/ai/llama.cpp/models/gemma-3-4b-it-qat-q4_0-unquantized.gguf
The mmproj model was created in /home/danbev/work/ai/llama.cpp/models/mmproj-gemma-3-4b-it-qat-q4_0-unquantized.gguf
```
The converted original model can then be quantized, and after that both
the quantized model and the mmproj file can then be uploaded to
HuggingFace.

Refs: https://huggingface.co/ggml-org/gemma-3-4b-it-qat-GGUF/tree/main
2025-08-28 09:26:48 +02:00
Daniel Bevenius 62cef26ac5
model-conversion : add qat-q4 quantization targets (#15588)
This commit adds two targets to the Makefile for quantizing of
Quantization Aware Trained (QAT) models to Q4_0 format.

The motivation for this is that it sets the token embedding and the
output tensor data types to Q8_0 instead of the default Q6_K. This is
something that we wish to enforce for QAT Q4_0 models that are to be
uploaded to ggml-org on Huggingface to guarantee the best quality.
2025-08-26 16:12:29 +02:00
Daniel Bevenius dfd9b5f6c7
model-conversion : set pooling type to none in logits.cpp (#15564)
This commit explicitly sets the pooling type to 'none' in the logits.cpp
to support models that have a pooling type specified.

The motivation for this is that some models may have a pooling type set
in the model file (.gguf file), and for this specific case where we only
want to extract logits, we need to ensure that no pooling is used so
that we are comparing raw logits and not pooled embeddings.
2025-08-25 15:00:43 +02:00
Daniel Bevenius 5a6bc6b1a6
model-conversion : add model card template for embeddings [no ci] (#15557)
* model-conversion: add model card template for embeddings [no ci]

This commit adds a separate model card template (model repository
README.md template) for embedding models.

The motivation for this is that the server command for the embedding
model is a little different, and some additional information can be
useful in the model card for embedding models which might not be
directly relevant for causal models.

* squash! model-conversion: add model card template for embeddings [no ci]

Fix pyright lint error.

* remove --pooling override and clarify embd_normalize usage
2025-08-25 14:25:25 +02:00