llama.cpp

Commit Graph

Author	SHA1	Message	Date
Leszek Hanusz	fd3cb9bbdd	Merge branch 'master' into notebook	2026-02-17 01:57:31 +01:00
AesSedai	d612901116	perplexity: add proper batching (#19661)	2026-02-16 18:44:44 +02:00
Leszek Hanusz	2377b8c81e	Merge branch 'master' into notebook	2026-02-16 02:22:25 +01:00
Adrien Gallouët	9e118b97c4	build : remove LLAMA_HTTPLIB option (#19623) This option was introduced as a workaround because cpp-httplib could not build on visionOS. Since it has been fixed and now compiles on all platforms, we can remove it and simplify many things. Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-15 15:38:50 +01:00
Anav Prasad	01d8eaa28d	mtmd : Add Nemotron Nano 12B v2 VL support (#19547) * nemotron nano v2 vlm support added * simplified code; addressed reviews * pre-downsample position embeddings during GGUF conversion for fixed input size	2026-02-14 14:07:00 +01:00
iMil	badba89320	NetBSD build support (#19589)	2026-02-14 09:47:01 +01:00
Aleksander Grygier	baa12f3831	webui: Architecture and UI improvements (#19596)	2026-02-14 09:06:41 +01:00
Sigbjørn Skjæret	b2ecc0cdb4	support --verbose-prompt (#19576)	2026-02-13 12:49:10 +01:00
Aleksander Grygier	5174d7206f	webui: UI and routing fixes (#19586) * chore: update webui build output * chore: update webui build output * fix: Scroll issues in DropdownMenuSearchable * webui: fix redirect to root ignoring base path * fix: Word wrapping * fix: remove obsolete modality UI tests causing CI failures - Remove VisionModality/AudioModality test stories - Remove mockServerProps usage and imports - Simplify Default test (remove dropdown interaction checks) - Simplify FileAttachments test (remove mocks) * feat: Improve formatting performance time --------- Co-authored-by: Pascal <admin@serveurperso.com>	2026-02-13 12:31:00 +01:00
Aleksander Grygier	4c61875bf8	webui: Add switcher to Chat Message UI to show raw LLM output (#19571)	2026-02-12 19:55:51 +01:00
Aleksander Grygier	4d688f9ebb	(webui) FEATURE: Enable adding or injecting System Message into chat (#19556) * feat: Enable adding System Prompt per-chat * fix: Save draft message in Chat Form when adding System Prompt from new chat view * fix: Proper system message deletion logic * chore: Formatting * chore: update webui build output	2026-02-12 13:56:08 +01:00
Aleksander Grygier	f486ce9f30	(webui) REFACTOR: UI primitives and polish (#19551) * webui: UI primitives and polish (non-MCP) * chore: update webui build output	2026-02-12 12:21:00 +01:00
Aleksander Grygier	38adc7d469	WebUI Architecture Cleanup (#19541) * webui: architecture foundation (non-MCP core refactors) * chore: update webui build output	2026-02-12 11:22:27 +01:00
RichardScottOZ	fa16e517a3	server : fix typo in README.md for features list (#19510) extra l for full	2026-02-12 08:56:25 +01:00
AesSedai	e463bbdf65	model: Add Kimi-K2.5 support (#19170) * Move dequant_model to after the text_config merge Add new kimi-k2.5 keys to mtmd convert Update V_MMPROJ tensor mapping for new mm_projector.proj keys Update V_M_IMP_NORM for new mm_projector.pre_norm key * Fix a couple of oversights * Add image support for Kimi-K2.5 * Revert changes to KimiVLForConditionalGeneration * Fix an assert crash * Fix permute swapping w / h on accident * Kimi-K2.5: Use merged QKV for vision * Kimi-K2.5: pre-convert vision QK to use build_rope_2d * Kimi-K2.5: support non-interleaved rope for vision * Kimi-K2.5: fix min / max pixel * Kimi-K2.5: remove v/o permutes, unnecessary * Kimi-K2.5: update permute name to match * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Kimi-K2.5: replace build_rope_2d ggml_cont with ggml_view_3d pointers --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2026-02-11 16:47:30 +01:00
Georgi Gerganov	6d95707827	model : fix wavtokenizer embedding notions (#19479)	2026-02-11 07:52:20 +02:00
JJJYmmm	fc0fe40049	models : support qwen3.5 series (#19468) * support qwen3.5 series * remove deepstack for now, and some code clean * code clean * add FULL_ATTENTION_INTERVAL metadata * code clean * reorder v heads for linear attention to avoid expensive interleaved repeat	2026-02-10 18:00:26 +02:00
Daniel Bevenius	66d403c480	tts : fix typos in README.md [no ci] (#19463)	2026-02-10 07:30:41 +01:00
Leszek Hanusz	8a6843aac1	Fix ApiChatCompletionRequest	2026-02-10 03:14:14 +01:00
Leszek Hanusz	8e125febc9	Don't use ChatService.notifyTimings	2026-02-10 01:54:05 +01:00
Leszek Hanusz	a35e4c4d81	Use a separate callbacks argument for sendCompletion	2026-02-10 01:20:14 +01:00
Leszek Hanusz	8f79f1fccb	Removing non-stream /completion implementation + fix api	2026-02-10 00:39:26 +01:00
Tarek Dakhran	262364e31d	mtmd: Implement tiling for LFM2-VL (#19454)	2026-02-09 17:30:32 +01:00
손희준	820ebfa6f4	Server: log when converting requests to chat completions format (#19457) * Log converting requests * Print as debug instead of info [no ci] --------- Co-authored-by: openingnow <>	2026-02-09 16:22:57 +01:00
Sascha Rogmann	292f6908cd	spec : remove check rate (#19377) * spec: remove parameter spec-ngram-check-rate * spec : renamed statistics vars * spec : add n_call_begin, n_call_accept * spec : don't enable key-map-stats	2026-02-09 15:30:50 +02:00
Adrien Gallouët	5fa1c190d9	rpc : update from common.cpp (#19400) Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2026-02-08 09:06:45 +01:00
Georgi Gerganov	eb449cdfa4	server : improve context checkpoint logic (#19408)	2026-02-08 09:40:04 +02:00
ddh0	5999b50eb0	llama-quantize : cleanup `--help` output (#19317) * cleanup `llama-quantize --help` output some much needed TLC * remove future argument oops, spoiler * cleanup of cleanup	2026-02-08 09:22:38 +02:00
Georgi Gerganov	dfde5993ea	common : add common_speculative_is_compat() (#19270) * llama : add llama_memory_can_rm_suffix() * Revert "llama : add llama_memory_can_rm_suffix()" This reverts commit `d30e59b62a`. * spec : check if the target context is compatible for spec decoding	2026-02-06 16:47:22 +02:00
Leszek Hanusz	a0c5c26fb9	Fix calculation of total tokens after undo/redo	2026-02-05 02:33:39 +01:00
Leszek Hanusz	4659a36ffd	Add 42px min height to the statistics to avoid flickering height problems + remove unused imports	2026-02-04 18:44:22 +01:00
Leszek Hanusz	77dc99cd9a	Remove [DONE] check	2026-02-04 18:11:27 +01:00
Leszek Hanusz	031e426005	Run npm run format	2026-02-04 16:31:44 +01:00
Leszek Hanusz	393faf0166	Put completion api service in separate file	2026-02-04 16:29:53 +01:00
Leszek Hanusz	251ba9d72a	Put tokenize in a separate file	2026-02-04 15:58:54 +01:00
Leszek Hanusz	efd274ab3d	chore: update webui build output	2026-02-04 14:25:20 +01:00
Daniel Bevenius	25f40ca65f	completion : simplify batch (embd) processing (#19286) * completion : simplify batch (embd) processing This commit simplifies the processing of embd by removing the for loop that currently exists which uses params.n_batch as its increment. This commit also removes the clamping of n_eval as the size of embd is always at most the size of params.n_batch. The motivation is to clarify the code as it is currently a little confusing when looking at this for loop in isolation and thinking that it can process multiple batches. * add an assert to verify n_eval is not greater than n_batch	2026-02-04 05:43:28 +01:00
Leszek Hanusz	ad3b8df38f	Remove currentConfig.model	2026-02-04 02:03:59 +01:00
Leszek Hanusz	f20b17a087	Remove inputContent var and use tokenize only when needed	2026-02-04 01:23:24 +01:00
Leszek Hanusz	9cf4742adb	Fix tokenize with router on	2026-02-04 00:21:56 +01:00
Leszek Hanusz	03077cf297	Merge branch 'master' into notebook	2026-02-03 03:04:31 +01:00
Leszek Hanusz	210dc6a2c0	Running npm run format	2026-02-03 02:27:10 +01:00
Leszek Hanusz	9dc75f2664	Fix npm run check errors	2026-02-03 02:22:32 +01:00
Leszek Hanusz	f42d889a47	Fix vertical alignment of Generate tooltip shortcut info	2026-02-03 02:14:28 +01:00
Leszek Hanusz	fb2095e815	Show total number of tokens by using tokenizer	2026-02-03 01:50:52 +01:00
Leszek Hanusz	3657a8a7ad	Implement shortcuts for the notebook page	2026-02-02 23:59:36 +01:00
Leszek Hanusz	7892b259cb	Add last undo/redo for notebook page	2026-02-02 22:39:07 +01:00
Leszek Hanusz	f041a864ed	Use same dialog for server errors on notebook page	2026-02-02 21:29:48 +01:00
Leszek Hanusz	11e3cd81ce	Protect window from accidental closure if the notebook is not empty as it is not saved	2026-02-02 21:15:24 +01:00
Xuan-Son Nguyen	07a7412a3b	mtmd: add min/max pixels gguf metadata (#19273)	2026-02-02 20:59:06 +01:00

567 Commits