Commit Graph

374 Commits

Author SHA1 Message Date
Leszek Hanusz fd3cb9bbdd Merge branch 'master' into notebook 2026-02-17 01:57:31 +01:00
Leszek Hanusz 2377b8c81e Merge branch 'master' into notebook 2026-02-16 02:22:25 +01:00
Adrien Gallouët 9e118b97c4
build : remove LLAMA_HTTPLIB option (#19623)
This option was introduced as a workaround because cpp-httplib could not
build on visionOS. Since it has been fixed and now compiles on all platforms,
we can remove it and simplify many things.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-15 15:38:50 +01:00
Aleksander Grygier baa12f3831
webui: Architecture and UI improvements (#19596) 2026-02-14 09:06:41 +01:00
Aleksander Grygier 5174d7206f
webui: UI and routing fixes (#19586)
* chore: update webui build output

* chore: update webui build output

* fix: Scroll issues in DropdownMenuSearchable

* webui: fix redirect to root ignoring base path

* fix: Word wrapping

* fix: remove obsolete modality UI tests causing CI failures

- Remove VisionModality/AudioModality test stories
- Remove mockServerProps usage and imports
- Simplify Default test (remove dropdown interaction checks)
- Simplify FileAttachments test (remove mocks)

* feat: Improve formatting performance time

---------

Co-authored-by: Pascal <admin@serveurperso.com>
2026-02-13 12:31:00 +01:00
Aleksander Grygier 4c61875bf8
webui: Add switcher to Chat Message UI to show raw LLM output (#19571) 2026-02-12 19:55:51 +01:00
Aleksander Grygier 4d688f9ebb
(webui) FEATURE: Enable adding or injecting System Message into chat (#19556)
* feat: Enable adding System Prompt per-chat

* fix: Save draft message in Chat Form when adding System Prompt from new chat view

* fix: Proper system message deletion logic

* chore: Formatting

* chore: update webui build output
2026-02-12 13:56:08 +01:00
Aleksander Grygier f486ce9f30
(webui) REFACTOR: UI primitives and polish (#19551)
* webui: UI primitives and polish (non-MCP)

* chore: update webui build output
2026-02-12 12:21:00 +01:00
Aleksander Grygier 38adc7d469
WebUI Architecture Cleanup (#19541)
* webui: architecture foundation (non-MCP core refactors)

* chore: update webui build output
2026-02-12 11:22:27 +01:00
RichardScottOZ fa16e517a3
server : fix typo in README.md for features list (#19510)
extra 'l' in 'full'
2026-02-12 08:56:25 +01:00
Leszek Hanusz 8a6843aac1 Fix ApiChatCompletionRequest 2026-02-10 03:14:14 +01:00
Leszek Hanusz 8e125febc9 Don't use ChatService.notifyTimings 2026-02-10 01:54:05 +01:00
Leszek Hanusz a35e4c4d81 Use a separate callbacks argument for sendCompletion 2026-02-10 01:20:14 +01:00
Leszek Hanusz 8f79f1fccb Removing non-stream /completion implementation + fix api 2026-02-10 00:39:26 +01:00
손희준 820ebfa6f4
Server: log when converting requests to chat completions format (#19457)
* Log converting requests

* Print as debug instead of info [no ci]

---------

Co-authored-by: openingnow <>
2026-02-09 16:22:57 +01:00
Sascha Rogmann 292f6908cd
spec : remove check rate (#19377)
* spec: remove parameter spec-ngram-check-rate

* spec : renamed statistics vars

* spec : add n_call_begin, n_call_accept

* spec : don't enable key-map-stats
2026-02-09 15:30:50 +02:00
Georgi Gerganov eb449cdfa4
server : improve context checkpoint logic (#19408) 2026-02-08 09:40:04 +02:00
Georgi Gerganov dfde5993ea
common : add common_speculative_is_compat() (#19270)
* llama : add llama_memory_can_rm_suffix()

* Revert "llama : add llama_memory_can_rm_suffix()"

This reverts commit d30e59b62a.

* spec : check if the target context is compatible for spec decoding
2026-02-06 16:47:22 +02:00
Leszek Hanusz a0c5c26fb9 Fix calculation of total tokens after undo/redo 2026-02-05 02:33:39 +01:00
Leszek Hanusz 4659a36ffd Add 42px min height to the statistics to avoid flickering height problems + remove unused imports 2026-02-04 18:44:22 +01:00
Leszek Hanusz 77dc99cd9a Remove [DONE] check 2026-02-04 18:11:27 +01:00
Leszek Hanusz 031e426005 Run npm run format 2026-02-04 16:31:44 +01:00
Leszek Hanusz 393faf0166 Put completion api service in separate file 2026-02-04 16:29:53 +01:00
Leszek Hanusz 251ba9d72a Put tokenize in a separate file 2026-02-04 15:58:54 +01:00
Leszek Hanusz efd274ab3d chore: update webui build output 2026-02-04 14:25:20 +01:00
Leszek Hanusz ad3b8df38f Remove currentConfig.model 2026-02-04 02:03:59 +01:00
Leszek Hanusz f20b17a087 Remove inputContent var and use tokenize only when needed 2026-02-04 01:23:24 +01:00
Leszek Hanusz 9cf4742adb Fix tokenize with router on 2026-02-04 00:21:56 +01:00
Leszek Hanusz 03077cf297 Merge branch 'master' into notebook 2026-02-03 03:04:31 +01:00
Leszek Hanusz 210dc6a2c0 Running npm run format 2026-02-03 02:27:10 +01:00
Leszek Hanusz 9dc75f2664 Fix npm run check errors 2026-02-03 02:22:32 +01:00
Leszek Hanusz f42d889a47 Fix vertical alignment of Generate tooltip shortcut info 2026-02-03 02:14:28 +01:00
Leszek Hanusz fb2095e815 Show total number of tokens by using tokenizer 2026-02-03 01:50:52 +01:00
Leszek Hanusz 3657a8a7ad Implement shortcuts for the notebook page 2026-02-02 23:59:36 +01:00
Leszek Hanusz 7892b259cb Add last undo/redo for notebook page 2026-02-02 22:39:07 +01:00
Leszek Hanusz f041a864ed Use same dialog for server errors on notebook page 2026-02-02 21:29:48 +01:00
Leszek Hanusz 11e3cd81ce Protect the window from accidental closure if the notebook is not empty, since its content is not saved 2026-02-02 21:15:24 +01:00
Leszek Hanusz 301c3fec7e Add generation statistics to notebook page 2026-02-02 18:39:46 +01:00
Matthieu Coudron a3fa035822
server: print actual model name in 'model not found' error (#19117)
When experimenting with AI, my environment gets messy fast, and it's not
always easy to know which model my software is trying to load. This helps
with troubleshooting.

Before:

Error: {
  code = 400,
  message = "model not found",
  type = "invalid_request_error"
}

After:

Error: {
  code = 400,
  message = "model 'toto' not found",
  type = "invalid_request_error"
}
2026-02-02 16:55:27 +01:00
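
As a rough illustration of the change described above, here is a minimal sketch of how a server might embed the requested model name in such an error payload. The struct and helper names (error_response, make_model_not_found_error) are illustrative assumptions, not the actual llama.cpp server code.

#include <cstdio>
#include <string>

// Illustrative error payload mirroring the fields shown in the commit message.
struct error_response {
    int         code;
    std::string message;
    std::string type;
};

// Hypothetical helper: include the requested model name in the message so
// troubleshooting shows exactly what the client asked for.
static error_response make_model_not_found_error(const std::string & requested_model) {
    return {
        400,
        "model '" + requested_model + "' not found",
        "invalid_request_error",
    };
}

int main() {
    const error_response err = make_model_not_found_error("toto");
    std::printf("code = %d, message = \"%s\", type = \"%s\"\n",
                err.code, err.message.c_str(), err.type.c_str());
    return 0;
}
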
Leszek Hanusz 8a71126e5b Autoscroll the notebook textarea depending on config parameter 2026-02-02 16:19:53 +01:00
Leszek Hanusz e80ba11778 Fix sidebar behavior same as chat pages 2026-02-02 15:46:12 +01:00
Leszek Hanusz ff2f0bba4a Remove console logs 2026-02-02 15:06:51 +01:00
Christian Kastner 7a4ca3cbd9
docs : Minor cleanups (#19252)
* Update old URLs to github.com/ggml-org/

* Bump copyrights
2026-02-02 08:38:55 +02:00
Leszek Hanusz c9f9863268 Add .agent/ to gitignore
Fix buttons
Fix model loading with router enabled
remove stats for now
lint
2026-02-01 23:20:34 +01:00
Leszek Hanusz 3af9b34aa2 Refine Notebook UI: improved layout, added stats and model info 2026-01-31 23:59:45 +01:00
Leszek Hanusz 6d96745375 Implement Notebook interface 2026-01-31 22:14:28 +01:00
Georgi Gerganov bbada8bfb9
server : wrap around the "id_slot" parameter (#19207)
* server : wrap around the "id_slot" parameter

* cont : minor
2026-01-30 19:46:10 +02:00
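
The title of the commit above does not spell out the wrapping behavior; a plausible reading, offered purely as an assumption rather than the actual server logic, is that an out-of-range id_slot is mapped onto an existing slot instead of being rejected:

#include <cstdio>

// Hypothetical helper (assumption, not the actual llama-server code):
// wrap a requested slot id onto the range of available slots.
static int resolve_slot_id(int id_slot, int n_slots) {
    if (id_slot < 0 || n_slots <= 0) {
        return -1; // negative id: let the server pick any free slot
    }
    return id_slot % n_slots; // wrap out-of-range ids onto existing slots
}

int main() {
    std::printf("%d\n", resolve_slot_id(5, 4)); // prints 1
    return 0;
}
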
Georgi Gerganov dabaa2e77a
spec : add ngram-mod (#19164)
* spec : add ngram-mod

* cont : simplify + keep track of occupancy

* cont : cleanup

* cont : move initialization to common/speculative

* cont : cleanup

* cont : cleanup

* cont : fix
2026-01-30 18:21:48 +02:00
Andrew Marshall 84b0a98319
webui: Update Svelte to fix effect_update_depth_exceeded errors (#19144)
The upstream fix is first available in 5.38.2, so constrain to at least
that version.

Rebuild pre-compiled webui index.html.gz based on these changes.

See also:
https://github.com/ggml-org/llama.cpp/issues/16347
https://github.com/huntabyte/bits-ui/issues/1687
https://github.com/sveltejs/svelte/issues/16548
2026-01-29 15:56:39 +01:00
Sascha Rogmann 72d3b1898a
spec : add self-speculative decoding (no draft model required) + refactor (#18471)
* server: introduce self-speculative decoding

* server: moved self-call into speculative.cpp

* can_speculate() includes self-speculation

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server: can_speculate() tests self-spec

* server: replace can_speculate() with slot.can_speculate()

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* common: use %zu format specifier for size_t in logging

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* server: can_speculate() requires a task instance

* common: ngram map, config self-speculative decoding

* common: add enum common_speculative_type

* common: add vector of speculative states

* common: add option --spec-draftless

* server: cleanup (remove slot.batch_spec, rename)

* common: moved self-spec impl to ngram-map

* common: cleanup (use common_speculative_state_draft)

* spec : refactor

* cont : naming

* spec: remove --spec-config

* doc: (draftless) speculative decoding

* common: print performance in spec decoding

* minor : cleanup

* common : better names

* minor : cleanup + fix build

* minor: comments

* CODEOWNERS: add common/ngram-map.* (#18471)

* common : rename speculative.draftless_type -> speculative.type

* ngram-map : fix uninitialized values

* ngram-map : take into account the input can become shorter

* ngram-map : revert len check for now

* arg : change `--spec-draftless` -> `--spec-type`

* spec : add common_speculative_state::accept()

* spec : refactor + add common_speculative_begin()

* spec : fix begin() call with mtmd

* spec : additional refactor + remove common_speculative_params

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-28 19:42:42 +02:00
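
The commit list above tracks the refactoring steps rather than the mechanism itself. As a hedged sketch of what draftless, ngram-based speculation generally looks like (an assumption based on the ngram-map naming, in the spirit of prompt-lookup decoding, not the actual common/ngram-map implementation), a draft can be proposed by matching the most recent n-gram against an earlier occurrence in the context:

#include <cstdint>
#include <cstdio>
#include <vector>

using llama_token = int32_t;

// Sketch of prompt-lookup style drafting: find an earlier occurrence of the
// last n_gram tokens and propose the tokens that followed it as the draft.
// Names and structure are illustrative, not the actual ngram-map code.
static std::vector<llama_token> propose_ngram_draft(
        const std::vector<llama_token> & ctx, // tokens seen so far
        size_t n_gram,                        // length of the lookup key
        size_t n_draft) {                     // max tokens to propose
    std::vector<llama_token> draft;
    if (ctx.size() < n_gram + 1) {
        return draft;
    }
    const size_t key_pos = ctx.size() - n_gram; // key = last n_gram tokens
    for (size_t i = key_pos; i-- > 0; ) {       // scan earlier positions, newest first
        bool match = true;
        for (size_t j = 0; j < n_gram; ++j) {
            if (ctx[i + j] != ctx[key_pos + j]) { match = false; break; }
        }
        if (!match) {
            continue;
        }
        // Propose what followed the earlier occurrence; the target model then
        // verifies these tokens in a single batch, as in regular speculation.
        for (size_t j = i + n_gram; j < ctx.size() && draft.size() < n_draft; ++j) {
            draft.push_back(ctx[j]);
        }
        break;
    }
    return draft;
}

int main() {
    const std::vector<llama_token> ctx = {1, 2, 3, 4, 1, 2, 3};
    const std::vector<llama_token> draft = propose_ngram_draft(ctx, /*n_gram=*/3, /*n_draft=*/4);
    for (llama_token t : draft) {
        std::printf("%d ", t); // prints: 4 1 2 3
    }
    std::printf("\n");
    return 0;
}
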