llama.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	72f80499ee	server : headers cleanup	2025-11-24 12:50:50 +02:00
Xuan Son Nguyen	625010d42d	move enum stop_type to server-task	2025-11-18 17:28:21 +01:00
Xuan Son Nguyen	ca993bad51	rm redundant includes	2025-11-18 15:01:17 +01:00
Xuan Son Nguyen	3b7946034c	add server-queue	2025-11-18 14:24:46 +01:00
Xuan Son Nguyen	e1a756e934	add server-task, server-common	2025-11-18 14:15:14 +01:00
o7si	97cb3fd5ae	fix: resolve undefined variable 'svr' compilation error (#17348 )	2025-11-18 10:10:47 +02:00
Xuan-Son Nguyen	0de8878c96	server: split HTTP into its own interface (#17216 ) * server: split HTTP into its own interface * move server-http and httplib to its own file * add the remaining endpoints * fix exception/error handling * renaming * missing header * fix missing windows header * fix error responses from http layer * fix slot save/restore handler * fix case where only one stream chunk is returned * add NOMINMAX * do not call sink.write on empty data * use safe_json_to_str for SSE * clean up * add some comments * improve usage of next() * bring back the "server is listening on" message * more generic handler * add req.headers * move the chat template print to init() * add req.path * cont : minor --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-17 22:05:44 +01:00
Georgi Gerganov	5b2093becc	server : handle context overflow during decode (#17267 ) * server : handle context overflow during decode * server : minor refactor	2025-11-16 09:23:37 +02:00
Aleksander Grygier	22e1ce2f81	webui: Fix clickability around chat processing statistics UI (#17278 ) * fix: Better pointer events handling in chat processing info elements * chore: update webui build output	2025-11-15 22:41:41 +01:00
Pascal	1411d9275a	webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (#16618 ) * webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI - Purely visual and diagnostic change, no effect on model context, prompt construction, or inference behavior - Captured assistant tool call payloads during streaming and non-streaming completions, and persisted them in chat state and storage for downstream use - Exposed parsed tool call labels beneath the assistant's model info line with graceful fallback when parsing fails - Added tool call badges beneath assistant responses that expose JSON tooltips and copy their payloads when clicked, matching the existing model badge styling - Added a user-facing setting to toggle tool call visibility to the Developer settings section directly under the model selector option * webui: remove scroll listener causing unnecessary layout updates (model selector) * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: npm run format & update webui build output * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-11-15 21:09:32 +01:00
Xuan-Son Nguyen	9b17d74ab7	mtmd: add mtmd_log_set (#17268 )	2025-11-14 15:56:19 +01:00
Georgi Gerganov	d396b43748	server : fix "can batch with" bug (#17263 )	2025-11-14 14:03:45 +02:00
Aleksander Grygier	f1bad23f88	Better UX for handling multiple attachments in WebUI (#17246 )	2025-11-14 01:19:08 +01:00
Xuan-Son Nguyen	c4abcb2457	server: fixing naming conflict res_error (#17243 )	2025-11-13 20:53:47 +01:00
Aleksander Grygier	8e878f0cb4	Update packages + upgrade Storybook to v10 (#17201 ) * chore: Update packages + upgrade Storybook to v10 * fix: Increase timeout for UI tests	2025-11-12 19:01:48 +01:00
Xuan-Son Nguyen	00c94083b3	server: (refactor) implement generator-based API for task results (#17174 ) * server: (refactor) implement generator-based API for task results * improve * moving some code * fix "Response ended prematurely" * add sink.done before return false * rm redundant check * rm unused var * rename generator --> reader	2025-11-12 18:50:52 +01:00
Xuan-Son Nguyen	ee8dd5c658	server: move res_error/res_ok to static function (#17167 )	2025-11-12 14:17:24 +01:00
Adrien Gallouët	78010a0d52	cmake : move OpenSSL linking to vendor/cpp-httplib (#17177 ) * cmake : move OpenSSL linking to vendor/cpp-httplib Signed-off-by: Adrien Gallouët <angt@huggingface.co> * bring back httplib 0.27.0 * add -DLLAMA_HTTPLIB * update cmake config for visionos --------- Signed-off-by: Adrien Gallouët <angt@huggingface.co> Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-11-12 12:32:50 +01:00
Xuan-Son Nguyen	1d45b4228f	vendor: split httplib to cpp/h files (#17150 ) * vendor: split httplib to cpp/h files * move defines * include httplib if curl is not used * add TODO * fix build ios * fix build visionos instead	2025-11-11 13:32:58 +01:00
Georgi Gerganov	cb1adf8851	server : handle failures to restore host cache (#17078 ) * server : handle failures to restore host cache * server : add tests for the prompt cache	2025-11-09 14:27:05 +02:00
chansikpark	333f2595a3	webui: fix keyboard shortcuts for new chat & edit chat title (#17007 )	2025-11-08 20:52:35 +01:00
Aidan	eeee367de5	server: fix correct time_ms calculation in prompt_progress (#17093 ) * fix: correct time_ms calculation in send_partial_response The time_ms field was incorrectly calculated. The division was happening before the subtraction leading to incorrect values. Before: (ggml_time_us() - slot.t_start_process_prompt / 1000) After: (ggml_time_us() - slot.t_start_process_prompt) / 1000 * docs : document time_ms field in prompt_progress	2025-11-08 15:12:11 +02:00
Georgi Gerganov	16bcc1259d	kv-cache : pad the cache size to 256 for performance (#17046 ) * kv-cache : pad the size of the small SWA cache for performance * context : pad the total context to 256 * cont : future-proof the swa pad * server : adjust test params to new logic	2025-11-07 20:03:25 +02:00
Georgi Gerganov	8c0d6bb455	server : print the samplers chain for each request (#17070 )	2025-11-07 12:24:47 +02:00
Georgi Gerganov	b7f9010d24	server : disable checkpoints with mtmd (#17045 )	2025-11-06 12:09:29 +02:00
Georgi Gerganov	13b339bcd9	server : do not default to multiple slots with speculative decoding (#17017 ) * server : do not default to multiple slots with speculative decoding * cont : fix	2025-11-05 14:32:55 +02:00
손희준	fd2f84f468	docs: Clarify the endpoint that webui uses (#17001 )	2025-11-05 11:20:28 +01:00
Georgi Gerganov	66d8eccd42	server : do context shift only while generating (#17000 )	2025-11-04 19:21:36 +02:00
Aleksander Grygier	e7da30b584	fix: Viewing multiple PDF attachments (#16974 )	2025-11-03 18:53:26 +01:00
Georgi Gerganov	48bd26501b	server : add props.model_alias (#16943 ) * server : add props.model_alias * webui : npm run format	2025-11-03 14:38:23 +01:00
Xuan-Son Nguyen	070ff4d535	mtmd: add --image-min/max-tokens (#16921 )	2025-11-03 11:11:18 +01:00
Sascha Rogmann	bcfa87622a	feat(webui): improve LaTeX rendering with currency detection (#16508 ) * webui : Revised LaTeX formula recognition * webui : Further examples containg amounts * webui : vitest for maskInlineLaTeX * webui: Moved preprocessLaTeX to lib/utils * webui: LaTeX in table-cells * chore: update webui build output (use theirs) * webui: backslash in LaTeX-preprocessing * chore: update webui build output * webui: look-behind backslash-check * chore: update webui build output * Apply suggestions from code review Code maintenance (variable names, code formatting, string handling) Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: Moved constants to lib/constants. * webui: package woff2 inside base64 data * webui: LaTeX-line-break in display formula * chore: update webui build output * webui: Bugfix (font embedding) * webui: Bugfix (font embedding) * webui: vite embeds assets * webui: don't suppress 404 (fonts) * refactor: KaTeX integration with SCSS Moves KaTeX styling to SCSS for better customization and font embedding. This change includes: - Adding `sass` as a dev dependency. - Introducing a custom SCSS file to override KaTeX variables and disable TTF/WOFF fonts, relying solely on WOFF2 for embedding. - Adjusting the Vite configuration to resolve `katex-fonts` alias and inject SCSS variables. * fix: LaTeX processing within blockquotes * webui: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-11-03 00:41:08 +01:00
Georgi Gerganov	2f966b8ed8	clip : use FA (#16837 ) * clip : use FA * cont : add warning about unsupported ops * implement "auto" mode for clip flash attn * clip : print more detailed op support info during warmup * cont : remove obsolete comment [no ci] * improve debugging message * trailing space * metal : remove stray return --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-11-02 21:21:48 +01:00
Georgi Gerganov	cd5e3b5754	server : support unified cache across slots (#16736 ) * server : support unified context across slots * cont : fix speculative decoding initialization * context : fix n_ctx_per_seq computation * server : purge slots one by one * tests : add unified cache server tests * llama : update per-seq context computation * test-thread-safety : handle tiny training context of the input model * server : fix server_tokens clear() * server : use 4 slots + unified KV by default * llama : add note about context size queries * cont : update todos [no ci] * context : do not cap the size of the context * tests : adjust parameters to be CI friendlier * context : add warning	2025-11-02 18:14:04 +02:00
Pascal	2f68ce7cfd	webui: auto-refresh /props on inference start to resync model metadata (#16784 ) * webui: auto-refresh /props on inference start to resync model metadata - Add no-cache headers to /props and /slots - Throttle slot checks to 30s - Prevent concurrent fetches with promise guard - Trigger refresh from chat streaming for legacy and ModelSelector - Show dynamic serverWarning when using cached data * fix: restore proper legacy behavior in webui by using unified /props refresh Updated assistant message bubbles to show each message's stored model when available, falling back to the current server model only when the per-message value is missing When the model selector is disabled, now fetches /props and prioritizes that model name over chunk metadata, then persists it with the streamed message so legacy mode properly reflects the backend configuration * fix: detect first valid SSE chunk and refresh server props once * fix: removed the slots availability throttle constant and state * webui: purge ai-generated cruft * chore: update webui static build	2025-11-01 19:49:51 +01:00
Pascal	e4a71599e5	webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe (#16757 ) * webui: add HTML/JS preview support to MarkdownContent with sandboxed iframe dialog Extended MarkdownContent to flag previewable code languages, add a preview button alongside copy controls, manage preview dialog state, and share styling for the new button group Introduced CodePreviewDialog.svelte, a sandboxed iframe modal for rendering HTML/JS previews with consistent dialog controls * webui: fullscreen HTML preview dialog using bits-ui * Update tools/server/webui/src/lib/components/app/misc/CodePreviewDialog.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/misc/MarkdownContent.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: pedantic style tweak for CodePreviewDialog close button * webui: remove overengineered preview language logic * chore: update webui static build --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-11-01 17:14:54 +01:00
Aleksander Grygier	d8b860a219	Add a setting to display message generation statistics (#16901 ) * feat: Add setting to display message generation statistics * chore: build static webui output	2025-11-01 15:35:57 +01:00
Jaromír Hradílek	1ae74882f8	webui: recognize AsciiDoc files as valid text files (#16850 ) * webui: recognize AsciiDoc files as valid text files * webui: add an updated static webui build * webui: add the updated dependency list * webui: re-add an updated static webui build This also reverts commit `742dbb8379`.	2025-11-01 15:02:57 +01:00
Georgi Gerganov	c22473b580	server : don't print user inputs to console (#16871 )	2025-10-31 10:54:19 +02:00
Daniel Bevenius	0f715b4e75	server : fix typos in server.cpp comments [no ci] (#16883 )	2025-10-31 09:51:26 +01:00
chansikpark	16724b5b68	server : bump request URI max length to 32768 (#16862 )	2025-10-30 20:22:23 +02:00
Georgi Gerganov	b52edd2558	server : remove n_past (#16818 ) * server : remove n_past * server : replace slot.n_prompt_tokens() with slot.task->n_tokens() * server : fixes + clean-up * cont : fix context shift * server : add server_tokens::pos_next() Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> * server : fix pos_next() usage Co-authored-by: Xuan-Son Nguyen <son@huggingface.co> --------- Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>	2025-10-30 18:42:57 +02:00
Georgi Gerganov	85a7d8677b	memory : remove KV cache size padding (#16812 ) * memory : remove KV cache size padding * cont : restore padding for n_kv tensor shape * server : use slot context size instead of training context size * server : simplify context limit logic	2025-10-28 20:19:44 +02:00
Aldehir Rojas	280d97be96	grammar : support array references in json schema (#16792 ) * grammar : support array references in json schema * Update json-schema-to-grammar.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * grammar : improve regex when naming ref derived rules * grammar : replace non-conformant definitions array with anyOf test case --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-10-28 09:37:52 +01:00
Florian Badie	69e9ff0103	webui: support q URL parameter (#16728 ) * webui: support q URL parameter Fixes #16722 I’ve checked that it works with Firefox’s AI tools * webui: apply suggestions from code review Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: update webui static build --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-10-24 14:10:29 +02:00
Johannes Gäßler	0bf47a1dbb	server: add memory breakdown print (#16740 )	2025-10-23 21:30:17 +02:00
matteo	8cf6b42d46	server : send partial stop string when <EOG> is reached (#15007 )	2025-10-23 12:32:24 +03:00
Pascal	9b9201f65a	webui: introduce OpenAI-compatible model selector in JSON payload (#16562 ) * webui: introduce OpenAI-compatible model selector in JSON payload * webui: restore OpenAI-Compatible model source of truth and unify metadata capture This change re-establishes a single, reliable source of truth for the active model: fully aligned with the OpenAI-Compat API behavior It introduces a unified metadata flow that captures the model field from both streaming and non-streaming responses, wiring a new onModel callback through ChatService The model name is now resolved directly from the API payload rather than relying on server /props or UI assumptions ChatStore records and persists the resolved model for each assistant message during streaming, ensuring consistency across the UI and database Type definitions for API and settings were also extended to include model metadata and the onModel callback, completing the alignment with OpenAI-Compat semantics * webui: address review feedback from allozaur * webui: move model selector into ChatForm (idea by @allozaur) * webui: make model selector more subtle and integrated into ChatForm * webui: replaced the Flowbite selector with a native Svelte dropdown * webui: add developer setting to toggle the chat model selector * webui: address review feedback from allozaur Normalized streamed model names during chat updates by trimming input and removing directory components before saving or persisting them, so the conversation UI shows only the filename Forced model names within the chat form selector dropdown to render as a single-line, truncated entry with a tooltip revealing the full name * webui: toggle displayed model source for legacy vs OpenAI-Compat modes When the selector is disabled, it falls back to the active server model name from /props When the model selector is enabled, the displayed model comes from the message metadata (the one explicitly selected and sent in the request) * Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormActions.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/constants/localstorage-keys.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormModelSelector.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/services/chat.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/services/chat.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: refactor model selector and persistence helpers - Replace inline portal and event listeners with proper Svelte bindings - Introduce 'persisted' store helper for localStorage sync without runes - Extract 'normalizeModelName' utils + Vitest coverage - Simplify ChatFormModelSelector structure and cleanup logic Replaced the persisted store helper's use of '$state/$effect' runes with a plain TS implementation to prevent orphaned effect runtime errors outside component context Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: document normalizeModelName usage with inline examples * Update tools/server/webui/src/lib/components/app/chat/ChatForm/ChatFormModelSelector.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/stores/models.svelte.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/stores/models.svelte.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: extract ModelOption type into dedicated models.d.ts Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * webui: refine ChatMessageAssistant displayedModel source logic * webui: stabilize dropdown, simplify model extraction, and init assistant model field * chore: update webui static build * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: npm format, update webui static build * webui: align sidebar trigger position, remove z-index glitch * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-10-22 16:58:23 +02:00
Aleksander Grygier	c9c1972e2c	Handle legacy 'context' attachments (#16687 )	2025-10-20 19:49:02 +02:00
Aleksander Grygier	79068501fa	Prevent premature submission on IME input (#16673 ) * fix: Prevent premature submission on IME input * chore: update webui static build * refactor: Put IME completion checker in a helper function and add checking for `KeyboardEvent.eventKey === 229` * chore: update webui static build * chore: update webui static build * chore: update webui static build	2025-10-20 14:21:12 +02:00

1 2 3 4 5

216 Commits