llama.cpp

Aleksander Grygier 13f2cfad41 Enable per-conversation loading states to allow having parallel conversations (#16327)	2025-10-20 12:41:13 +02:00

* feat: Per-conversation loading states and tracking streaming stats
* chore: update webui build output
* refactor: Chat state management

  Consolidates loading state management by using a global `isLoading` store synchronized with individual conversation states. This change ensures proper reactivity and avoids potential race conditions when updating the UI based on the loading status of different conversations. It also improves the accuracy of statistics displayed.

  Additionally, slots service methods are updated to use conversation IDs for per-conversation state management, avoiding global state pollution.
* feat: Adds loading indicator to conversation items
* chore: update webui build output
* fix: Fix aborting chat streaming

  Improves the chat stream abortion process by ensuring that partial responses are saved before the abort signal is sent. This avoids a race condition where the onError callback could clear the streaming state before the partial response is saved.

  Additionally, the stream reading loop and callbacks are now checked for abort signals to prevent further processing after abortion.
* refactor: Remove redundant comments
* chore: build webui static output
* refactor: Cleanup
* chore: update webui build output
* chore: update webui build output
* fix: Conversation loading indicator for regenerating messages
* chore: update webui static build
* feat: Improve configuration
* feat: Install `http-server` as dev dependency to not need to rely on `npx` in CI

batched-bench	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
cvector-generator	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
export-lora	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
gguf-split	ci : use smaller model (#16168)	2025-09-22 09:11:39 +03:00
imatrix	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
llama-bench	llama : add --no-host to disable host buffers (#16310)	2025-10-06 19:55:53 +02:00
main	llama-cli: prevent spurious assistant token (#16202)	2025-09-29 10:03:12 +03:00
mtmd	mtmd : support home-cooked Mistral Small Omni (#14928)	2025-10-16 19:00:31 +02:00
perplexity	perplexity : show more kl-divergence data (#16321)	2025-09-29 09:30:45 +03:00
quantize	ci : use smaller model (#16168)	2025-09-22 09:11:39 +03:00
rpc	rpc : report actual free memory (#16616)	2025-10-17 18:02:52 +03:00
run	common: introduce http.h for httplib-based client (#16373)	2025-10-01 20:22:18 +03:00
server	Enable per-conversation loading states to allow having parallel conversations (#16327)	2025-10-20 12:41:13 +02:00
tokenize	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
tts	model : Apertus model implementation (#15852)	2025-10-02 20:43:22 +03:00
CMakeLists.txt	mtmd : rename llava directory to mtmd (#13311)	2025-05-05 16:02:55 +02:00