llama.cpp

Pascal 7953c18967 webui: fix UI freeze at high token rates with RAF yield	2026-02-01 20:34:08 +01:00

The markdown coalescing loop was processing chunks back-to-back without yielding to the browser's paint cycle. At high token rates (250+ tok/s), this caused a complete UI freeze because the main thread was perpetually busy.

Add a requestAnimationFrame yield between processing batches. This allows the browser to paint at screen FPS regardless of token throughput. Chunks arriving during the yield are coalesced and processed together, so we skip intermediate states and jump straight to the latest content.

Before: Chunk->process->Chunk->process->... (browser never paints = freeze)
After: Chunk->process->[RAF]->coalesced chunks->process->[RAF]->... (screen FPS)

Tested with 250 tok/s streams on 50K+ token contexts: smooth scrolling and responsive UI throughout.
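The "After" flow in the commit message can be sketched in TypeScript as a small coalescing loop. This is a minimal illustration under stated assumptions, not the webui's actual code: `CoalescingRenderer`, `push`, `render` and `defaultNextFrame` are hypothetical names. Each chunk is only appended to a buffer; a single loop renders the latest buffer, then awaits the next animation frame (falling back to `setTimeout` outside a browser) so the browser can paint before the next batch.

```typescript
// Hypothetical sketch of a RAF-yielding coalescing loop; the names here are
// illustrative assumptions, not the llama.cpp webui API.

type RenderFn = (content: string) => void;
type FrameHost = { requestAnimationFrame?: (cb: () => void) => number };

// Resolve on the next animation frame in a browser. Outside a browser
// (e.g. Node), requestAnimationFrame is absent, so fall back to a macrotask.
function defaultNextFrame(): Promise<void> {
  const host = globalThis as unknown as FrameHost;
  return new Promise<void>((resolve) => {
    if (typeof host.requestAnimationFrame === "function") {
      host.requestAnimationFrame(() => resolve());
    } else {
      setTimeout(resolve, 0);
    }
  });
}

class CoalescingRenderer {
  private content = "";    // everything received so far
  private dirty = false;   // new chunks arrived since the last render
  private running = false; // a render loop is currently active
  private readonly render: RenderFn;
  private readonly nextFrame: () => Promise<void>;

  constructor(render: RenderFn, nextFrame: () => Promise<void> = defaultNextFrame) {
    this.render = render;
    this.nextFrame = nextFrame;
  }

  // Called once per streamed chunk. It only buffers; rendering happens in loop().
  push(chunk: string): void {
    this.content += chunk;
    this.dirty = true;
    if (!this.running) {
      void this.loop();
    }
  }

  // Render the latest content, then yield until the next frame so the browser
  // can paint. Chunks pushed during the yield are rendered together next time.
  private async loop(): Promise<void> {
    this.running = true;
    try {
      while (this.dirty) {
        this.dirty = false;
        this.render(this.content);
        await this.nextFrame();
      }
    } finally {
      this.running = false;
    }
  }
}
```

Because each render uses the whole buffer, skipped intermediate states cost nothing: three chunks that arrive within one frame produce a single render of their concatenation, which is what keeps the paint rate tied to the screen rather than to the token rate.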
batched-bench	tool/ex/tests: consistently free ctx, then model (#18168)	2025-12-22 11:00:37 +01:00
cli	common : use two decimal places for float arg help messages (#19048)	2026-01-25 07:31:42 +01:00
completion	completion : fix prompt cache for recurrent models (#19045)	2026-01-25 09:12:50 +02:00
cvector-generator	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
export-lora	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
fit-params	llama-fit-params: keep explicit --ctx-size 0 (#19070)	2026-01-24 22:13:08 +01:00
gguf-split	cli: new CLI experience (#17824)	2025-12-10 15:28:59 +01:00
imatrix	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
llama-bench	Setting mmap and direct_io to false as default in llama-bench.cpp (#18841)	2026-01-16 09:46:51 +01:00
mtmd	mtmd : update docs to use llama_model_n_embd_inp (#18999)	2026-01-22 14:36:32 +01:00
perplexity	common : refactor common_sampler + grammar logic changes (#17937)	2025-12-14 10:11:13 +02:00
quantize	quantize: prevent input/output file collision (#18451)	2025-12-31 23:29:03 +08:00
rpc	Install rpc-server when GGML_RPC is ON. (#17149)	2025-11-11 10:53:59 +00:00
server	webui: fix UI freeze at high token rates with RAF yield	2026-02-01 20:34:08 +01:00
tokenize	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
tts	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
CMakeLists.txt	cmake: only build cli when server is enabled (#18670)	2026-01-09 16:43:26 +01:00