llama.cpp

Gaurav Garg a83c73a18a [CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full (#19042)	2026-01-27 08:52:44 +02:00

* [CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full

  With pipeline parallelism, during prompt processing, the CPU-side CUDA command buffer gets full, stalling the CPU. Due to this, enough work doesn't get submitted to the GPU, causing bubbles in the GPU timeline. Fix this by setting the CUDA environment variable CUDA_SCALE_LAUNCH_QUEUES to 4x to increase the command buffer size.
* Set the env variable in the CUDA backend registry allocation
* Add link to PR in code comment
* Remove warning logs and update documentation
android	android: fix missing screenshots for Android.md (#18156)	2025-12-19 09:32:04 +02:00
backend	docs: add linux to index (#18907)	2026-01-18 18:03:35 +08:00
development	docs : fix links in parsing.md (#18245)	2025-12-21 09:35:40 +01:00
multimodal	model : support MiniCPM-V 4.5 (#15575)	2025-08-26 10:05:55 +02:00
ops	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
android.md	android: fix missing screenshots for Android.md (#18156)	2025-12-19 09:32:04 +02:00
build-riscv64-spacemit.md	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
build-s390x.md	ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free (#15839)	2025-09-13 02:39:52 +08:00
build.md	[CUDA] Reduce CPU-side stalls due to the CUDA command buffer being full (#19042)	2026-01-27 08:52:44 +02:00
docker.md	CLI: fixed adding cli and completion into docker containers, improved docs (#18003)	2025-12-16 11:52:23 +01:00
function-calling.md	common : implement new jinja template engine (#18462)	2026-01-16 11:22:06 +01:00
install.md	docs : add "Quick start" section for new users (#13862)	2025-06-03 13:09:36 +02:00
llguidance.md	llguidance build fixes for Windows (#11664)	2025-02-14 12:46:08 -08:00
multimodal.md	mtmd : add support for Voxtral (#14862)	2025-07-28 15:01:48 +02:00
ops.md	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
preset.md	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00