llama.cpp

Reese Levine a89002f07b ggml webgpu: support for backend sampling (#18880)		2026-01-16 16:12:43 -08:00

* ggml webgpu: add SOFTPLUS unary operator
  Implements SOFTPLUS (log(1 + exp(x))) with f16/f32 support. Uses f32 precision for intermediate calculations to prevent f16 overflow.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
  * Follow Vulkan backend numerical stability pattern
* ggml webgpu: add EXPM1 unary operator
  Implements EXPM1 (exp(x) - 1) with f16/f32 support.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
* ggml webgpu: add FLOOR unary operator
  Implements FLOOR (rounds down to nearest integer) with f16/f32 support.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
* ggml webgpu: add CEIL unary operator
  Implements CEIL (rounds up to nearest integer) with f16/f32 support.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
* ggml webgpu: add ROUND unary operator
  Implements ROUND (rounds to nearest integer) with f16/f32 support.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
* ggml webgpu: add TRUNC unary operator
  Implements TRUNC (truncates towards zero) with f16/f32 support.
  * Add shader implementation and 4 variants (f32/f16, inplace/non-inplace)
  * Register pipelines and device support
* docs : update WebGPU support for unary operators (FLOOR, CEIL, ROUND, TRUNC, EXPM1, SOFTPLUS)
* Updates to webgpu get_memory
* Add argmax
* Add argmax,cumsum,sum,sum_rows
* Add necessary CPY/GET_ROWS operators
* Support for argsort using multi-pass strategy
* Update set_rows for i32 indices, move to pre-wgsl
* Port unary operators to pre-wgsl and support FILL
* Implement PAD
* Add support for top-k
* clean up, scope pipeline init mutex
* fix newline
* Add support for log
* Update LOG for better precision, and ops doc

---------

Co-authored-by: Abhijit Ramesh <abhijitramesh2k@gmail.com>
android	android: fix missing screenshots for Android.md (#18156)	2025-12-19 09:32:04 +02:00
backend	hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822)	2026-01-14 21:46:12 -08:00
development	docs : fix links in parsing.md (#18245)	2025-12-21 09:35:40 +01:00
multimodal	model : support MiniCPM-V 4.5 (#15575)	2025-08-26 10:05:55 +02:00
ops	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
android.md	android: fix missing screenshots for Android.md (#18156)	2025-12-19 09:32:04 +02:00
build-riscv64-spacemit.md	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
build-s390x.md	ggml-zdnn: fix #15414, activate FP16 and BF16 acceleration and incorrect zTensor free (#15839)	2025-09-13 02:39:52 +08:00
build.md	refactor : remove libcurl, use OpenSSL when available (#18828)	2026-01-14 18:02:47 +01:00
docker.md	CLI: fixed adding cli and completion into docker containers, improved docs (#18003)	2025-12-16 11:52:23 +01:00
function-calling.md	common : implement new jinja template engine (#18462)	2026-01-16 11:22:06 +01:00
install.md	docs : add "Quick start" section for new users (#13862)	2025-06-03 13:09:36 +02:00
llguidance.md	llguidance build fixes for Windows (#11664)	2025-02-14 12:46:08 -08:00
multimodal.md	mtmd : add support for Voxtral (#14862)	2025-07-28 15:01:48 +02:00
ops.md	ggml webgpu: support for backend sampling (#18880)	2026-01-16 16:12:43 -08:00
preset.md	preset: allow named remote preset (#18728)	2026-01-10 15:12:29 +01:00