HappyZ (happyz)
happyz synced commits to refs/pull/20394/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:27 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/20454/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:27 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/20456/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:27 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 11 commits »
happyz synced commits to refs/pull/20487/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:27 -07:00
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)
6307ec07d3 common : cleanup logs and modernize the progress bar (#21215)
632219af73 CANN: fix multi-thread set_tensor race conditions (#20151)
Compare 35 commits »
happyz synced commits to refs/pull/20275/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:26 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/20269/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:26 -07:00
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)
6307ec07d3 common : cleanup logs and modernize the progress bar (#21215)
632219af73 CANN: fix multi-thread set_tensor race conditions (#20151)
Compare 18 commits »
happyz synced commits to refs/pull/20242/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:25 -07:00
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
d43375ff7f ggml : fix RWKV ops thread assignment (#21226)
2b86e5cae6 ggml-cpu: fix fallback for RVV kernels without zvfh (#21157)
88458164c7 CUDA: Add Flash Attention Support for Head Dimension 512 (#20998)
Compare 41 commits »
happyz synced commits to refs/pull/20075/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:24 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/20238/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:24 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/20086/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:24 -07:00
d43375ff7f ggml : fix RWKV ops thread assignment (#21226)
2b86e5cae6 ggml-cpu: fix fallback for RVV kernels without zvfh (#21157)
88458164c7 CUDA: Add Flash Attention Support for Head Dimension 512 (#20998)
4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)
Compare 7 commits »
happyz synced commits to refs/pull/20112/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:24 -07:00
4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)
82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)
Compare 5 commits »
happyz synced commits to refs/pull/20064/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:23 -07:00
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)
6307ec07d3 common : cleanup logs and modernize the progress bar (#21215)
632219af73 CANN: fix multi-thread set_tensor race conditions (#20151)
Compare 35 commits »
happyz synced commits to refs/pull/19855/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:23 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/19938/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:23 -07:00
4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)
82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)
Compare 31 commits »
happyz synced commits to refs/pull/20009/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:23 -07:00
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)
6307ec07d3 common : cleanup logs and modernize the progress bar (#21215)
632219af73 CANN: fix multi-thread set_tensor race conditions (#20151)
Compare 14 commits »
happyz synced commits to refs/pull/20062/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:23 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/19671/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:22 -07:00
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
d43375ff7f ggml : fix RWKV ops thread assignment (#21226)
Compare 38 commits »
happyz synced commits to refs/pull/19763/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:22 -07:00
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)
6307ec07d3 common : cleanup logs and modernize the progress bar (#21215)
632219af73 CANN: fix multi-thread set_tensor race conditions (#20151)
Compare 23 commits »
happyz synced commits to refs/pull/19743/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:22 -07:00
4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)
82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)
Compare 5 commits »
happyz synced commits to refs/pull/19755/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:22 -07:00
88458164c7 CUDA: Add Flash Attention Support for Head Dimension 512 (#20998)
4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)
82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
Compare 19 commits »