HappyZ happyz
happyz synced commits to refs/pull/21219/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:45 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/21216/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:45 -07:00
c3430c4e34 cont : fix uninitialized required parameters
d0000c1150 cont : undo arbitrary ordering of optional args
731a3f9c6b cont : revert changes to parsing at the end
aff47d5a3b cont : remove upper limit on optional args
Compare 9 commits »
happyz synced commits to refs/pull/21204/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:44 -07:00
4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)
82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
Compare 4 commits »
happyz synced commits to refs/pull/21212/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:44 -07:00
6422036fcb sync : ggml
296bc0538b ggml : bump version to 0.9.10 (ggml/1454)
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
Compare 12 commits »
happyz synced commits to refs/pull/21216/head at happyz/llama.cpp from mirror 2026-04-01 07:02:44 -07:00
c3430c4e34 cont : fix uninitialized required parameters
d0000c1150 cont : undo arbitrary ordering of optional args
731a3f9c6b cont : revert changes to parsing at the end
aff47d5a3b cont : remove upper limit on optional args
d4e7f58f79 common/peg-parser : fix parenthesization of wrapped parsers
Compare 25 commits »
happyz synced commits to refs/pull/21174/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:43 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/21182/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:43 -07:00
4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)
82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)
Compare 8 commits »
happyz synced commits to refs/pull/21201/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:43 -07:00
4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)
82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
Compare 4 commits »
happyz synced commits to refs/pull/21203/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:43 -07:00
82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)
6307ec07d3 common : cleanup logs and modernize the progress bar (#21215)
Compare 5 commits »
happyz synced commits to refs/pull/21168/head at happyz/llama.cpp from mirror 2026-04-01 07:02:42 -07:00
fbc4cfcdde Update ggml/src/ggml-cuda/mmq.cuh
777f5943a4 Update ggml/src/ggml-cuda/mmq.cuh
d3065542f0 Update ggml/src/ggml-cuda/mmq.cuh
Compare 3 commits »
happyz synced commits to refs/pull/21168/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:42 -07:00
fbc4cfcdde Update ggml/src/ggml-cuda/mmq.cuh
777f5943a4 Update ggml/src/ggml-cuda/mmq.cuh
d3065542f0 Update ggml/src/ggml-cuda/mmq.cuh
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
Compare 13 commits »
happyz synced commits to refs/pull/21170/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:42 -07:00
0356e33aaf scripts: add function call test script (#21234)
6422036fcb sync : ggml
296bc0538b ggml : bump version to 0.9.10 (ggml/1454)
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
Compare 13 commits »
happyz synced commits to refs/pull/21165/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:41 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/21160/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:41 -07:00
6422036fcb sync : ggml
296bc0538b ggml : bump version to 0.9.10 (ggml/1454)
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
Compare 12 commits »
happyz synced commits to refs/pull/21161/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:41 -07:00
0356e33aaf scripts: add function call test script (#21234)
6422036fcb sync : ggml
296bc0538b ggml : bump version to 0.9.10 (ggml/1454)
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
Compare 13 commits »
happyz synced commits to refs/pull/21159/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:40 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/21141/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:40 -07:00
02d4c32517 common: add two-phase graceful reasoning budget termination
Compare 2 commits »
happyz synced commits to refs/pull/21149/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:40 -07:00
4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)
82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)
825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)
0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)
Compare 8 commits »
happyz synced commits to refs/pull/21152/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:40 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »
happyz synced commits to refs/pull/21089/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:39 -07:00
6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)
84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)
e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)
88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)
Compare 10 commits »