HappyZ

happyz synced commits to refs/pull/21219/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:45 -07:00

f1b3500bc9 Merge 0474a433b3 into 6b949d1078

6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)

84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)

e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)

88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)

Compare 10 commits »

happyz synced commits to refs/pull/21216/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:45 -07:00

a3adb63ef7 Merge c3430c4e34 into 4951250235

c3430c4e34 cont : fix uninitialized required parameters

d0000c1150 cont : undo arbitrary ordering of optional args

731a3f9c6b cont : revert changes to parsing at the end

aff47d5a3b cont : remove upper limit on optional args

Compare 9 commits »

happyz synced commits to refs/pull/21204/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:44 -07:00

4f0cbd7d83 Merge cb15cdb020 into 4951250235

4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)

82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)

825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)

Compare 4 commits »

happyz synced commits to refs/pull/21212/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:44 -07:00

47835d9c11 Merge 965f7c0268 into 6422036fcb

6422036fcb sync : ggml

296bc0538b ggml : bump version to 0.9.10 (ggml/1454)

6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)

84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)

Compare 12 commits »

happyz synced commits to refs/pull/21216/head at happyz/llama.cpp from mirror 2026-04-01 07:02:44 -07:00

c3430c4e34 cont : fix uninitialized required parameters

d0000c1150 cont : undo arbitrary ordering of optional args

731a3f9c6b cont : revert changes to parsing at the end

aff47d5a3b cont : remove upper limit on optional args

d4e7f58f79 common/peg-parser : fix parenthesization of wrapped parsers

Compare 25 commits »

happyz synced commits to refs/pull/21174/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:43 -07:00

53ecea86b5 Merge adef64cb9f into 6b949d1078

6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)

84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)

e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)

88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)

Compare 10 commits »

happyz synced commits to refs/pull/21182/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:43 -07:00

da688dbef9 Merge 646f0a7d78 into 4951250235

4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)

82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)

825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)

0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)

Compare 8 commits »

happyz synced commits to refs/pull/21201/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:43 -07:00

237b876332 Merge 9f7ce433aa into 4951250235

4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)

82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)

825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)

Compare 4 commits »

happyz synced commits to refs/pull/21203/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:43 -07:00

b268b10134 Merge d89f8dd0ea into 82764c341a

82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)

825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)

0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)

6307ec07d3 common : cleanup logs and modernize the progress bar (#21215)

Compare 5 commits »

happyz synced commits to refs/pull/21168/head at happyz/llama.cpp from mirror 2026-04-01 07:02:42 -07:00

fbc4cfcdde Update ggml/src/ggml-cuda/mmq.cuh

777f5943a4 Update ggml/src/ggml-cuda/mmq.cuh

d3065542f0 Update ggml/src/ggml-cuda/mmq.cuh

Compare 3 commits »

happyz synced commits to refs/pull/21168/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:42 -07:00

fce74f6b96 Merge fbc4cfcdde into 6b949d1078

fbc4cfcdde Update ggml/src/ggml-cuda/mmq.cuh

777f5943a4 Update ggml/src/ggml-cuda/mmq.cuh

d3065542f0 Update ggml/src/ggml-cuda/mmq.cuh

6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)

Compare 13 commits »

happyz synced commits to refs/pull/21170/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:42 -07:00

5f4610fd99 Merge 5d9f64c54e into 0356e33aaf

0356e33aaf scripts: add function call test script (#21234)

6422036fcb sync : ggml

296bc0538b ggml : bump version to 0.9.10 (ggml/1454)

6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)

Compare 13 commits »

happyz synced commits to refs/pull/21165/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:41 -07:00

680d43936e Merge c27b0d3d88 into 6b949d1078

6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)

84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)

e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)

88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)

Compare 10 commits »

happyz synced commits to refs/pull/21160/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:41 -07:00

b2698fcea6 Merge 532a8ebdde into 6422036fcb

6422036fcb sync : ggml

296bc0538b ggml : bump version to 0.9.10 (ggml/1454)

6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)

84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)

Compare 12 commits »

happyz synced commits to refs/pull/21161/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:41 -07:00

9844c49bf9 Merge 759db688ad into 0356e33aaf

0356e33aaf scripts: add function call test script (#21234)

6422036fcb sync : ggml

296bc0538b ggml : bump version to 0.9.10 (ggml/1454)

6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)

Compare 13 commits »

happyz synced commits to refs/pull/21159/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:40 -07:00

4126a9c6db Merge d1fd632ab8 into 6b949d1078

6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)

84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)

e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)

88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)

Compare 10 commits »

happyz synced commits to refs/pull/21141/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:40 -07:00

26cc6f60e6 Merge 02d4c32517 into 825eb91a66

02d4c32517 common: add two-phase graceful reasoning budget termination

Compare 2 commits »

happyz synced commits to refs/pull/21149/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:40 -07:00

cd9691014c Merge 57a8def44e into 4951250235

4951250235 llama : refactor llama_model_quantize_params to expose a pure C interface (#20346)

82764c341a ggml webgpu: quantized buffers to u32 + wider browser/device support (#21046)

825eb91a66 ggml-webgpu: port all AOT operators to JIT (#20728)

0fcb3760b2 fix: Use lower-case proxy headers naming (#21235)

Compare 8 commits »

happyz synced commits to refs/pull/21152/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:40 -07:00

acf1b5d842 Merge 1c128d941e into 6b949d1078

6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)

84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)

e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)

88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)

Compare 10 commits »

happyz synced commits to refs/pull/21089/merge at happyz/llama.cpp from mirror 2026-04-01 07:02:39 -07:00

1cee8b9dc1 Merge 0aae7d78c7 into 6b949d1078

6b949d1078 sycl : support nvfp4 type in mul_mat (#21227)

84f82e846c ggml-cuda: Add generic NVFP4 MMQ kernel (#21074)

e1cb817483 memory: respect unified KV cache in hybrid memory for eval tasks (#21224)

88d5f8ffc3 CUDA/HIP: Fix kernel selection for mmvq mmid kernel to align host selection with device launch bounds (#21238)

Compare 10 commits »