HappyZ happyz
happyz synced commits to refs/pull/21263/head at happyz/llama.cpp from mirror 2026-04-03 07:03:05 -07:00
5c59f3979d Use default RISE RISC-V Runners
happyz synced commits to refs/pull/21237/head at happyz/llama.cpp from mirror 2026-04-03 07:03:04 -07:00
d24e0ed6db Merge remote-tracking branch 'upstream/master' into allozaur/20677-webui-server-tools
277ff5fff7 docker : bump cuda12 to 12.9.1 (#20920)
384c0076bc docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification (#21331)
1f34806c44 jinja: coerce input for string-specific filters (#21370)
c374e3e286 feat: UI improvements
Compare 56 commits »
happyz synced commits to refs/pull/21237/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:04 -07:00
d24e0ed6db Merge remote-tracking branch 'upstream/master' into allozaur/20677-webui-server-tools
277ff5fff7 docker : bump cuda12 to 12.9.1 (#20920)
384c0076bc docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification (#21331)
1f34806c44 jinja: coerce input for string-specific filters (#21370)
Compare 20 commits »
happyz synced commits to refs/pull/21244/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:04 -07:00
277ff5fff7 docker : bump cuda12 to 12.9.1 (#20920)
384c0076bc docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification (#21331)
1f34806c44 jinja: coerce input for string-specific filters (#21370)
887535c33f ci: add more binary checks (#21349)
Compare 18 commits »
happyz synced commits to refs/pull/21245/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:04 -07:00
f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345)
7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066)
5208e2d5ba fix: gemma 4 template (#21326)
7992aa7c8e tests : add unit test coverage for llama_tensor_get_type (#20112)
Compare 8 commits »
happyz synced commits to refs/pull/21230/head at happyz/llama.cpp from mirror 2026-04-03 07:03:03 -07:00
620b2c05d1 Update common/chat-auto-parser-generator.cpp
ed9aa13513 Rename
22248e01af Fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers
43a4ee4a2c HIP: build eatch ci build test for a different architecture (#21337)
f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355)
Compare 49 commits »
happyz synced commits to refs/pull/21230/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:03 -07:00
620b2c05d1 Update common/chat-auto-parser-generator.cpp
d3416a4aa9 fix: remove stale assert (#21369)
ed9aa13513 Rename
22248e01af Fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers
Compare 24 commits »
happyz synced commits to refs/pull/21231/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:03 -07:00
1d4a5f9380 fix model count exceeded check
7666cacf28 move llama_context_device_memory function to llama-ext.h
7e10ec8ff2 add server memory debug logging
4af1a283a6 use memory margin instead of total size limit, apply to each device separately
Compare 20 commits »
happyz synced commits to refs/pull/21231/head at happyz/llama.cpp from mirror 2026-04-03 07:03:03 -07:00
1d4a5f9380 fix model count exceeded check
7666cacf28 move llama_context_device_memory function to llama-ext.h
7e10ec8ff2 add server memory debug logging
4af1a283a6 use memory margin instead of total size limit, apply to each device separately
d2892543f4 only set model memory_mb if not previously calculated
Compare 77 commits »
happyz synced commits to refs/pull/21219/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:02 -07:00
f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345)
7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066)
5208e2d5ba fix: gemma 4 template (#21326)
7992aa7c8e tests : add unit test coverage for llama_tensor_get_type (#20112)
Compare 8 commits »
happyz synced commits to refs/pull/21221/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:02 -07:00
f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355)
f1ac84119c ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)
b069b10ab4 vocab: fix Gemma4 tokenizer (#21343)
0c58ba3365 rpc : reuse compute graph buffers (#21299)
Compare 10 commits »
happyz synced commits to refs/pull/21204/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:01 -07:00
0c58ba3365 rpc : reuse compute graph buffers (#21299)
57ace0d612 chat : avoid including json in chat.h (#21306)
39b27f0da0 (revert) kv-cache : do not quantize SWA KV cache (#21332)
f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345)
Compare 12 commits »
happyz synced commits to refs/pull/21216/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:01 -07:00
f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355)
f1ac84119c ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)
b069b10ab4 vocab: fix Gemma4 tokenizer (#21343)
0c58ba3365 rpc : reuse compute graph buffers (#21299)
Compare 10 commits »
happyz synced commits to refs/pull/21201/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:00 -07:00
43a4ee4a2c HIP: build eatch ci build test for a different architecture (#21337)
f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355)
f1ac84119c ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)
b069b10ab4 vocab: fix Gemma4 tokenizer (#21343)
Compare 11 commits »
happyz synced commits to refs/pull/21203/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:00 -07:00
f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355)
f1ac84119c ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)
b069b10ab4 vocab: fix Gemma4 tokenizer (#21343)
0c58ba3365 rpc : reuse compute graph buffers (#21299)
Compare 10 commits »
happyz synced commits to refs/pull/21187/merge at happyz/llama.cpp from mirror 2026-04-03 07:02:59 -07:00
f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345)
7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066)
5208e2d5ba fix: gemma 4 template (#21326)
7992aa7c8e tests : add unit test coverage for llama_tensor_get_type (#20112)
Compare 9 commits »
happyz synced commits to refs/pull/21174/merge at happyz/llama.cpp from mirror 2026-04-03 07:02:58 -07:00
72291353f0 server: fix reasoning item content format handling for multi-turn
d8047a21dd ci: retrigger after transient infrastructure failures
6106cf8d90 server: fix streaming event bugs and tighten test assertions
4e05f34e27 server: add streaming compliance tests for Responses API
Compare 16 commits »
happyz synced commits to refs/pull/21174/head at happyz/llama.cpp from mirror 2026-04-03 07:02:57 -07:00
72291353f0 server: fix reasoning item content format handling for multi-turn
d8047a21dd ci: retrigger after transient infrastructure failures
6106cf8d90 server: fix streaming event bugs and tighten test assertions
4e05f34e27 server: add streaming compliance tests for Responses API
a19c7a30ad server: add full streaming compliance for Responses API events
Compare 72 commits »
happyz synced commits to refs/pull/21170/merge at happyz/llama.cpp from mirror 2026-04-03 07:02:55 -07:00
57ace0d612 chat : avoid including json in chat.h (#21306)
39b27f0da0 (revert) kv-cache : do not quantize SWA KV cache (#21332)
f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345)
7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066)
Compare 8 commits »
happyz synced commits to refs/pull/21168/merge at happyz/llama.cpp from mirror 2026-04-03 07:02:54 -07:00
7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066)
5208e2d5ba fix: gemma 4 template (#21326)
7992aa7c8e tests : add unit test coverage for llama_tensor_get_type (#20112)
a1cfb64530 ggml-webgpu: add vectorized flash attention (#20709)
Compare 7 commits »