HappyZ

happyz synced commits to refs/pull/21263/head at happyz/llama.cpp from mirror 2026-04-03 07:03:05 -07:00

5c59f3979d Use default RISE RISC-V Runners

happyz synced commits to refs/pull/21237/head at happyz/llama.cpp from mirror 2026-04-03 07:03:04 -07:00

d24e0ed6db Merge remote-tracking branch 'upstream/master' into allozaur/20677-webui-server-tools

277ff5fff7 docker : bump cuda12 to 12.9.1 (#20920)

384c0076bc docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification (#21331)

1f34806c44 jinja: coerce input for string-specific filters (#21370)

c374e3e286 feat: UI improvements

Compare 56 commits »

happyz synced commits to refs/pull/21237/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:04 -07:00

8f8cb1609d Merge d24e0ed6db into 277ff5fff7

d24e0ed6db Merge remote-tracking branch 'upstream/master' into allozaur/20677-webui-server-tools

277ff5fff7 docker : bump cuda12 to 12.9.1 (#20920)

384c0076bc docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification (#21331)

1f34806c44 jinja: coerce input for string-specific filters (#21370)

Compare 20 commits »

happyz synced commits to refs/pull/21244/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:04 -07:00

be6d8d65e8 Merge 22f18a4838 into 277ff5fff7

277ff5fff7 docker : bump cuda12 to 12.9.1 (#20920)

384c0076bc docs: Update build.md: HSA_OVERRIDE_GFX_VERSION clarification (#21331)

1f34806c44 jinja: coerce input for string-specific filters (#21370)

887535c33f ci: add more binary checks (#21349)

Compare 18 commits »

happyz synced commits to refs/pull/21245/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:04 -07:00

5eb3f005d6 Merge 75d759d5f8 into f49e917876

f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345)

7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066)

5208e2d5ba fix: gemma 4 template (#21326)

7992aa7c8e tests : add unit test coverage for llama_tensor_get_type (#20112)

Compare 8 commits »

happyz synced commits to refs/pull/21230/head at happyz/llama.cpp from mirror 2026-04-03 07:03:03 -07:00

620b2c05d1 Update common/chat-auto-parser-generator.cpp

ed9aa13513 Rename

22248e01af Fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers

43a4ee4a2c HIP: build eatch ci build test for a different architecture (#21337)

f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355)

Compare 49 commits »

happyz synced commits to refs/pull/21230/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:03 -07:00

3d0999bf70 Merge 620b2c05d1 into d3416a4aa9

620b2c05d1 Update common/chat-auto-parser-generator.cpp

d3416a4aa9 fix: remove stale assert (#21369)

ed9aa13513 Rename

22248e01af Fix call ID detection (Mistral parser mostly) + atomicity for tag-json parsers

Compare 24 commits »

happyz synced commits to refs/pull/21231/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:03 -07:00

8fd211ca47 Merge 1d4a5f9380 into 0c58ba3365

1d4a5f9380 fix model count exceeded check

7666cacf28 move llama_context_device_memory function to llama-ext.h

7e10ec8ff2 add server memory debug logging

4af1a283a6 use memory margin instead of total size limit, apply to each device separately

Compare 20 commits »

happyz synced commits to refs/pull/21231/head at happyz/llama.cpp from mirror 2026-04-03 07:03:03 -07:00

1d4a5f9380 fix model count exceeded check

7666cacf28 move llama_context_device_memory function to llama-ext.h

7e10ec8ff2 add server memory debug logging

4af1a283a6 use memory margin instead of total size limit, apply to each device separately

d2892543f4 only set model memory_mb if not previously calculated

Compare 77 commits »

happyz synced commits to refs/pull/21219/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:02 -07:00

c2394faa95 Merge 0474a433b3 into f49e917876

f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345)

7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066)

5208e2d5ba fix: gemma 4 template (#21326)

7992aa7c8e tests : add unit test coverage for llama_tensor_get_type (#20112)

Compare 8 commits »

happyz synced commits to refs/pull/21221/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:02 -07:00

34140b8b24 Merge ce447e2745 into f851fa5ab0

f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355)

f1ac84119c ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)

b069b10ab4 vocab: fix Gemma4 tokenizer (#21343)

0c58ba3365 rpc : reuse compute graph buffers (#21299)

Compare 10 commits »

happyz synced commits to refs/pull/21204/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:01 -07:00

465ae2789e Merge cb15cdb020 into 0c58ba3365

0c58ba3365 rpc : reuse compute graph buffers (#21299)

57ace0d612 chat : avoid including json in chat.h (#21306)

39b27f0da0 (revert) kv-cache : do not quantize SWA KV cache (#21332)

f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345)

Compare 12 commits »

happyz synced commits to refs/pull/21216/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:01 -07:00

bd8a79b281 Merge c3430c4e34 into f851fa5ab0

f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355)

f1ac84119c ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)

b069b10ab4 vocab: fix Gemma4 tokenizer (#21343)

0c58ba3365 rpc : reuse compute graph buffers (#21299)

Compare 10 commits »

happyz synced commits to refs/pull/21201/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:00 -07:00

f9d6cd646d Merge 9f7ce433aa into 43a4ee4a2c

43a4ee4a2c HIP: build eatch ci build test for a different architecture (#21337)

f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355)

f1ac84119c ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)

b069b10ab4 vocab: fix Gemma4 tokenizer (#21343)

Compare 11 commits »

happyz synced commits to refs/pull/21203/merge at happyz/llama.cpp from mirror 2026-04-03 07:03:00 -07:00

47449415b7 Merge d89f8dd0ea into f851fa5ab0

f851fa5ab0 fix: add openssl to nix dependencies (#21353) (#21355)

f1ac84119c ggml-zendnn : add MUL_MAT_ID op support for MoE models (#21315)

b069b10ab4 vocab: fix Gemma4 tokenizer (#21343)

0c58ba3365 rpc : reuse compute graph buffers (#21299)

Compare 10 commits »

happyz synced commits to refs/pull/21187/merge at happyz/llama.cpp from mirror 2026-04-03 07:02:59 -07:00

9cc73489aa Merge 08e16816b4 into f49e917876

f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345)

7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066)

5208e2d5ba fix: gemma 4 template (#21326)

7992aa7c8e tests : add unit test coverage for llama_tensor_get_type (#20112)

Compare 9 commits »

happyz synced commits to refs/pull/21174/merge at happyz/llama.cpp from mirror 2026-04-03 07:02:58 -07:00

45d1caaf54 Merge 72291353f0 into 57ace0d612

72291353f0 server: fix reasoning item content format handling for multi-turn

d8047a21dd ci: retrigger after transient infrastructure failures

6106cf8d90 server: fix streaming event bugs and tighten test assertions

4e05f34e27 server: add streaming compliance tests for Responses API

Compare 16 commits »

happyz synced commits to refs/pull/21174/head at happyz/llama.cpp from mirror 2026-04-03 07:02:57 -07:00

72291353f0 server: fix reasoning item content format handling for multi-turn

d8047a21dd ci: retrigger after transient infrastructure failures

6106cf8d90 server: fix streaming event bugs and tighten test assertions

4e05f34e27 server: add streaming compliance tests for Responses API

a19c7a30ad server: add full streaming compliance for Responses API events

Compare 72 commits »

happyz synced commits to refs/pull/21170/merge at happyz/llama.cpp from mirror 2026-04-03 07:02:55 -07:00

fea06b7d76 Merge 5d9f64c54e into 57ace0d612

57ace0d612 chat : avoid including json in chat.h (#21306)

39b27f0da0 (revert) kv-cache : do not quantize SWA KV cache (#21332)

f49e917876 ci : add AMD ZenDNN label to PR labeler (#21345)

7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066)

Compare 8 commits »

happyz synced commits to refs/pull/21168/merge at happyz/llama.cpp from mirror 2026-04-03 07:02:54 -07:00

285465f9c8 Merge fbc4cfcdde into 7c7d6ce5c7

7c7d6ce5c7 [HIP] Bump ROCm version to 7.2.1 (#21066)

5208e2d5ba fix: gemma 4 template (#21326)

7992aa7c8e tests : add unit test coverage for llama_tensor_get_type (#20112)

a1cfb64530 ggml-webgpu: add vectorized flash attention (#20709)

Compare 7 commits »