Ed Addario
b0b33b7ccb
Optimise tensor sampling
2025-08-20 20:58:26 +01:00
Ed Addario
3f0118d602
Fix bias lambda bug
2025-08-20 17:26:37 +01:00
Ed Addario
52da4a4f8c
Skip if output.weight or type is COPY
2025-08-20 17:26:05 +01:00
Ed Addario
43caadf783
Add better fallbacks for IQ mixes
2025-08-20 17:24:48 +01:00
Ed Addario
29b2dc3ec0
Do not mix K and IQ quants
2025-08-20 13:27:01 +01:00
Ed Addario
69586e212e
Add F16/BF16 type
2025-08-20 13:23:11 +01:00
Ed Addario
5cd69a6809
Add F16/BF16 type
2025-08-20 09:41:39 +01:00
Ed Addario
b33abae231
Merge branch 'master' into quantize
2025-08-19 23:39:07 +01:00
Ed Addario
936294f6af
Increase precision for error calculation
2025-08-19 23:31:22 +01:00
Ed Addario
f22b3097eb
Avoid division by zero if truncation occurs
2025-08-19 22:34:01 +01:00
Ed Addario
ee05d6bc0b
Update comments
2025-08-19 22:32:53 +01:00
Ed Addario
5aceb9e3ae
Refactor variable names
2025-08-19 22:29:27 +01:00
lhez
fb22dd07a6
opencl: mark `argsort` unsupported if cols exceed workgroup limit (#15375)
2025-08-19 11:25:51 -07:00
Georgi Gerganov
9ef6b0b835
model : add gpt-oss type strings (#15424)
2025-08-19 19:58:28 +03:00
Gian-Carlo Pascutto
1e19f5d462
common : Add top-nsigma sampler to help globally (#15428)
Fixes #15423.
2025-08-19 19:58:14 +03:00
Georgi Gerganov
d2fcd91cf9
server : disable context shift by default (#15416)
* server : disable context shift by default
ggml-ci
* server : make scope of test parameters local
2025-08-19 16:46:37 +03:00
SHUAI YANG
a6d3cfe7fa
CANN: optimize rope operator (#15335)
* optimize rope ops
* amendment
* delete trailing whitespace
* change the variable name
2025-08-19 21:28:22 +08:00
R0CKSTAR
67f09a3a27
musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 (#15413)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-08-19 12:33:47 +02:00
Ed Addario
1187f6aa9e
Implement bpw_overrides call
2025-08-19 11:07:03 +01:00
Ed Addario
92f49ab399
Add target_bpw_type() logic
2025-08-19 11:05:01 +01:00
Ed Addario
017945a3b2
Validate if imatrix contains activations
2025-08-19 11:03:52 +01:00
Ed Addario
9adae08789
Add is_iq()
2025-08-19 11:00:50 +01:00
Ed Addario
c96b8eef94
Add fallback_type enum
2025-08-19 11:00:05 +01:00
Ed Addario
a22a9deeee
Refactor variable and add target_bpw
2025-08-19 10:57:44 +01:00
Ed Addario
1b3d5b5744
Populate params
2025-08-19 10:56:02 +01:00
Ed Addario
e877474458
Process target_bpw parameter
2025-08-19 10:54:02 +01:00
Ed Addario
0edbf0c176
Process activations
2025-08-19 10:51:58 +01:00
Ed Addario
77b818c040
Populate activations_data with imatrix activations if present
2025-08-19 10:50:37 +01:00
Ed Addario
e6d55dc47b
Load activations
2025-08-19 10:49:01 +01:00
Ed Addario
5e85fb3ff3
Add parse_target_bpw()
2025-08-19 10:46:36 +01:00
Ed Addario
cfec4048ab
Update usage
2025-08-19 10:43:51 +01:00
Ed Addario
4d9491141b
Add target_bpw parameter
2025-08-19 10:43:21 +01:00
Marvin Gießing
6424594c56
ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware (#15385)
* Added VSX intrinsics for Power9+ systems
Signed-off-by: mgiessing <marvin.giessing@gmail.com>
* Manual unrolling for minor perf improvement
Signed-off-by: mgiessing <marvin.giessing@gmail.com>
* Update ggml/src/ggml-cpu/arch/powerpc/quants.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Signed-off-by: mgiessing <marvin.giessing@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-08-19 11:54:31 +03:00
Ed Addario
ba7335efb3
Refactor variable name
2025-08-19 09:54:29 +01:00
Xuan-Son Nguyen
e9288e8869
chat : clarify the meaning of reasoning_format (#15408)
* chat : clarify the meaning of reasoning_format
* add link to this PR
2025-08-19 10:29:36 +02:00
Georgi Gerganov
9d262f4bad
server : remove swa_full warning (#15399)
2025-08-19 08:45:26 +03:00
Georgi Gerganov
f0d3c7405c
batched-bench : use rand tokens (#15398)
2025-08-19 08:45:12 +03:00
Xuan-Son Nguyen
f08c4c0d8d
mtmd : clean up clip_n_output_tokens (#15391)
2025-08-18 22:53:52 +02:00
Georgi Gerganov
6d7f1117e3
codeowners : remove mmv.*
2025-08-18 22:06:44 +03:00
Georgi Gerganov
60212f1ead
sync : ggml
2025-08-18 22:06:44 +03:00
Georgi Gerganov
f0c541d315
scripts : update sync scripts
2025-08-18 22:06:44 +03:00
Sigbjørn Skjæret
baa9255a45
llama : merge conts and reshapes and remove unnecessary cont (#15380)
* remove unnecessary conts and merge reshapes
* restore necessary conts
* merge more conts and reshapes
* merge even more conts and reshapes
2025-08-18 19:30:17 +02:00
Georgi Gerganov
3007baf201
readme : update hot topics (#15397)
2025-08-18 18:11:44 +03:00
davidef
d1d8241600
server : fix incoming tasks not processed in order (#15395)
2025-08-18 17:51:42 +03:00
Dobri Danchev
618575c582
Fix broken build: require updated pip to support --break-system-packages (#15357)
* Revert "devops : fix compile bug when the BASE_CUDA_DEV_CONTAINER is based on Ubuntu 24.04 (#15005)"
This reverts commit e4e915912c.
* devops: Allow pip to modify externally-managed python environment (system installation)
- Updated pip install commands to include the --break-system-packages
flag, ensuring compatibility when working with system-managed Python
environments (PEP 668).
- Note: The --break-system-packages option was introduced in 2023.
Ensure pip is updated to a recent version before using this flag.
fixes [#15004](https://github.com/danchev/llama.cpp/issues/15004)
2025-08-18 12:50:48 +02:00
compilade
f44f793172
ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (#15379)
* ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors
* ggml-quants : avoid division by zero in make_q3_quants
2025-08-18 09:23:56 +02:00
Jeff Bolz
ae532eac2c
vulkan: disable spirv-opt for bfloat16 shaders (#15352)
2025-08-18 07:56:29 +02:00
Oleksandr Kuvshynov
e5155e6986
server : export max observed n_past value (#15361)
Add tracking for high watermark cache usage and make it available in /metrics endpoint.
Use-case: Tracking largest needed cache usage under realistic workload
to better understand memory requirements and be able to adjust
cache size/quantization for model/cache accordingly.
2025-08-18 00:28:58 +02:00
Jeff Bolz
21c17b5bef
vulkan: Use larger workgroups for mul_mat_vec when M is small (#15355)
* vulkan: Use larger workgroups for mul_mat_vec when M is small
Also use subgroup instructions for (part of) the reduction when supported.
Without this, the more expensive reductions would eat into the benefits of
the larger workgroups.
* update heuristic for amd/intel
Co-authored-by: 0cc4m <picard12@live.de>
---------
Co-authored-by: 0cc4m <picard12@live.de>
2025-08-17 18:08:57 +02:00
Dong Won Kim
19f4decae0
vulkan: support sqrt (#15370)
2025-08-17 16:03:09 +02:00