Commit Graph

6236 Commits

Author SHA1 Message Date
Ed Addario b0b33b7ccb
Optimise tensor sampling 2025-08-20 20:58:26 +01:00
Ed Addario 3f0118d602
Fix bias lambda bug 2025-08-20 17:26:37 +01:00
Ed Addario 52da4a4f8c
Skip if output.weight or type is COPY 2025-08-20 17:26:05 +01:00
Ed Addario 43caadf783
Add better fallbacks for IQ mixes 2025-08-20 17:24:48 +01:00
Ed Addario 29b2dc3ec0
Do not mix K and IQ quants 2025-08-20 13:27:01 +01:00
Ed Addario 69586e212e
Add F16/BF16 type 2025-08-20 13:23:11 +01:00
Ed Addario 5cd69a6809
Add F16/BF16 type 2025-08-20 09:41:39 +01:00
Ed Addario b33abae231
Merge branch 'master' into quantize 2025-08-19 23:39:07 +01:00
Ed Addario 936294f6af
Increase precision for error calculation 2025-08-19 23:31:22 +01:00
Ed Addario f22b3097eb
Avoid division by zero if truncation occurs 2025-08-19 22:34:01 +01:00
Ed Addario ee05d6bc0b
Update comments 2025-08-19 22:32:53 +01:00
Ed Addario 5aceb9e3ae
Refactor variable names 2025-08-19 22:29:27 +01:00
lhez fb22dd07a6
opencl: mark `argsort` unsupported if cols exceed workgroup limit (#15375) 2025-08-19 11:25:51 -07:00
Georgi Gerganov 9ef6b0b835
model : add gpt-oss type strings (#15424) 2025-08-19 19:58:28 +03:00
Gian-Carlo Pascutto 1e19f5d462
common : Add top-nsigma sampler to help globally (#15428)
Fixes #15423.
2025-08-19 19:58:14 +03:00
Georgi Gerganov d2fcd91cf9
server : disable context shift by default (#15416)
* server : disable context shift by default

ggml-ci

* server : make scope of test parameters local
2025-08-19 16:46:37 +03:00
SHUAI YANG a6d3cfe7fa
CANN: optimize rope operator (#15335)
* optimize rope ops

* amendment

* delete trailing whitespace

* change the variable name
2025-08-19 21:28:22 +08:00
R0CKSTAR 67f09a3a27
musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 (#15413)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-08-19 12:33:47 +02:00
Ed Addario 1187f6aa9e
Implement bpw_overrides call 2025-08-19 11:07:03 +01:00
Ed Addario 92f49ab399
Add target_bpw_type() logic 2025-08-19 11:05:01 +01:00
Ed Addario 017945a3b2
Validate if imatrix contains activations 2025-08-19 11:03:52 +01:00
Ed Addario 9adae08789
Add is_iq() 2025-08-19 11:00:50 +01:00
Ed Addario c96b8eef94
Add fallback_type enum 2025-08-19 11:00:05 +01:00
Ed Addario a22a9deeee
Refactor variable and add target_bpw 2025-08-19 10:57:44 +01:00
Ed Addario 1b3d5b5744
Populate params 2025-08-19 10:56:02 +01:00
Ed Addario e877474458
Process target_bpw parameter 2025-08-19 10:54:02 +01:00
Ed Addario 0edbf0c176
Process activations 2025-08-19 10:51:58 +01:00
Ed Addario 77b818c040
Populate activations_data with imatrix activations if present 2025-08-19 10:50:37 +01:00
Ed Addario e6d55dc47b
Load activations 2025-08-19 10:49:01 +01:00
Ed Addario 5e85fb3ff3
Add parse_target_bpw() 2025-08-19 10:46:36 +01:00
Ed Addario cfec4048ab
Update usage 2025-08-19 10:43:51 +01:00
Ed Addario 4d9491141b
Add target_bpw parameter 2025-08-19 10:43:21 +01:00
Marvin Gießing 6424594c56
ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware (#15385)
* Added VSX intrinsics for Power9+ systems

Signed-off-by: mgiessing <marvin.giessing@gmail.com>

* Manual unrolling for minor perf improvement

Signed-off-by: mgiessing <marvin.giessing@gmail.com>

* Update ggml/src/ggml-cpu/arch/powerpc/quants.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Signed-off-by: mgiessing <marvin.giessing@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-08-19 11:54:31 +03:00
Ed Addario ba7335efb3
Refactor variable name 2025-08-19 09:54:29 +01:00
Xuan-Son Nguyen e9288e8869
chat : clarify the meaning of reasoning_format (#15408)
* chat : clarify the meaning of reasoning_format

* add link to this PR
2025-08-19 10:29:36 +02:00
Georgi Gerganov 9d262f4bad
server : remove swa_full warning (#15399) 2025-08-19 08:45:26 +03:00
Georgi Gerganov f0d3c7405c
batched-bench : use rand tokens (#15398) 2025-08-19 08:45:12 +03:00
Xuan-Son Nguyen f08c4c0d8d
mtmd : clean up clip_n_output_tokens (#15391) 2025-08-18 22:53:52 +02:00
Georgi Gerganov 6d7f1117e3
codeowners : remove mmv.* 2025-08-18 22:06:44 +03:00
Georgi Gerganov 60212f1ead
sync : ggml 2025-08-18 22:06:44 +03:00
Georgi Gerganov f0c541d315
scripts : update sync scripts 2025-08-18 22:06:44 +03:00
Sigbjørn Skjæret baa9255a45
llama : merge conts and reshapes and remove unnecessary cont (#15380)
* remove unnecessary conts and merge reshapes

* restore necessary conts

* merge more conts and reshapes

* merge even more conts and reshapes
2025-08-18 19:30:17 +02:00
Georgi Gerganov 3007baf201
readme : update hot topics (#15397) 2025-08-18 18:11:44 +03:00
davidef d1d8241600
server : fix incoming tasks not processed in order (#15395) 2025-08-18 17:51:42 +03:00
Dobri Danchev 618575c582
Fix broken build: require updated pip to support --break-system-packages (#15357)
* Revert "devops : fix compile bug when the BASE_CUDA_DEV_CONTAINER is based on Ubuntu 24.04 (#15005)"

This reverts commit e4e915912c.

* devops: Allow pip to modify externally-managed python environment (system installation)

- Updated pip install commands to include the --break-system-packages
  flag, ensuring compatibility when working with system-managed Python
  environments (PEP 668).

- Note: The --break-system-packages option was introduced in 2023.
  Ensure pip is updated to a recent version before using this flag.

fixes [#15004](https://github.com/danchev/llama.cpp/issues/15004)
2025-08-18 12:50:48 +02:00
compilade f44f793172
ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors (#15379)
* ggml-quants : fix make_qp_quants NANs and IQ1 assertion errors

* ggml-quants : avoid division by zero in make_q3_quants
2025-08-18 09:23:56 +02:00
Jeff Bolz ae532eac2c
vulkan: disable spirv-opt for bfloat16 shaders (#15352) 2025-08-18 07:56:29 +02:00
Oleksandr Kuvshynov e5155e6986
server : export max observed n_past value (#15361)
Add tracking for the high-watermark cache usage and expose it in the /metrics endpoint.

Use case: track the largest cache usage needed under a realistic workload to better
understand memory requirements and adjust the cache size/quantization for the
model/cache accordingly.
2025-08-18 00:28:58 +02:00
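As a rough illustration of the high-watermark idea described in the commit above, here is a minimal C++ sketch. The `high_watermark` helper and the `n_past_max` metric name are assumptions made for illustration; this is not the actual server code, only the general pattern of keeping the maximum observed value for later export.

```cpp
// Hypothetical sketch (not the llama.cpp implementation): track the largest
// observed value (e.g. n_past) so it can be reported later via a metrics endpoint.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <initializer_list>

struct high_watermark {
    int64_t value = 0;

    // Record an observation; keep only the maximum seen so far.
    void observe(int64_t v) { value = std::max(value, v); }
};

int main() {
    high_watermark n_past_max;
    for (int64_t n_past : {128, 512, 384}) {
        n_past_max.observe(n_past);
    }
    // A metrics exporter would emit this as a gauge line.
    std::printf("n_past_max %lld\n", (long long) n_past_max.value);
    return 0;
}
```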
Jeff Bolz 21c17b5bef
vulkan: Use larger workgroups for mul_mat_vec when M is small (#15355)
* vulkan: Use larger workgroups for mul_mat_vec when M is small

Also use subgroup instructions for (part of) the reduction when supported.
Without this, the more expensive reductions would eat into the benefits of
the larger workgroups.

* update heuristic for amd/intel

Co-authored-by: 0cc4m <picard12@live.de>

---------

Co-authored-by: 0cc4m <picard12@live.de>
2025-08-17 18:08:57 +02:00
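To illustrate the heuristic mentioned in the commit above, the C++ sketch below shows the general shape of "use a larger workgroup when M is small": with few rows there is little parallelism across rows, so more threads are spent per row. The function name, threshold, and workgroup sizes are assumptions for illustration, not values from the PR, and the subgroup-reduction part is not shown.

```cpp
// Hypothetical sketch (not the actual Vulkan backend code): pick a larger
// workgroup size for mul_mat_vec when M (the number of rows) is small.
#include <cstdint>
#include <cstdio>
#include <initializer_list>

static uint32_t pick_workgroup_size(uint32_t m) {
    const uint32_t m_threshold = 8;    // assumed cutoff, not taken from the PR
    const uint32_t wg_small    = 256;  // more threads per row when rows are few
    const uint32_t wg_default  = 64;
    return (m <= m_threshold) ? wg_small : wg_default;
}

int main() {
    for (uint32_t m : {1u, 4u, 64u, 4096u}) {
        std::printf("M=%u -> workgroup size %u\n", m, pick_workgroup_size(m));
    }
    return 0;
}
```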
Dong Won Kim 19f4decae0
vulkan: support sqrt (#15370) 2025-08-17 16:03:09 +02:00