Ed Addario
|
5b6f1e9fde
|
General code refactor
|
2025-08-21 19:18:54 +01:00 |
Ed Addario
|
9e11f82e8f
|
Precompute error denominator in estimate_error()
|
2025-08-21 16:25:31 +01:00 |
Ed Addario
|
887490c5ec
|
Dequantise sampled rows only
|
2025-08-21 15:11:49 +01:00 |
Ed Addario
|
e01dad886b
|
Parallelise candidate evaluation
|
2025-08-21 12:47:13 +01:00 |
Ed Addario
|
95b2ab2800
|
Change error estimate to use normalised weighted MSE
|
2025-08-21 10:46:37 +01:00 |
Ed Addario
|
5ef493ea1a
|
Exclude embeddings and output tensor
|
2025-08-21 09:48:29 +01:00 |
Ed Addario
|
35ad0fc4ad
|
Improve error estimation using weighted MSE
|
2025-08-20 23:27:20 +01:00 |
Ed Addario
|
b0b33b7ccb
|
Optimise tensor sampling
|
2025-08-20 20:58:26 +01:00 |
Ed Addario
|
3f0118d602
|
Fix bias lambda bug
|
2025-08-20 17:26:37 +01:00 |
Ed Addario
|
52da4a4f8c
|
Skip if output.weight or type is COPY
|
2025-08-20 17:26:05 +01:00 |
Ed Addario
|
43caadf783
|
Add better fallbacks for IQ mixes
|
2025-08-20 17:24:48 +01:00 |
Ed Addario
|
29b2dc3ec0
|
Do not mix K and IQ quants
|
2025-08-20 13:27:01 +01:00 |
Ed Addario
|
69586e212e
|
Add F16/BF16 type
|
2025-08-20 13:23:11 +01:00 |
Ed Addario
|
5cd69a6809
|
Add F16/BF16 type
|
2025-08-20 09:41:39 +01:00 |
Ed Addario
|
b33abae231
|
Merge branch 'master' into quantize
|
2025-08-19 23:39:07 +01:00 |
Ed Addario
|
936294f6af
|
Increase precision for error calculation
|
2025-08-19 23:31:22 +01:00 |
Ed Addario
|
f22b3097eb
|
Avoid division by zero if truncation occurs
|
2025-08-19 22:34:01 +01:00 |
Ed Addario
|
ee05d6bc0b
|
Update comments
|
2025-08-19 22:32:53 +01:00 |
Ed Addario
|
5aceb9e3ae
|
Refactor variable names
|
2025-08-19 22:29:27 +01:00 |
lhez
|
fb22dd07a6
|
opencl: mark `argsort` unsupported if cols exceed workgroup limit (#15375)
|
2025-08-19 11:25:51 -07:00 |
Georgi Gerganov
|
9ef6b0b835
|
model : add gpt-oss type strings (#15424)
|
2025-08-19 19:58:28 +03:00 |
Gian-Carlo Pascutto
|
1e19f5d462
|
common : Add top-nsigma sampler to help globally (#15428)
Fixes #15423.
|
2025-08-19 19:58:14 +03:00 |
Georgi Gerganov
|
d2fcd91cf9
|
server : disable context shift by default (#15416)
* server : disable context shift by default
ggml-ci
* server : make scope of test parameters local
|
2025-08-19 16:46:37 +03:00 |
SHUAI YANG
|
a6d3cfe7fa
|
CANN: optimize rope operator (#15335)
* optimize rope ops
* amendment
* delete trailing whitespace
* change the variable name
|
2025-08-19 21:28:22 +08:00 |
R0CKSTAR
|
67f09a3a27
|
musa: handle __hgt2_mask, available starting from MUSA SDK rc4.3.0 (#15413)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
|
2025-08-19 12:33:47 +02:00 |
Ed Addario
|
1187f6aa9e
|
Implement bpw_overrides call
|
2025-08-19 11:07:03 +01:00 |
Ed Addario
|
92f49ab399
|
Add target_bpw_type() logic
|
2025-08-19 11:05:01 +01:00 |
Ed Addario
|
017945a3b2
|
Validate if imatrix contains activations
|
2025-08-19 11:03:52 +01:00 |
Ed Addario
|
9adae08789
|
Add is_iq()
|
2025-08-19 11:00:50 +01:00 |
Ed Addario
|
c96b8eef94
|
Add fallback_type enum
|
2025-08-19 11:00:05 +01:00 |
Ed Addario
|
a22a9deeee
|
Refactor variable and add target_bpw
|
2025-08-19 10:57:44 +01:00 |
Ed Addario
|
1b3d5b5744
|
Populate params
|
2025-08-19 10:56:02 +01:00 |
Ed Addario
|
e877474458
|
Process target_bpw parameter
|
2025-08-19 10:54:02 +01:00 |
Ed Addario
|
0edbf0c176
|
Process activations
|
2025-08-19 10:51:58 +01:00 |
Ed Addario
|
77b818c040
|
Populate activations_data with imatrix activations if present
|
2025-08-19 10:50:37 +01:00 |
Ed Addario
|
e6d55dc47b
|
Load activations
|
2025-08-19 10:49:01 +01:00 |
Ed Addario
|
5e85fb3ff3
|
Add parse_target_bpw()
|
2025-08-19 10:46:36 +01:00 |
Ed Addario
|
cfec4048ab
|
Update usage
|
2025-08-19 10:43:51 +01:00 |
Ed Addario
|
4d9491141b
|
Add target_bpw parameter
|
2025-08-19 10:43:21 +01:00 |
Marvin Gießing
|
6424594c56
|
ggml-cpu: add mxfp4 VSX intrinsics for Power9+ (ppc64le) hardware (#15385)
* Added VSX intrinsics for Power9+ systems
Signed-off-by: mgiessing <marvin.giessing@gmail.com>
* Manual unrolling for minor perf improvement
Signed-off-by: mgiessing <marvin.giessing@gmail.com>
* Update ggml/src/ggml-cpu/arch/powerpc/quants.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Signed-off-by: mgiessing <marvin.giessing@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
|
2025-08-19 11:54:31 +03:00 |
Ed Addario
|
ba7335efb3
|
Refactor variable name
|
2025-08-19 09:54:29 +01:00 |
Xuan-Son Nguyen
|
e9288e8869
|
chat : clarify the meaning of reasoning_format (#15408)
* chat : clarify the meaning of reasoning_format
* add link to this PR
|
2025-08-19 10:29:36 +02:00 |
Georgi Gerganov
|
9d262f4bad
|
server : remove swa_full warning (#15399)
|
2025-08-19 08:45:26 +03:00 |
Georgi Gerganov
|
f0d3c7405c
|
batched-bench : use rand tokens (#15398)
|
2025-08-19 08:45:12 +03:00 |
Xuan-Son Nguyen
|
f08c4c0d8d
|
mtmd : clean up clip_n_output_tokens (#15391)
|
2025-08-18 22:53:52 +02:00 |
Georgi Gerganov
|
6d7f1117e3
|
codeowners : remove mmv.*
|
2025-08-18 22:06:44 +03:00 |
Georgi Gerganov
|
60212f1ead
|
sync : ggml
|
2025-08-18 22:06:44 +03:00 |
Georgi Gerganov
|
f0c541d315
|
scripts : update sync scripts
|
2025-08-18 22:06:44 +03:00 |
Sigbjørn Skjæret
|
baa9255a45
|
llama : merge conts and reshapes and remove unnecessary cont (#15380)
* remove unnecessary conts and merge reshapes
* restore necessary conts
* merge more conts and reshapes
* merge even more conts and reshapes
|
2025-08-18 19:30:17 +02:00 |
Georgi Gerganov
|
3007baf201
|
readme : update hot topics (#15397)
|
2025-08-18 18:11:44 +03:00 |