Ed Addario
aea9b31db5
Make ZD Score two-tailed
2025-08-05 12:57:13 +01:00
Ed Addario
906548a00a
Update aggregated sum of squared activations per layer
2025-08-05 12:06:19 +01:00
Ed Addario
b37393423d
Compute aggregated (per layer) l2 norm
2025-08-05 08:54:57 +01:00
Ed Addario
5e40cf4f1c
Do not resize if in_sum is null
2025-08-05 00:18:53 +01:00
Ed Addario
adbff66394
Merge branch 'master' into imatrix
2025-08-04 22:16:10 +01:00
Ed Addario
c39c4e2a33
Refactor variable name
2025-08-04 22:15:50 +01:00
Jeff Bolz
5aa1105da2
vulkan: fix build when using glslang that does not support coopmat2 ( #15062 )
2025-08-04 07:09:19 +02:00
compilade
d31192b4ee
imatrix : use GGUF by default ( #14842 )
...
* imatrix : use GGUF by default
* imatrix : use GGUF regardless of the output filename
The legacy format can only be produced with --output-format dat
2025-08-03 22:00:05 +02:00
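For the imatrix output-format change above, a minimal usage sketch (model and calibration paths are placeholders, not taken from the commit):
```console
# default behavior after this change: the imatrix is written as GGUF,
# regardless of the output filename given with -o
./llama-imatrix -m model.gguf -f calibration.txt -o imatrix.gguf

# the legacy format has to be requested explicitly
./llama-imatrix -m model.gguf -f calibration.txt -o imatrix.dat --output-format dat
```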
compilade
0a2f5496be
imatrix : fix 3d activation handling for hybrid and recurrent models ( #14994 )
...
* imatrix : use a single count for dense 3d tensors
* imatrix : fix 3d activations when model tensor is 2d
* imatrix : fix 3d tensor counts
2025-08-03 21:49:13 +02:00
compilade
11a3811164
memory : handle kv_unified for hybrid models ( #15050 )
2025-08-03 21:43:07 +02:00
Csaba Kecskemeti
97366dc6ab
vocab : JetBrains Mellum pre-tokenizer ( #15045 )
2025-08-03 21:38:18 +02:00
Ed Addario
f1c2a4ca3f
Fix printing l2 norm when calc_mode = 1
2025-08-03 17:14:46 +01:00
Ed Addario
90cb1be99d
Minor cosmetic changes
2025-08-03 16:57:27 +01:00
Ed Addario
2117c4e54b
Update aggregated statistics report layout
2025-08-03 16:38:02 +01:00
Ed Addario
a6155a8125
Add compute_layer_statistics() function
2025-08-03 16:35:03 +01:00
Gabriel Larson
83bc2f288c
model : add text-only support for Kimi-VL (and find special tokens in text_config) ( #15051 )
...
* basic kimi-vl text model conversion
* check config["text_config"] for special tokens
2025-08-03 16:56:25 +02:00
Ed Addario
be60469f25
Refactor function names
2025-08-03 15:10:17 +01:00
Jeff Bolz
6c7a441161
vulkan: Use coopmat2 for conv2d ( #14982 )
2025-08-03 14:23:57 +02:00
Ed Addario
fce05aac9e
Refactor lambda into compute_tensor_averages() function
2025-08-03 13:03:21 +01:00
Ed Addario
5324558132
Update table layout
2025-08-03 10:28:47 +01:00
Ed Addario
4d1325e1eb
Refactor variables
2025-08-03 10:28:23 +01:00
Ed Addario
a32a2ecbed
Reformat report layout
2025-08-03 00:51:33 +01:00
Ed Addario
4c01f51ae1
Remove inactive
2025-08-03 00:51:12 +01:00
lhez
5c0eb5ef54
opencl: fix adreno compiler detection logic ( #15029 )
2025-08-02 19:51:18 +02:00
Ed Addario
fc8f92596f
Update table display
2025-08-02 16:46:27 +01:00
Ed Addario
ee2509f563
Adjust threshold
2025-08-02 16:45:56 +01:00
Ed Addario
9b841eb696
Compute l2 norm
2025-08-02 16:45:09 +01:00
Ed Addario
b7fb362d8e
Compute cosine similarity based on activations
2025-08-02 16:43:49 +01:00
Ed Addario
cce514a392
Compute entropy for activations
2025-08-02 16:40:40 +01:00
Ed Addario
9744a4a1c6
Determine calculation mode
2025-08-02 16:36:12 +01:00
Ed Addario
78ddb475de
Fix problem when GGUF does not have in_sum
2025-08-02 16:31:21 +01:00
Johannes Gäßler
03d4698218
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 ( #15035 )
2025-08-02 16:37:08 +02:00
leejet
3303c19b16
cuda: make im2col a little faster ( #15025 )
2025-08-02 17:15:36 +03:00
Daniel Bevenius
4fdea540bd
kv-cache : skip alignment of n_stream in kv-cache log msg [no ci] ( #15040 )
...
This commit removes the right alignment of the `n_stream` value in the
log message in the `llama_kv_cache_unified` constructor.
The motivation for this change is to improve the readability of the log
message. Currently, the output looks like this:
```console
llama_kv_cache_unified: size = 2048.00 MiB ( 4096 cells, 32 layers, 1/ 1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
```
Notice that the `n_stream` value is right-aligned, which makes it a
little harder to read.
With the change in this commit, the output will look like this:
```console
llama_kv_cache_unified: size = 2048.00 MiB ( 4096 cells, 32 layers, 1/1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
```
2025-08-02 17:14:57 +03:00
Georgi Gerganov
a4569c41fd
llama : enable LLAMA_SET_ROWS=1 by default ( #14959 )
...
ggml-ci
2025-08-02 17:14:21 +03:00
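For the LLAMA_SET_ROWS change above, a hedged sketch of opting back out at runtime (binary and model path are placeholders; this assumes setting the variable to 0 restores the previous behavior):
```console
# override the new default for a single run
LLAMA_SET_ROWS=0 ./llama-cli -m model.gguf -p "Hello"
```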
Georgi Gerganov
15e92fd337
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 ( #15038 )
...
* cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1
ggml-ci
* cont : fix cont types
ggml-ci
* cont : adopt variable names and comment from the other branch
2025-08-02 17:13:05 +03:00
Sigbjørn Skjæret
2bf3fbf0b5
ci : check that pre-tokenizer hashes are up-to-date ( #15032 )
...
* torch is not required for convert_hf_to_gguf_update
* add --check-missing parameter
* check that pre-tokenizer hashes are up-to-date
2025-08-02 14:39:01 +02:00
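For the pre-tokenizer hash check above, a hedged invocation sketch (the exact arguments are an assumption based on the commit notes):
```console
# --check-missing presumably reports pre-tokenizer hashes that are missing
# or out of date, without regenerating convert_hf_to_gguf.py
python convert_hf_to_gguf_update.py --check-missing
```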
Douglas Hanley
711d5e6fe6
convert : fix Qwen3-Embedding pre-tokenizer hash ( #15030 )
2025-08-02 12:51:02 +02:00
Jhen-Jie Hong
f738989dcb
chat : fix multiple tool_calls on hermes-2-pro ( #14962 )
2025-08-02 18:04:48 +08:00
Jeff Bolz
4cb208c93c
vulkan: coopmat2 mul_mat optimizations ( #14934 )
...
- Increase tile size for k-quants, to match non-k-quants
- Choose more carefully between large and medium tiles, considering how it
interacts with split_k
- Allow larger/non-power of two split_k, and make the splits a multiple of 256
- Use split_k==3 when >1/2 and <=2/3 of the SMs would have been used
2025-08-02 11:21:37 +02:00
R0CKSTAR
3025b621d1
llama-bench: rename DB table name from test to llama_bench ( #15003 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-08-02 17:20:40 +08:00
Jeff Bolz
ec0b18802c
vulkan: Support ne[3]>1 in noncontig matrix-vector multiply ( #15015 )
2025-08-02 10:48:30 +02:00
Douglas Hanley
339bd0268c
model : support Qwen3-Embedding ( #15023 )
2025-08-02 10:44:50 +02:00
Johannes Gäßler
f906275537
server: enable token array inputs for OAI API ( #15001 )
2025-08-02 10:12:41 +02:00
Jeff Bolz
a9f7541ec2
vulkan: optimizations for direct convolution ( #14933 )
...
* vulkan: optimizations for direct convolution
- Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill
the GPU. The new size should be amenable to using coopmat, too.
- Fix shmem bank conflicts. 16B padding should work with coopmat.
- Some explicit loop unrolling.
- Skip math/stores work for parts of the tile that are OOB.
- Apply fastdiv opt.
- Disable shuffles for NV.
* Three tiles sizes for CONV_2D, and a heuristic to choose
* reallow collectives for pre-Turing
* make SHMEM_PAD a spec constant
* fixes for intel perf - no shmem padding, placeholder shader core count
* shader variants with/without unrolling
* 0cc4m's fixes for AMD perf
Co-authored-by: 0cc4m <picard12@live.de>
---------
Co-authored-by: 0cc4m <picard12@live.de>
2025-08-02 09:57:04 +02:00
Johannes Gäßler
9c35706b98
CUDA: fix MMQ nwarps for AMD with warp_size==32 ( #15014 )
2025-08-01 20:47:32 +02:00
l-austenfeld
c76b420e4c
vendor : update vendored copy of google/minja ( #15011 )
...
* vendor : update vendored copy of google/minja
Signed-off-by: Lennart Austenfeld <l.austenfeld@googlemail.com>
* Re-remove trailing whitespace
Signed-off-by: Lennart Austenfeld <l.austenfeld@googlemail.com>
* Remove another trailing whitespace
Signed-off-by: Lennart Austenfeld <l.austenfeld@googlemail.com>
---------
Signed-off-by: Lennart Austenfeld <l.austenfeld@googlemail.com>
2025-08-01 16:59:06 +02:00
stevenkuang
0f5ccd6fd1
model : add hunyuan dense ( #14878 )
...
* support hunyuan_v1_dense
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* update hunyuan_moe to hunyuan_v1_moe
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* fix rope alpha assert and bos token
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* add blank line
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* Revert "update hunyuan_moe to hunyuan_v1_moe"
This reverts commit aa973ca219.
* use hunyuan_dense instead of hunyuan_v1_dense
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* fix hunyuan_moe chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* remove leftover code
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* update hunyuan dense chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
* fix hunyuan dense vocab and chat template
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
---------
Signed-off-by: stevenkuang <stevenkuang@tencent.com>
2025-08-01 15:31:12 +02:00
lhez
1c872f71fb
opencl: add f16 for `add`, `sub`, `mul`, `div` ( #14984 )
2025-08-01 13:15:44 +02:00
Srihari-mcw
baad94885d
ggml : Q2k interleaving implementation - x86/x64 SIMD ( #14373 )
...
* Initial Q2_K Block Interleaving Implementation
* Addressed review comments and clean up of the code
* Post rebase fixes
* Initial CI/CD fixes
* Update declarations in arch-fallback.h
* Changes for GEMV Q2_K in arch-fallback.h
* Enable repacking only on AVX-512 machines
* Update comments in repack.cpp
* Address q2k comments
---------
Co-authored-by: Manogna-Sree <elisetti.manognasree@multicorewareinc.com>
2025-08-01 09:20:33 +03:00