Yu, Zijun
8b82d1153b
Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_size, iSWA model not working
2026-01-15 11:26:00 -08:00
Yu, Zijun
a9371ea646
Fix llama-cli (need to run with --no-warmup)
2026-01-15 11:26:00 -08:00
cavusmustafa
05d7abae8c
Fix for Phi3
2026-01-15 11:26:00 -08:00
cavusmustafa
e7252920e1
env variable GGML_OPENVINO_DISABLE_SDPA_OPTIMIZATION added
2026-01-15 11:26:00 -08:00
cavusmustafa
c112bc4e73
kvcachefusion support
2026-01-15 11:26:00 -08:00
Yu, Zijun
973a80fd02
Always apply Eliminate_ZP to fix GPU compile issue on some platforms
2026-01-15 11:26:00 -08:00
Yu, Zijun
fdadca1e89
Fix after rebasing
2026-01-15 11:26:00 -08:00
Yu, Zijun
f3afa7b914
Requantize Q6_K (gs16) to gs32 on GPU
2026-01-15 11:26:00 -08:00
Yu, Zijun
e4bfe5a20d
Add Q5_K to support phi-3-q4_k_m
2026-01-15 11:26:00 -08:00
Yu, Zijun
2f1d50fb07
Minor refactor
2026-01-15 11:26:00 -08:00
Yu, Zijun
67e178a2f6
Minor: not add attention_size_swa for non-swa model
2026-01-15 11:26:00 -08:00
Yu, Zijun
1a38339cea
Fix ROPE accuracy when freq_scale != 1
2026-01-15 11:26:00 -08:00
Yu, Zijun
602f9ca4af
Fix NPU accuracy
2026-01-15 11:26:00 -08:00
Yu, Zijun
9de874cb7b
Support iSWA
2026-01-15 11:25:58 -08:00
Yu, Zijun
7d81861a18
Fix Hunyuan
2026-01-15 11:20:31 -08:00
Yu, Zijun
597561242f
Add GeGLU
2026-01-15 11:20:31 -08:00
Yu, Zijun
be07073e0e
Apply EliminateZP only for npu
2026-01-15 11:20:31 -08:00
Yu, Zijun
da2cc993bc
WA for npu 1st token acc issue
2026-01-15 11:20:31 -08:00
Yu, Zijun
434059aef7
Fix NPU compile
2026-01-15 11:20:31 -08:00
Yu, Zijun
bcc343af00
Support BF16 model
2026-01-15 11:20:31 -08:00
Yu, Zijun
dc77cbb3f6
STYLE: make get_types_to_requant a function
2026-01-15 11:20:31 -08:00
Yu, Zijun
2ad1147b9b
Improve debug util; Eliminate nop ReshapeReshape
2026-01-15 11:20:31 -08:00
Yu, Zijun
0f7b253cb3
Fix after rebasing
2026-01-15 11:20:31 -08:00
Yu, Zijun
810eb480f5
Simplify translation of get_rows
2026-01-15 11:20:31 -08:00
Yu, Zijun
c5231a2448
Set m_is_static=false as default in decoder
2026-01-15 11:20:31 -08:00
Yu, Zijun
6926655f5b
Add custom quant type: q8_1_c, q4_0_128
2026-01-15 11:20:31 -08:00
Yu, Zijun
b593428eb3
Dequantize q4_1 q4_k q6_k for NPU
2026-01-15 11:20:31 -08:00
Yu, Zijun
82c98335d3
NPU perf: eliminate zp
2026-01-15 11:20:31 -08:00
Yu, Zijun
9ca53c7991
Add NPU Q4_0 support
2026-01-15 11:20:31 -08:00
Yu, Zijun
9900245e0b
Fix test-backend-ops: Treat quantized tensors as weights
2026-01-15 11:20:31 -08:00
Yu, Zijun
a1ce428004
Fix Q4_1
2026-01-15 11:19:15 -08:00
Yu, Zijun
dd80b04235
Fix CI; Disable test-backend-ops
2026-01-15 11:19:15 -08:00
Yu, Zijun
6ab76ed10a
Fix accuracy: disable cpu_repack
2026-01-15 11:19:15 -08:00
Yu, Zijun
663a0b8cce
Quant models run with accuracy issue
2026-01-15 11:19:15 -08:00
Yu, Zijun
d4ca760da8
Add quant weight conversion functions from genai gguf reader
2026-01-15 11:19:15 -08:00
Yu, Zijun
3e897df51c
Update supports_buft and supports_op for quantized models
2026-01-15 11:19:15 -08:00
Yu, Zijun
56d596775d
Change openvino device_type to GPU; Enable flash_attn
2026-01-15 11:19:15 -08:00
Yu, Zijun
65e1b1af6d
Fix after rebasing
...
- Layout of cache k and cache v is unified: [seq, n_head, head_size]
- Add CPY and FLASH_ATTN_EXT, flash attn is not used yet
- Skip test-backend-ops due to flash attn test crash
- Add mutex around graph conversion to avoid test-thread-safety failures in the future
- Update NPU config
- Update GPU config to disable SDPA opt to make phi-3 run
2026-01-15 11:19:15 -08:00
Yu, Zijun
14c8a85c32
Perf: RMS fused to OV internal RMS op
2026-01-15 11:19:15 -08:00
Yu, Zijun
a7b611bc93
Minor updates for raising PR
2026-01-15 11:19:15 -08:00
Yu, Zijun
f4123be967
Fix test-backend-ops
2026-01-15 11:19:15 -08:00
Yu, Zijun
839f8c66a0
Remove CPY
2026-01-15 11:19:15 -08:00
Yu, Zijun
7bda5021f9
Fix NPU
2026-01-15 11:19:15 -08:00
Yu, Zijun
63d000ba40
Support op SET_ROWS
2026-01-15 11:19:15 -08:00
Yu, Zijun
9a91ca6ef9
Optimize tensor conversion, improve TTFT
2026-01-15 11:19:15 -08:00
Yu, Zijun
37ff226bb6
Use CiD for NPU
2026-01-15 11:19:15 -08:00
Yu, Zijun
1141350310
Skip test-thread-safety; Run ctest only in ci/run.sh
2026-01-15 11:18:28 -08:00
Yu, Zijun
fc865340d5
Fix test-backend-ops
2026-01-15 10:26:28 -08:00
Ravi Panchumarthy
2f99135ccc
Update build.md
2026-01-15 10:26:28 -08:00
Yu, Zijun
43489bbfaa
Revert changes in fuse_to_sdpa
2026-01-15 10:26:28 -08:00