Yu, Zijun
|
531941b348
|
Fix NPU
|
2026-01-15 11:28:48 -08:00 |
Yu, Zijun
|
ae404f7cbb
|
Fix llama-bench
|
2026-01-15 11:28:48 -08:00 |
Yu, Zijun
|
072dde0b2b
|
change graph to 4d, support multi sequences
|
2026-01-15 11:28:48 -08:00 |
Yu, Zijun
|
ea2c99be1c
|
NPU unify PD (handled internally)
|
2026-01-15 11:28:48 -08:00 |
Yu, Zijun
|
303923aba7
|
Clean placeholders in ggml-openvino.cpp
|
2026-01-15 11:27:30 -08:00 |
Zijun Yu
|
b8690bc055
|
NPU Unify PD (#14)
* Stateless. Fix llama-cli llama-server
* Simplify broadcast op in attention
* Replace get_output_tensor+memcpy with set_output_tensor
* NPU unify PD. Unify dynamic and static dims
|
2026-01-15 11:27:30 -08:00 |
Yu, Zijun
|
eba8113dc4
|
Style: middle ptr and ref align, omit optional struct keyword
|
2026-01-15 11:27:30 -08:00 |
Yu, Zijun
|
bd3093f90c
|
Style: use switch in supports_ops
|
2026-01-15 11:27:30 -08:00 |
Ravi Panchumarthy
|
3a1129e073
|
Update OV dockerfile to use OV2025.3 and update build docs
|
2026-01-15 11:27:30 -08:00 |
Ravi Panchumarthy
|
45af912b48
|
Update CI to run OV dep install before build
|
2026-01-15 11:27:30 -08:00 |
Ravi Panchumarthy
|
38e8a19f50
|
Apply CISC review and update CI to OV2025.3
|
2026-01-15 11:27:28 -08:00 |
Yu, Zijun
|
4c8406eb70
|
Add OV CI cache
|
2026-01-15 11:26:00 -08:00 |
Ravi Panchumarthy
|
841d673bd0
|
Update to OV-2025.3 and CMakeLists.txt
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
2d2f00a41f
|
Fix llama-3-8b and phi3-mini q4_0 NPU
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
299f4923bb
|
fix after rebasing
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
8b82d1153b
|
Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_size, iSWA model not working
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
a9371ea646
|
Fix llama-cli (need to run with --no-warmup)
|
2026-01-15 11:26:00 -08:00 |
cavusmustafa
|
05d7abae8c
|
Fix for Phi3
|
2026-01-15 11:26:00 -08:00 |
cavusmustafa
|
e7252920e1
|
env variable GGML_OPENVINO_DISABLE_SDPA_OPTIMIZATION added
|
2026-01-15 11:26:00 -08:00 |
cavusmustafa
|
c112bc4e73
|
kvcachefusion support
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
973a80fd02
|
Always apply Eliminate_ZP to fix GPU compile issue on some platforms
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
fdadca1e89
|
Fix after rebasing
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
f3afa7b914
|
Requantize Q6_K (gs16) to gs32 on GPU
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
e4bfe5a20d
|
Add Q5_K to support phi-3-q4_k_m
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
2f1d50fb07
|
Minor refactor
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
67e178a2f6
|
Minor: not add attention_size_swa for non-swa model
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
1a38339cea
|
Fix ROPE accuracy when freq_scale != 1
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
602f9ca4af
|
Fix NPU accuracy
|
2026-01-15 11:26:00 -08:00 |
Yu, Zijun
|
9de874cb7b
|
Support iSWA
|
2026-01-15 11:25:58 -08:00 |
Yu, Zijun
|
7d81861a18
|
Fix Hunyuan
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
597561242f
|
Add GeGLU
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
be07073e0e
|
Apply EliminateZP only for npu
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
da2cc993bc
|
WA for npu 1st token acc issue
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
434059aef7
|
Fix NPU compile
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
bcc343af00
|
Support BF16 model
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
dc77cbb3f6
|
STYLE: make get_types_to_requant a function
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
2ad1147b9b
|
Improve debug util; Eliminate nop ReshapeReshape
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
0f7b253cb3
|
Fix after rebasing
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
810eb480f5
|
Simplify translation of get_rows
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
c5231a2448
|
Set m_is_static=false as default in decoder
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
6926655f5b
|
Add custom quant type: q8_1_c, q4_0_128
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
b593428eb3
|
Dequantize q4_1 q4_k q6_k for NPU
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
82c98335d3
|
NPU perf: eliminate zp
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
9ca53c7991
|
Add NPU Q4_0 support
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
9900245e0b
|
Fix test-backend-ops: Treat quantized tensors as weights
|
2026-01-15 11:20:31 -08:00 |
Yu, Zijun
|
a1ce428004
|
Fix Q4_1
|
2026-01-15 11:19:15 -08:00 |
Yu, Zijun
|
dd80b04235
|
Fix CI; Disable test-backend-ops
|
2026-01-15 11:19:15 -08:00 |
Yu, Zijun
|
6ab76ed10a
|
Fix accuracy: disable cpu_repack
|
2026-01-15 11:19:15 -08:00 |
Yu, Zijun
|
663a0b8cce
|
Quant models run with accuracy issue
|
2026-01-15 11:19:15 -08:00 |
Yu, Zijun
|
d4ca760da8
|
Add quant weight conversion functions from genai gguf reader
|
2026-01-15 11:19:15 -08:00 |