Commit Graph

2033 Commits

Author SHA1 Message Date
Yu, Zijun 602f9ca4af Fix NPU accuracy 2026-01-15 11:26:00 -08:00
Yu, Zijun 9de874cb7b Support iSWA 2026-01-15 11:25:58 -08:00
Yu, Zijun 7d81861a18 Fix Hunyuan 2026-01-15 11:20:31 -08:00
Yu, Zijun 597561242f Add GeGLU 2026-01-15 11:20:31 -08:00
Yu, Zijun be07073e0e Apply EliminateZP only for npu 2026-01-15 11:20:31 -08:00
Yu, Zijun da2cc993bc WA for npu 1st token acc issue 2026-01-15 11:20:31 -08:00
Yu, Zijun 434059aef7 Fix NPU compile 2026-01-15 11:20:31 -08:00
Yu, Zijun bcc343af00 Support BF16 model 2026-01-15 11:20:31 -08:00
Yu, Zijun dc77cbb3f6 STYLE: make get_types_to_requant a function 2026-01-15 11:20:31 -08:00
Yu, Zijun 2ad1147b9b Improve debug util; Eliminate nop Reshape-Reshape 2026-01-15 11:20:31 -08:00
Yu, Zijun 0f7b253cb3 Fix after rebasing 2026-01-15 11:20:31 -08:00
Yu, Zijun 810eb480f5 Simplify translation of get_rows 2026-01-15 11:20:31 -08:00
Yu, Zijun c5231a2448 Set m_is_static=false as default in decoder 2026-01-15 11:20:31 -08:00
Yu, Zijun 6926655f5b Add custom quant type: q8_1_c, q4_0_128 2026-01-15 11:20:31 -08:00
Yu, Zijun b593428eb3 Dequantize q4_1 q4_k q6_k for NPU 2026-01-15 11:20:31 -08:00
Yu, Zijun 82c98335d3 NPU perf: eliminate zp 2026-01-15 11:20:31 -08:00
Yu, Zijun 9ca53c7991 Add NPU Q4_0 support 2026-01-15 11:20:31 -08:00
Yu, Zijun 9900245e0b Fix test-backend-ops: Treat quantized tensors as weights 2026-01-15 11:20:31 -08:00
Yu, Zijun a1ce428004 Fix Q4_1 2026-01-15 11:19:15 -08:00
Yu, Zijun dd80b04235 Fix CI; Disable test-backend-ops 2026-01-15 11:19:15 -08:00
Yu, Zijun 6ab76ed10a Fix accuracy: disable cpu_repack 2026-01-15 11:19:15 -08:00
Yu, Zijun 663a0b8cce Quant models run with accuracy issue 2026-01-15 11:19:15 -08:00
Yu, Zijun d4ca760da8 Add quant weight conversion functions from genai gguf reader 2026-01-15 11:19:15 -08:00
Yu, Zijun 3e897df51c Update supports_buft and supports_op for quantized models 2026-01-15 11:19:15 -08:00
Yu, Zijun 56d596775d Change openvino device_type to GPU; Enable flash_attn 2026-01-15 11:19:15 -08:00
Yu, Zijun 65e1b1af6d Fix after rebasing 2026-01-15 11:19:15 -08:00
- Layout of cache k and cache v is unified: [seq, n_head, head_size]
- Add CPY and FLASH_ATTN_EXT; flash attn is not used yet
- Skip test-backend-ops due to flash attn test crash
- Add mutex around graph conversion to avoid test-thread-safety failures in the future
- Update NPU config
- Update GPU config to disable SDPA opt so that phi-3 runs
Yu, Zijun 14c8a85c32 Perf: RMS fused to OV internal RMS op 2026-01-15 11:19:15 -08:00
Yu, Zijun a7b611bc93 Minor updates for raising PR 2026-01-15 11:19:15 -08:00
Yu, Zijun f4123be967 Fix test-backend-ops 2026-01-15 11:19:15 -08:00
Yu, Zijun 839f8c66a0 Remove CPY 2026-01-15 11:19:15 -08:00
Yu, Zijun 7bda5021f9 Fix NPU 2026-01-15 11:19:15 -08:00
Yu, Zijun 63d000ba40 Support op SET_ROWS 2026-01-15 11:19:15 -08:00
Yu, Zijun 9a91ca6ef9 Optimize tensor conversion, improve TTFT 2026-01-15 11:19:15 -08:00
Yu, Zijun 37ff226bb6 Use CiD for NPU 2026-01-15 11:19:15 -08:00
Yu, Zijun fc865340d5 Fix test-backend-ops 2026-01-15 10:26:28 -08:00
Yu, Zijun 43489bbfaa Revert changes in fuse_to_sdpa 2026-01-15 10:26:28 -08:00
Cavus Mustafa 1a19566b23 add mark decomp pass 2026-01-15 10:26:28 -08:00
Cavus Mustafa 93b2d09a2d mulmat type conversion update 2026-01-15 10:26:28 -08:00
Cavus Mustafa e2fdc1b988 mulmat input conversion fix 2026-01-15 10:26:28 -08:00
Yu, Zijun 01cdf4a9cc matmul in fp32 2026-01-15 10:26:28 -08:00
Cavus Mustafa 9cf56d6837 temp. changes for mark decomp 2026-01-15 10:26:28 -08:00
Yu, Zijun 4e7f04a307 Fix llama-perplexity 2026-01-15 10:26:28 -08:00
Yu, Zijun 75eec6265f Fix llama-bench; Clang-format 2026-01-15 10:26:28 -08:00
Yu, Zijun 6dc4b90635 Fix NPU 2026-01-15 10:26:28 -08:00
Yu, Zijun 44f4cf34b1 Fix Phi3 ROPE; Add test-backend-ops 2026-01-15 10:26:28 -08:00
Yu, Zijun 1ed49bbfaf Fix llama-cli 2026-01-15 10:26:28 -08:00
Yu, Zijun d61f83c9b7 Fix CPY due to cgraph change 2026-01-15 10:23:35 -08:00
Yu, Zijun f3c0519096 Reduce memory: free ov weights node after graph conversion 2026-01-15 10:20:18 -08:00
Yu, Zijun a80da69448 Pull out sin cos from rope 2026-01-15 10:20:18 -08:00
Yu, Zijun 3533c14cf6 Fix Phi3 SwiGLU and SoftMax 2026-01-15 10:20:18 -08:00