Yu, Zijun | 072dde0b2b | change graph to 4d, support multi sequences | 2026-01-15 11:28:48 -08:00
Zijun Yu | b8690bc055 | NPU Unify PD (#14) | 2026-01-15 11:27:30 -08:00
  * Stateless. Fix llama-cli llama-server
  * Simplify broadcast op in attention
  * Replace get_output_tensor+memcpy with set_output_tensor
  * NPU unify PD. Unify dynamic and static dims
Yu, Zijun | eba8113dc4 | Style: middle ptr and ref align, omit optional struct keyword | 2026-01-15 11:27:30 -08:00
Yu, Zijun | 9de874cb7b | Support iSWA | 2026-01-15 11:25:58 -08:00
Yu, Zijun | c5231a2448 | Set m_is_static=false as default in decoder | 2026-01-15 11:20:31 -08:00
Yu, Zijun | 6926655f5b | Add custom quant type: q8_1_c, q4_0_128 | 2026-01-15 11:20:31 -08:00
Yu, Zijun | b593428eb3 | Dequantize q4_1 q4_k q6_k for NPU | 2026-01-15 11:20:31 -08:00
Yu, Zijun | 9900245e0b | Fix test-backend-ops: Treat quantized tensors as weights | 2026-01-15 11:20:31 -08:00
Yu, Zijun | 7bda5021f9 | Fix NPU | 2026-01-15 11:19:15 -08:00
Yu, Zijun | 63d000ba40 | Support op SET_ROWS | 2026-01-15 11:19:15 -08:00
Yu, Zijun | 01cdf4a9cc | matmul in fp32 | 2026-01-15 10:26:28 -08:00
Yu, Zijun | 6dc4b90635 | Fix NPU | 2026-01-15 10:26:28 -08:00
Yu, Zijun | 44f4cf34b1 | Fix Phi3 ROPE; Add test-backend-ops | 2026-01-15 10:26:28 -08:00
Yu, Zijun | f3c0519096 | Reduce memory: free ov weights node after graph conversion | 2026-01-15 10:20:18 -08:00
Yu, Zijun | a80da69448 | Pull out sin cos from rope | 2026-01-15 10:20:18 -08:00
Yu, Zijun | ebc4fc9f95 | Fuse to SDPA | 2026-01-15 10:20:18 -08:00
Yu, Zijun | 4c582ac7a3 | Stateful transformation for CPU GPU | 2026-01-15 10:20:18 -08:00
Yu, Zijun | 593484ce5f | Refactor: clean, fix warning | 2026-01-15 10:20:18 -08:00
Yu, Zijun | 34531abce4 | draft NPU support version 2: prefill + kvcache | 2026-01-15 10:20:18 -08:00
Yu, Zijun | 7fec223334 | Add initial NPU support | 2026-01-15 10:20:18 -08:00
Yu, Zijun | a30dc6e726 | PERF: add weight constant in parallel | 2026-01-15 10:20:18 -08:00
Yu, Zijun | 041d220dfa | FIX: Re-add tensor names in cgraph, Add another case for RESHAPE | 2026-01-15 10:20:13 -08:00
Yu, Zijun | 0d009fe61a | FEAT: Add all conversion code from ov side | 2026-01-15 10:10:00 -08:00
Viraj Wadhwa | ffabe95e2a | Rebase - Bring up to date and fix build process | 2026-01-15 10:09:23 -08:00
Yu, Zijun | a8e5efa44e | PERF: compile once (dynamic graph + cache) | 2026-01-15 10:05:41 -08:00
Yu, Zijun | 7d5e234254 | FEAT: improve debug capability | 2026-01-15 10:05:41 -08:00
Yu, Zijun | d3bdca25bd | PERF: share const nodes for weights for diff infer | 2026-01-15 10:05:41 -08:00
Yu, Zijun | c04966cda6 | REFACTOR: support weights as constant | 2026-01-15 10:05:41 -08:00
zhanmyz | 19ec9b6bf5 | Try to add VIEW node to OV Frontend and have some issues that need to be dealt with | 2026-01-15 10:05:41 -08:00
zhanmyz | 467a5ddf04 | 1. Update the implementation of CPY node when it's non-contiguous; 2. Remove duplicate get node operation function | 2026-01-15 10:05:41 -08:00
zhanmyz | 9a7b7d8d6d | OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT/ROPE/SCALE/SOFTMAX/ADD adjacent op graph conversion | 2026-01-15 10:05:41 -08:00
zhanmyz | 081b52667b | Execute single CONT operator is OK | 2026-01-15 10:05:41 -08:00
zhanmyz | afb8594194 | add tmp source code files | 2026-01-15 10:05:41 -08:00
zhanmyz | 8484769981 | add implementation of MUL_MAT, CPY, CONT of GGML ops using OV ops | 2026-01-15 10:05:41 -08:00
zhanmyz | cb2729bc4a | Move CPY from GGML OV Backend to OV Frontend | 2026-01-15 10:05:41 -08:00
zhanmyz | 2b04bd43be | Add MUL_MAT,CPY,CONT as operators implemented in OpenVINO for GGML backend | 2026-01-15 10:05:41 -08:00
yumengbo | b100f89bad | Change to implementation following pytorch frontend | 2026-01-15 10:05:41 -08:00
yumengbo | 5b46dc23be | Change output for infer request to set output tensor. Support scale, view op. | 2026-01-15 10:05:41 -08:00
yumengbo | 9b7b63d12c | Convert subgraph with add, sub, mul, div op to ov model and do infer on openvino device | 2026-01-15 10:05:41 -08:00
yumengbo | 34e826ac14 | Implement GgmlOvDecoder. Add dump functions. | 2026-01-15 10:05:41 -08:00
zhanmyz | 77d68146a8 | add OpenVINO frontend convert process steps | 2026-01-15 10:05:41 -08:00