Yu, Zijun
0fa7a5efef
Refactor: remove past_token_len from extra_inputs
2026-01-15 10:20:18 -08:00
Yu, Zijun
acf358d1ce
Pull out indices creation for kv cache update
2026-01-15 10:20:18 -08:00
Yu, Zijun
bf5414c95e
Replace Concat with Broadcast in MulMat for GQA
2026-01-15 10:20:18 -08:00
Yu, Zijun
ebc4fc9f95
Fuse to SDPA
2026-01-15 10:20:18 -08:00
Yu, Zijun
73ee84fffe
Add SwiGLU
2026-01-15 10:20:18 -08:00
Yu, Zijun
4c582ac7a3
Stateful transformation for CPU GPU
2026-01-15 10:20:18 -08:00
Yu, Zijun
8afee795ad
Update clang-format
2026-01-15 10:20:18 -08:00
Yu, Zijun
593484ce5f
Refactor: clean, fix warning
2026-01-15 10:20:18 -08:00
Yu, Zijun
42d4240937
Change due to ggml cgraph changes, all device work
2026-01-15 10:20:18 -08:00
Yu, Zijun
e27738a987
Add AMD64 to CMakeLists
2026-01-15 10:20:18 -08:00
Yu, Zijun
592d7f8bbb
Change due to ggml cgraph changes, llama-3.2 CPU work
2026-01-15 10:20:18 -08:00
Yu, Zijun
f7ad77930e
Change due to ggml cgraph changes, not correct yet
2026-01-15 10:20:18 -08:00
Yu, Zijun
d9ca8f5dbe
NPU support version 2: prefill + kvcache
2026-01-15 10:20:18 -08:00
Yu, Zijun
34531abce4
draft NPU support version 2: prefill + kvcache
2026-01-15 10:20:18 -08:00
Yu, Zijun
7fec223334
Add initial NPU support
2026-01-15 10:20:18 -08:00
Yu, Zijun
8ce5cc597a
Add cgraph tensor output name to OV op name
2026-01-15 10:20:18 -08:00
Yu, Zijun
d7cc802292
PERF: use Slice+Concat in writing cache_v
2026-01-15 10:20:18 -08:00
Yu, Zijun
8ac5c225aa
FIX: set_max_token_len
2026-01-15 10:20:18 -08:00
Yu, Zijun
a30dc6e726
PERF: add weight constant in parallel
2026-01-15 10:20:18 -08:00
Yu, Zijun
c57f61494a
FIX: input shape of KQ_mask
2026-01-15 10:20:18 -08:00
Yu, Zijun
041d220dfa
FIX: Re-add tensor names in cgraph, Add another case for RESHAPE
2026-01-15 10:20:13 -08:00
Yu, Zijun
0d505b4e56
STYLE and minor REFACTOR
2026-01-15 10:10:00 -08:00
Yu, Zijun
cdf5370cb5
PERF: favor low precision matmul
2026-01-15 10:10:00 -08:00
Yu, Zijun
0d009fe61a
FEAT: Add all conversion code from ov side
2026-01-15 10:10:00 -08:00
Yu, Zijun
f15a2cc057
STYLE: clang-format
2026-01-15 10:10:00 -08:00
Yu, Zijun
a0b30529bf
FIX: backend buffer type issue
2026-01-15 10:10:00 -08:00
Zijun Yu
4c905b2b25
fix build error
2026-01-15 10:10:00 -08:00
Viraj Wadhwa
ffabe95e2a
Rebase - Bring up to date and fix build process
2026-01-15 10:09:23 -08:00
Yu, Zijun
a8e5efa44e
PERF: compile once (dynamic graph + cache)
2026-01-15 10:05:41 -08:00
Yu, Zijun
7d5e234254
FEAT: improve debug capability
2026-01-15 10:05:41 -08:00
Yu, Zijun
0a8cc9ab03
BUILD: update build doc, add cmake preset, add CACHE_DIR env var
2026-01-15 10:05:41 -08:00
Yu, Zijun
d3bdca25bd
PERF: share const nodes for weights for diff infer
2026-01-15 10:05:41 -08:00
Yu, Zijun
96ba47dd43
STYLE: minor refactor
2026-01-15 10:05:41 -08:00
Yu, Zijun
c04966cda6
REFACTOR: support weights as constant
2026-01-15 10:05:41 -08:00
Yu, Zijun
0c7b026ecc
FEAT: Add interleaved mode for ROPE
2026-01-15 10:05:41 -08:00
Yu, Zijun
6ed44a3dff
FEAT: do PERMUTE eagerly
2026-01-15 10:05:41 -08:00
Yu, Zijun
8b408869ae
Arbitrary token len (>32) work; Fix bug in mulmat
2026-01-15 10:05:41 -08:00
Yu, Zijun
8d263bd6a5
2nd+ token correct by fixing CPY in OV, remove single op backend compute code
2026-01-15 10:05:41 -08:00
Yu, Zijun
91d2a195b5
change op mappings to list in openvino_supports_op
2026-01-15 10:05:41 -08:00
Yu, Zijun
651b2c06cb
* Use find_package in CMake to configure OpenVINO
* Remove OPENVINO_OP_DEBUG
* Simplify set_input_output in decoder
* Fix CPY in set_input_output
* Use params from converted ov model in setting input
2026-01-15 10:05:41 -08:00
zhanmyz
84be5c6f15
1. Delete some comments
2. Process Prompt and predict first token is OK
2026-01-15 10:05:41 -08:00
zhanmyz
eac9a99530
1. Solve the AC issue of Permute+VIEW and the MULMAT issue in the phase "1. Process Prompt and predict the first token".
2. There is still an AC issue in the phase "2. Predict the subsequent tokens", and it is being debugged.
A deviation has been detected in the computation of OpenVINO's CPY node at stage 2, and it is currently being fixed.
2026-01-15 10:05:41 -08:00
zhanmyz
8ae700ae11
Process Prompt and predict first token is OK
2026-01-15 10:05:41 -08:00
zhanmyz
8020138406
add debug info
2026-01-15 10:05:41 -08:00
zhanmyz
b02265a507
1. In the Prompt process and predict first token stage, the PERMUTE node needs to be integrated into the OV Frontend
2. In the predict latest token stage, the VIEW, CONT, Reshape need to be integrated into the OV Frontend.
2026-01-15 10:05:41 -08:00
zhanmyz
19ec9b6bf5
Try to add VIEW node to OV Frontend and have some issues that need to be dealt with
2026-01-15 10:05:41 -08:00
zhanmyz
b14b49d5f6
Minor Update
2026-01-15 10:05:41 -08:00
zhanmyz
467a5ddf04
1. Update the implementation of CPY node when it's non-contiguous
2. Remove duplicate get node operation function
2026-01-15 10:05:41 -08:00
zhanmyz
cff473a9e2
1. All operators implemented using OpenVINO can be successfully executed individually.
2. The VIEW op output tensor shape is not the same as the CONT (non-contiguous) input tensor shape.
3. CPY (non-contiguous) can't be implemented with the original input/output tensor shape and data (the original shape needs to be changed when creating the input/output tensor).
Currently, the VIEW op is executed in the ggml backend and the others are executed in the OpenVINO Frontend.
2026-01-15 10:05:41 -08:00
zhanmyz
e08a7fda33
All adjacent ops can be converted, but the calculation result is wrong and needs debugging
2026-01-15 10:05:41 -08:00