Commit Graph

7842 Commits

Author SHA1 Message Date
ravi9 ea75772e48 Added OpenVINO CI/CD. Updated docs 2026-01-15 10:26:25 -08:00
Yu, Zijun d61f83c9b7 Fix CPY due to cgraph change 2026-01-15 10:23:35 -08:00
Yu, Zijun f3c0519096 Reduce memory: free ov weights node after graph conversion 2026-01-15 10:20:18 -08:00
Yu, Zijun a80da69448 Pull out sin cos from rope 2026-01-15 10:20:18 -08:00
Yu, Zijun 3533c14cf6 Fix Phi3 SwiGLU and SoftMax 2026-01-15 10:20:18 -08:00
Yu, Zijun 0fa7a5efef Refactor: remove past_token_len from extra_inputs 2026-01-15 10:20:18 -08:00
Yu, Zijun acf358d1ce Pull out indices creation for kv cache update 2026-01-15 10:20:18 -08:00
Yu, Zijun bf5414c95e Replace Concat with Broadcast in MulMat for GQA 2026-01-15 10:20:18 -08:00
Yu, Zijun ebc4fc9f95 Fuse to SDPA 2026-01-15 10:20:18 -08:00
Yu, Zijun 73ee84fffe Add SwiGLU 2026-01-15 10:20:18 -08:00
Yu, Zijun 4c582ac7a3 Stateful transformation for CPU GPU 2026-01-15 10:20:18 -08:00
Yu, Zijun 8afee795ad Update clang-format 2026-01-15 10:20:18 -08:00
Yu, Zijun 593484ce5f Refactor: clean, fix warning 2026-01-15 10:20:18 -08:00
Yu, Zijun 42d4240937 Change due to ggml cgraph changes, all device work 2026-01-15 10:20:18 -08:00
Yu, Zijun e27738a987 Add AMD64 to CMakeLists 2026-01-15 10:20:18 -08:00
Yu, Zijun 592d7f8bbb Change due to ggml cgraph changes, llama-3.2 CPU work 2026-01-15 10:20:18 -08:00
Yu, Zijun f7ad77930e Change due to ggml cgraph changes, not correct yet 2026-01-15 10:20:18 -08:00
Yu, Zijun d9ca8f5dbe NPU support version 2: prefill + kvcache 2026-01-15 10:20:18 -08:00
Yu, Zijun 34531abce4 draft NPU support version 2: prefill + kvcache 2026-01-15 10:20:18 -08:00
Yu, Zijun 7fec223334 Add initial NPU support 2026-01-15 10:20:18 -08:00
Ravi Panchumarthy 3051d5ae07 Update openvino build instructions 2026-01-15 10:20:18 -08:00
Yu, Zijun 8ce5cc597a Add cgraph tensor output name to OV op name 2026-01-15 10:20:18 -08:00
Yu, Zijun fd324366d0 Update build doc 2026-01-15 10:20:18 -08:00
Yu, Zijun d7cc802292 PERF: use Slice+Concat in writing cache_v 2026-01-15 10:20:18 -08:00
Yu, Zijun 8ac5c225aa FIX: set_max_token_len 2026-01-15 10:20:18 -08:00
Yu, Zijun a30dc6e726 PERF: add weight constant in parallel 2026-01-15 10:20:18 -08:00
Yu, Zijun c57f61494a FIX: input shape of KQ_mask 2026-01-15 10:20:18 -08:00
Yu, Zijun 041d220dfa FIX: Re-add tensor names in cgraph, Add another case for RESHAPE 2026-01-15 10:20:13 -08:00
Yu, Zijun 0d505b4e56 STYLE and minor REFACTOR 2026-01-15 10:10:00 -08:00
Yu, Zijun cdf5370cb5 PERF: favor low precision matmul 2026-01-15 10:10:00 -08:00
Yu, Zijun 0d009fe61a FEAT: Add all conversion code from ov side 2026-01-15 10:10:00 -08:00
Yu, Zijun f15a2cc057 STYLE: clang-format 2026-01-15 10:10:00 -08:00
Yu, Zijun a0b30529bf FIX: backend buffer type issue 2026-01-15 10:10:00 -08:00
Zijun Yu 4c905b2b25 fix build error 2026-01-15 10:10:00 -08:00
Viraj Wadhwa ffabe95e2a Rebase - Bring up to date and fix build process 2026-01-15 10:09:23 -08:00
Yu, Zijun a8e5efa44e PERF: compile once (dynamic graph + cache) 2026-01-15 10:05:41 -08:00
Yu, Zijun 7d5e234254 FEAT: improve debug capability 2026-01-15 10:05:41 -08:00
Yu, Zijun 0a8cc9ab03 BUILD: update build doc, add cmake preset, add CACHE_DIR env var 2026-01-15 10:05:41 -08:00
Yu, Zijun d3bdca25bd PERF: share const nodes for weights for diff infer 2026-01-15 10:05:41 -08:00
Yu, Zijun 96ba47dd43 STYLE: minor refactor 2026-01-15 10:05:41 -08:00
Yu, Zijun c04966cda6 REFACTOR: support weights as constant 2026-01-15 10:05:41 -08:00
Yu, Zijun 0c7b026ecc FEAT: Add interleaved mode for ROPE 2026-01-15 10:05:41 -08:00
Yu, Zijun 6ed44a3dff FEAT: do PERMUTE eagerly 2026-01-15 10:05:41 -08:00
Yu, Zijun 8b408869ae Arbitrary token len (>32) work; Fix bug in mulmat 2026-01-15 10:05:41 -08:00
Yu, Zijun 8d263bd6a5 2nd+ token correct by fix CPY in OV, remove single op backend compute code 2026-01-15 10:05:41 -08:00
Yu, Zijun 91d2a195b5 change op mappings to list in openvino_supports_op 2026-01-15 10:05:41 -08:00
Yu, Zijun 651b2c06cb * Use find_package in CMake to configure OpenVINO
* Remove OPENVINO_OP_DEBUG
* Simplify set_input_output in decoder
* Fix CPY in set_input_output
* Use params from converted ov model in setting input
2026-01-15 10:05:41 -08:00
zhanmyz 84be5c6f15 1. Delete some comments
2. Process Prompt and predict first token is OK
2026-01-15 10:05:41 -08:00
zhanmyz eac9a99530 1. Solve the AC issue of Permute+VIEW and the MULMAT issue in the phase of “1. Process Prompt and predict the first token”.
2. There is still an AC issue in the "2. Predict the subsequent tokens" phase, and it is being debugged.
   A deviation has been detected in the computation of OpenVINO's CPY node at stage 2, and it is currently being fixed.
2026-01-15 10:05:41 -08:00
zhanmyz 8ae700ae11 Process Prompt and predict first token is OK 2026-01-15 10:05:41 -08:00