Yu, Zijun
|
44f4cf34b1
|
Fix Phi3 ROPE; Add test-backend-ops
|
2026-01-15 10:26:28 -08:00 |
Yu, Zijun
|
1ed49bbfaf
|
Fix llama-cli
|
2026-01-15 10:26:28 -08:00 |
ravi9
|
ea75772e48
|
Added OpenVINO CI/CD. Updated docs
|
2026-01-15 10:26:25 -08:00 |
Yu, Zijun
|
d61f83c9b7
|
Fix CPY due to cgraph change
|
2026-01-15 10:23:35 -08:00 |
Yu, Zijun
|
f3c0519096
|
Reduce memory: free ov weights node after graph conversion
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
a80da69448
|
Pull out sin cos from rope
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
3533c14cf6
|
Fix Phi3 SwiGLU and SoftMax
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
0fa7a5efef
|
Refactor: remove past_token_len from extra_inputs
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
acf358d1ce
|
Pull out indices creation for kv cache update
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
bf5414c95e
|
Replace Concat with Broadcast in MulMat for GQA
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
ebc4fc9f95
|
Fuse to SDPA
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
73ee84fffe
|
Add SwiGLU
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
4c582ac7a3
|
Stateful transformation for CPU GPU
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
8afee795ad
|
Update clang-format
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
593484ce5f
|
Refactor: clean, fix warning
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
42d4240937
|
Change due to ggml cgraph changes, all device work
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
e27738a987
|
Add AMD64 to CMakeLists
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
592d7f8bbb
|
Change due to ggml cgraph changes, llama-3.2 CPU work
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
f7ad77930e
|
Change due to ggml cgraph changes, not correct yet
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
d9ca8f5dbe
|
NPU support version 2: prefill + kvcache
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
34531abce4
|
draft NPU support version 2: prefill + kvcache
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
7fec223334
|
Add initial NPU support
|
2026-01-15 10:20:18 -08:00 |
Ravi Panchumarthy
|
3051d5ae07
|
Update openvino build instructions
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
8ce5cc597a
|
Add cgraph tensor output name to OV op name
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
fd324366d0
|
Update build doc
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
d7cc802292
|
PERF: use Slice+Concat in writing cache_v
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
8ac5c225aa
|
FIX: set_max_token_len
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
a30dc6e726
|
PERF: add weight constant in parallel
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
c57f61494a
|
FIX: input shape of KQ_mask
|
2026-01-15 10:20:18 -08:00 |
Yu, Zijun
|
041d220dfa
|
FIX: Re-add tensor names in cgraph, Add another case for RESHAPE
|
2026-01-15 10:20:13 -08:00 |
Yu, Zijun
|
0d505b4e56
|
STYLE and minor REFACTOR
|
2026-01-15 10:10:00 -08:00 |
Yu, Zijun
|
cdf5370cb5
|
PERF: favor low precision matmul
|
2026-01-15 10:10:00 -08:00 |
Yu, Zijun
|
0d009fe61a
|
FEAT: Add all conversion code from ov side
|
2026-01-15 10:10:00 -08:00 |
Yu, Zijun
|
f15a2cc057
|
STYLE: clang-format
|
2026-01-15 10:10:00 -08:00 |
Yu, Zijun
|
a0b30529bf
|
FIX: backend buffer type issue
|
2026-01-15 10:10:00 -08:00 |
Zijun Yu
|
4c905b2b25
|
fix build error
|
2026-01-15 10:10:00 -08:00 |
Viraj Wadhwa
|
ffabe95e2a
|
Rebase - Bring up to date and fix build process
|
2026-01-15 10:09:23 -08:00 |
Yu, Zijun
|
a8e5efa44e
|
PERF: compile once (dynamic graph + cache)
|
2026-01-15 10:05:41 -08:00 |
Yu, Zijun
|
7d5e234254
|
FEAT: improve debug capability
|
2026-01-15 10:05:41 -08:00 |
Yu, Zijun
|
0a8cc9ab03
|
BUILD: update build doc, add cmake preset, add CACHE_DIR env var
|
2026-01-15 10:05:41 -08:00 |
Yu, Zijun
|
d3bdca25bd
|
PERF: share const nodes for weights for diff infer
|
2026-01-15 10:05:41 -08:00 |
Yu, Zijun
|
96ba47dd43
|
STYLE: minor refactor
|
2026-01-15 10:05:41 -08:00 |
Yu, Zijun
|
c04966cda6
|
REFACTOR: support weights as constant
|
2026-01-15 10:05:41 -08:00 |
Yu, Zijun
|
0c7b026ecc
|
FEAT: Add interleaved mode for ROPE
|
2026-01-15 10:05:41 -08:00 |
Yu, Zijun
|
6ed44a3dff
|
FEAT: do PERMUTE eagerly
|
2026-01-15 10:05:41 -08:00 |
Yu, Zijun
|
8b408869ae
|
Arbitrary token len (>32) work; Fix bug in mulmat
|
2026-01-15 10:05:41 -08:00 |
Yu, Zijun
|
8d263bd6a5
|
2nd+ token correct by fixing CPY in OV, remove single op backend compute code
|
2026-01-15 10:05:41 -08:00 |
Yu, Zijun
|
91d2a195b5
|
change op mappings to list in openvino_supports_op
|
2026-01-15 10:05:41 -08:00 |
Yu, Zijun
|
651b2c06cb
|
* Use find_package in CMake to configure OpenVINO
* Remove OPENVINO_OP_DEBUG
* Simplify set_input_output in decoder
* Fix CPY in set_input_output
* Use params from converted ov model in setting input
|
2026-01-15 10:05:41 -08:00 |
zhanmyz
|
84be5c6f15
|
1. Delete some comments
2. Process Prompt and predict first token is OK
|
2026-01-15 10:05:41 -08:00 |