Cavus Mustafa
1a19566b23
add mark decomp pass
2026-01-15 10:26:28 -08:00
Cavus Mustafa
93b2d09a2d
mulmat type conversion update
2026-01-15 10:26:28 -08:00
Cavus Mustafa
e2fdc1b988
mulmat input conversion fix
2026-01-15 10:26:28 -08:00
Yu, Zijun
01cdf4a9cc
matmul in fp32
2026-01-15 10:26:28 -08:00
Cavus Mustafa
9cf56d6837
temp. changes for mark decomp
2026-01-15 10:26:28 -08:00
Yu, Zijun
4e7f04a307
Fix llama-perplexity
2026-01-15 10:26:28 -08:00
Yu, Zijun
75eec6265f
Fix llama-bench; Clang-format
2026-01-15 10:26:28 -08:00
Yu, Zijun
6dc4b90635
Fix NPU
2026-01-15 10:26:28 -08:00
Yu, Zijun
44f4cf34b1
Fix Phi3 ROPE; Add test-backend-ops
2026-01-15 10:26:28 -08:00
Yu, Zijun
1ed49bbfaf
Fix llama-cli
2026-01-15 10:26:28 -08:00
Yu, Zijun
d61f83c9b7
Fix CPY due to cgraph change
2026-01-15 10:23:35 -08:00
Yu, Zijun
f3c0519096
Reduce memory: free ov weights node after graph conversion
2026-01-15 10:20:18 -08:00
Yu, Zijun
a80da69448
Pull out sin cos from rope
2026-01-15 10:20:18 -08:00
Yu, Zijun
3533c14cf6
Fix Phi3 SwiGLU and SoftMax
2026-01-15 10:20:18 -08:00
Yu, Zijun
0fa7a5efef
Refactor: remove past_token_len from extra_inputs
2026-01-15 10:20:18 -08:00
Yu, Zijun
acf358d1ce
Pull out indices creation for kv cache update
2026-01-15 10:20:18 -08:00
Yu, Zijun
bf5414c95e
Replace Concat with Broadcast in MulMat for GQA
2026-01-15 10:20:18 -08:00
Yu, Zijun
ebc4fc9f95
Fuse to SDPA
2026-01-15 10:20:18 -08:00
Yu, Zijun
73ee84fffe
Add SwiGLU
2026-01-15 10:20:18 -08:00
Yu, Zijun
4c582ac7a3
Stateful transformation for CPU GPU
2026-01-15 10:20:18 -08:00
Yu, Zijun
8afee795ad
Update clang-format
2026-01-15 10:20:18 -08:00
Yu, Zijun
593484ce5f
Refactor: clean, fix warning
2026-01-15 10:20:18 -08:00
Yu, Zijun
42d4240937
Change due to ggml cgraph changes, all device work
2026-01-15 10:20:18 -08:00
Yu, Zijun
e27738a987
Add AMD64 to CMakeLists
2026-01-15 10:20:18 -08:00
Yu, Zijun
592d7f8bbb
Change due to ggml cgraph changes, llama-3.2 CPU work
2026-01-15 10:20:18 -08:00
Yu, Zijun
f7ad77930e
Change due to ggml cgraph changes, not correct yet
2026-01-15 10:20:18 -08:00
Yu, Zijun
d9ca8f5dbe
NPU support version 2: prefill + kvcache
2026-01-15 10:20:18 -08:00
Yu, Zijun
34531abce4
draft NPU support version 2: prefill + kvcache
2026-01-15 10:20:18 -08:00
Yu, Zijun
7fec223334
Add initial NPU support
2026-01-15 10:20:18 -08:00
Yu, Zijun
8ce5cc597a
Add cgraph tensor output name to OV op name
2026-01-15 10:20:18 -08:00
Yu, Zijun
d7cc802292
PERF: use Slice+Concat in writing cache_v
2026-01-15 10:20:18 -08:00
Yu, Zijun
8ac5c225aa
FIX: set_max_token_len
2026-01-15 10:20:18 -08:00
Yu, Zijun
a30dc6e726
PERF: add weight constant in parallel
2026-01-15 10:20:18 -08:00
Yu, Zijun
c57f61494a
FIX: input shape of KQ_mask
2026-01-15 10:20:18 -08:00
Yu, Zijun
041d220dfa
FIX: Re-add tensor names in cgraph, Add another case for RESHAPE
2026-01-15 10:20:13 -08:00
Yu, Zijun
0d505b4e56
STYLE and minor REFACTOR
2026-01-15 10:10:00 -08:00
Yu, Zijun
cdf5370cb5
PERF: favor low precision matmul
2026-01-15 10:10:00 -08:00
Yu, Zijun
0d009fe61a
FEAT: Add all conversion code from ov side
2026-01-15 10:10:00 -08:00
Yu, Zijun
f15a2cc057
STYLE: clang-format
2026-01-15 10:10:00 -08:00
Yu, Zijun
a0b30529bf
FIX: backend buffer type issue
2026-01-15 10:10:00 -08:00
Zijun Yu
4c905b2b25
fix build error
2026-01-15 10:10:00 -08:00
Viraj Wadhwa
ffabe95e2a
Rebase - Bring up to date and fix build process
2026-01-15 10:09:23 -08:00
Yu, Zijun
a8e5efa44e
PERF: compile once (dynamic graph + cache)
2026-01-15 10:05:41 -08:00
Yu, Zijun
7d5e234254
FEAT: improve debug capability
2026-01-15 10:05:41 -08:00
Yu, Zijun
0a8cc9ab03
BUILD: update build doc, add cmake preset, add CACHE_DIR env var
2026-01-15 10:05:41 -08:00
Yu, Zijun
d3bdca25bd
PERF: share const nodes for weights for diff infer
2026-01-15 10:05:41 -08:00
Yu, Zijun
96ba47dd43
STYLE: minor refactor
2026-01-15 10:05:41 -08:00
Yu, Zijun
c04966cda6
REFACTOR: support weights as constant
2026-01-15 10:05:41 -08:00
Yu, Zijun
0c7b026ecc
FEAT: Add interleaved mode for ROPE
2026-01-15 10:05:41 -08:00
Yu, Zijun
6ed44a3dff
FEAT: do PERMUTE eagerly
2026-01-15 10:05:41 -08:00
Yu, Zijun
8b408869ae
Arbitrary token len (>32) work; Fix bug in mulmat
2026-01-15 10:05:41 -08:00
Yu, Zijun
8d263bd6a5
2nd+ token correct by fixing CPY in OV; remove single op backend compute code
2026-01-15 10:05:41 -08:00
Yu, Zijun
91d2a195b5
change op mappings to list in openvino_supports_op
2026-01-15 10:05:41 -08:00
Yu, Zijun
651b2c06cb
* Use find_package in CMake to configure OpenVINO
* Remove OPENVINO_OP_DEBUG
* Simplify set_input_output in decoder
* Fix CPY in set_input_output
* Use params from converted ov model in setting input
2026-01-15 10:05:41 -08:00
zhanmyz
84be5c6f15
1. Delete some comments
2. Process Prompt and predict first token is OK
2026-01-15 10:05:41 -08:00
zhanmyz
eac9a99530
1. Solve the AC issue of Permute+VIEW and the MULMAT issue in the phase "1. Process Prompt and predict the first token".
2. There is still an AC issue in the "2. Predict the subsequent tokens phase" and it is being debugged.
A deviation has been detected in the computation of OpenVINO's CPY Node at stage 2, and it is currently being fixed.
2026-01-15 10:05:41 -08:00
zhanmyz
8ae700ae11
Process Prompt and predict first token is OK
2026-01-15 10:05:41 -08:00
zhanmyz
8020138406
add debug info
2026-01-15 10:05:41 -08:00
zhanmyz
b02265a507
1. In the Prompt process and predict first token stage, the PERMUTE node needs to be integrated into the OV Frontend
2. In the predict latest token stage, the VIEW, CONT, Reshape need to be integrated into the OV Frontend.
2026-01-15 10:05:41 -08:00
zhanmyz
19ec9b6bf5
Try to add VIEW node to OV Frontend and have some issues that need to be dealt with
2026-01-15 10:05:41 -08:00
zhanmyz
b14b49d5f6
Minor Update
2026-01-15 10:05:41 -08:00
zhanmyz
467a5ddf04
1. Update the implementation of CPY node when it's non-contiguous
2. Remove duplicate get node operation function
2026-01-15 10:05:41 -08:00
zhanmyz
cff473a9e2
1. All operators implemented using OpenVINO can be successfully executed individually.
2. The VIEW op output tensor shape is not the same as the CONT (non-contiguous) input tensor shape
3. CPY (non-contiguous) can't be implemented with the original input/output tensor shape and data (the original shape needs to change when creating the input/output tensor)
Currently, the VIEW op is executed in the ggml backend and the others are executed in the OpenVINO Frontend.
2026-01-15 10:05:41 -08:00
zhanmyz
e08a7fda33
All adjacent ops can be converted, but the calculation result is wrong and needs debugging
2026-01-15 10:05:41 -08:00
zhanmyz
d05c458421
change CONT and MULMAT input node shape
2026-01-15 10:05:41 -08:00
zhanmyz
246a2d1021
Change the input and output node shape of MUL_MAT operator
2026-01-15 10:05:41 -08:00
zhanmyz
f37fa21a5c
Change the input and output node shape of MUL_MAT operator
2026-01-15 10:05:41 -08:00
zhanmyz
f98d215162
Change the input parameter shape of CONT operator
2026-01-15 10:05:41 -08:00
zhanmyz
9a7b7d8d6d
OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT/ROPE/SCALE/SOFTMAX/ADD adjacent op graph conversion
2026-01-15 10:05:41 -08:00
zhanmyz
95ae982d59
OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT graph conversion of consecutive OPs
2026-01-15 10:05:41 -08:00
zhanmyz
901f7347ff
Executing CONT & VIEW operators in OV Frontend is OK
2026-01-15 10:05:41 -08:00
zhanmyz
081b52667b
Executing a single CONT operator is OK
2026-01-15 10:05:41 -08:00
zhanmyz
afb8594194
add tmp source code files
2026-01-15 10:05:41 -08:00
zhanmyz
57582fda39
add implementation of CPY when the output tensor is non-contiguous
2026-01-15 10:05:41 -08:00
zhanmyz
8484769981
add implementation of MUL_MAT, CPY, CONT of GGML ops using OV ops
2026-01-15 10:05:41 -08:00
zhanmyz
cb2729bc4a
Move CPY from GGML OV Backend to OV Frontend
2026-01-15 10:05:41 -08:00
zhanmyz
2b04bd43be
Add MUL_MAT,CPY,CONT as operators implemented in OpenVINO for GGML backend
2026-01-15 10:05:41 -08:00
zhanmyz
0f7d07de7d
Add support for RMS_NORM OP
2026-01-15 10:05:41 -08:00
yumengbo
2353c73f53
Support ROPE op.
2026-01-15 10:05:41 -08:00
yumengbo
8aba03bac6
Support Softmax op
2026-01-15 10:05:41 -08:00
yumengbo
d218c61e6d
Support Softmax op
2026-01-15 10:05:41 -08:00
yumengbo
590f587b27
Add support for UNARY SILU op. Fix pytorch impl bugs.
2026-01-15 10:05:41 -08:00
yumengbo
b100f89bad
Change to implementation following pytorch frontend
2026-01-15 10:05:41 -08:00
yumengbo
e95f29cbc0
Fix issue for output memory copy of infer request
2026-01-15 10:05:41 -08:00
zhanmyz
8c5a609f8d
add the rms_norm operator implemented using OpenVINO to the GGML backend of llama.cpp
2026-01-15 10:05:41 -08:00
zhanmyz
80c330a469
Update build.md and add operation mapping(GGML to OpenVINO)
2026-01-15 10:05:41 -08:00
zhanmyz
49804f43fc
add GET_ROWS operator of OpenVINO to GGML of llama.cpp
2026-01-15 10:05:41 -08:00
yumengbo
5b46dc23be
Change output for infer request to set output tensor. Support scale, view op.
2026-01-15 10:05:41 -08:00
yumengbo
31bd816426
Add GGML_OV_FRONTEND option. Add readme.
2026-01-15 10:05:41 -08:00
yumengbo
9b7b63d12c
Convert subgraph with add, sub, mul, div op to ov model and do infer on openvino device
2026-01-15 10:05:41 -08:00
yumengbo
34e826ac14
Implement GgmlOvDecoder. Add dump functions.
2026-01-15 10:05:41 -08:00
yumengbo
171c4681f4
Add PoC of integration of openvino frontend. Main changes: ggml-ov-frontend-utils, GraphIterator, Decoder
2026-01-15 10:05:41 -08:00
zhanmyz
ee31dc1c1b
add get openvino available ops function
2026-01-15 10:05:41 -08:00
zhanmyz
77d68146a8
add OpenVINO frontend convert process steps
2026-01-15 10:05:41 -08:00
zhanmyz
0a81aa19f7
Add compile options
2026-01-15 10:05:40 -08:00
zhanmyz
adc2c70f44
Add OpenVINO MUL operator to GGML of Llama.cpp.
2026-01-15 10:05:40 -08:00
zhanmyz
faa4a7de76
Solve the issue of abnormal model output caused by using OpenVINO ADD operator
2026-01-15 10:05:40 -08:00
zhanmyz
9b9d51dddf
* Configure the device (default CPU) that uses OpenVINO to compile the model
* Add OpenVINO ADD operator to Llama.cpp. The output is somewhat abnormal and needs further debugging.
2026-01-15 10:05:40 -08:00
zhanmyz
5294402b50
add openvino as optional backend for Llama.cpp ggml
2026-01-15 10:05:40 -08:00
Yanglei Zou
fe5720e684
Add ggml-openvino base files
2026-01-15 10:05:40 -08:00