Commit Graph

2047 Commits

Author SHA1 Message Date
Cavus Mustafa 1a19566b23 add mark decomp pass 2026-01-15 10:26:28 -08:00
Cavus Mustafa 93b2d09a2d mulmat type conversion update 2026-01-15 10:26:28 -08:00
Cavus Mustafa e2fdc1b988 mulmat input conversion fix 2026-01-15 10:26:28 -08:00
Yu, Zijun 01cdf4a9cc matmul in fp32 2026-01-15 10:26:28 -08:00
Cavus Mustafa 9cf56d6837 temp. changes for mark decomp 2026-01-15 10:26:28 -08:00
Yu, Zijun 4e7f04a307 Fix llama-perplexity 2026-01-15 10:26:28 -08:00
Yu, Zijun 75eec6265f Fix llama-bench; Clang-format 2026-01-15 10:26:28 -08:00
Yu, Zijun 6dc4b90635 Fix NPU 2026-01-15 10:26:28 -08:00
Yu, Zijun 44f4cf34b1 Fix Phi3 ROPE; Add test-backend-ops 2026-01-15 10:26:28 -08:00
Yu, Zijun 1ed49bbfaf Fix llama-cli 2026-01-15 10:26:28 -08:00
Yu, Zijun d61f83c9b7 Fix CPY due to cgraph change 2026-01-15 10:23:35 -08:00
Yu, Zijun f3c0519096 Reduce memory: free ov weights node after graph conversion 2026-01-15 10:20:18 -08:00
Yu, Zijun a80da69448 Pull out sin cos from rope 2026-01-15 10:20:18 -08:00
Yu, Zijun 3533c14cf6 Fix Phi3 SwiGLU and SoftMax 2026-01-15 10:20:18 -08:00
Yu, Zijun 0fa7a5efef Refactor: remove past_token_len from extra_inputs 2026-01-15 10:20:18 -08:00
Yu, Zijun acf358d1ce Pull out indices creation for kv cache update 2026-01-15 10:20:18 -08:00
Yu, Zijun bf5414c95e Replace Concat with Broadcast in MulMat for GQA 2026-01-15 10:20:18 -08:00
Yu, Zijun ebc4fc9f95 Fuse to SDPA 2026-01-15 10:20:18 -08:00
Yu, Zijun 73ee84fffe Add SwiGLU 2026-01-15 10:20:18 -08:00
Yu, Zijun 4c582ac7a3 Stateful transformation for CPU GPU 2026-01-15 10:20:18 -08:00
Yu, Zijun 8afee795ad Update clang-format 2026-01-15 10:20:18 -08:00
Yu, Zijun 593484ce5f Refactor: clean, fix warning 2026-01-15 10:20:18 -08:00
Yu, Zijun 42d4240937 Change due to ggml cgraph changes, all device work 2026-01-15 10:20:18 -08:00
Yu, Zijun e27738a987 Add AMD64 to CMakeLists 2026-01-15 10:20:18 -08:00
Yu, Zijun 592d7f8bbb Change due to ggml cgraph changes, llama-3.2 CPU work 2026-01-15 10:20:18 -08:00
Yu, Zijun f7ad77930e Change due to ggml cgraph changes, not correct yet 2026-01-15 10:20:18 -08:00
Yu, Zijun d9ca8f5dbe NPU support version 2: prefill + kvcache 2026-01-15 10:20:18 -08:00
Yu, Zijun 34531abce4 draft NPU support version 2: prefill + kvcache 2026-01-15 10:20:18 -08:00
Yu, Zijun 7fec223334 Add initial NPU support 2026-01-15 10:20:18 -08:00
Yu, Zijun 8ce5cc597a Add cgraph tensor output name to OV op name 2026-01-15 10:20:18 -08:00
Yu, Zijun d7cc802292 PERF: use Slice+Concat in writing cache_v 2026-01-15 10:20:18 -08:00
Yu, Zijun 8ac5c225aa FIX: set_max_token_len 2026-01-15 10:20:18 -08:00
Yu, Zijun a30dc6e726 PERF: add weight constant in parallel 2026-01-15 10:20:18 -08:00
Yu, Zijun c57f61494a FIX: input shape of KQ_mask 2026-01-15 10:20:18 -08:00
Yu, Zijun 041d220dfa FIX: Re-add tensor names in cgraph, Add another case for RESHAPE 2026-01-15 10:20:13 -08:00
Yu, Zijun 0d505b4e56 STYLE and minor REFACTOR 2026-01-15 10:10:00 -08:00
Yu, Zijun cdf5370cb5 PERF: favor low precision matmul 2026-01-15 10:10:00 -08:00
Yu, Zijun 0d009fe61a FEAT: Add all conversion code from ov side 2026-01-15 10:10:00 -08:00
Yu, Zijun f15a2cc057 STYLE: clang-format 2026-01-15 10:10:00 -08:00
Yu, Zijun a0b30529bf FIX: backend buffer type issue 2026-01-15 10:10:00 -08:00
Zijun Yu 4c905b2b25 fix build error 2026-01-15 10:10:00 -08:00
Viraj Wadhwa ffabe95e2a Rebase - Bring up to date and fix build process 2026-01-15 10:09:23 -08:00
Yu, Zijun a8e5efa44e PERF: compile once (dynamic graph + cache) 2026-01-15 10:05:41 -08:00
Yu, Zijun 7d5e234254 FEAT: improve debug capability 2026-01-15 10:05:41 -08:00
Yu, Zijun 0a8cc9ab03 BUILD: update build doc, add cmake preset, add CACHE_DIR env var 2026-01-15 10:05:41 -08:00
Yu, Zijun d3bdca25bd PERF: share const nodes for weights for diff infer 2026-01-15 10:05:41 -08:00
Yu, Zijun 96ba47dd43 STYLE: minor refactor 2026-01-15 10:05:41 -08:00
Yu, Zijun c04966cda6 REFACTOR: support weights as constant 2026-01-15 10:05:41 -08:00
Yu, Zijun 0c7b026ecc FEAT: Add interleaved mode for ROPE 2026-01-15 10:05:41 -08:00
Yu, Zijun 6ed44a3dff FEAT: do PERMUTE eagerly 2026-01-15 10:05:41 -08:00
Yu, Zijun 8b408869ae Arbitrary token len (>32) work; Fix bug in mulmat 2026-01-15 10:05:41 -08:00
Yu, Zijun 8d263bd6a5 2nd+ token correct by fixing CPY in OV, remove single op backend compute code 2026-01-15 10:05:41 -08:00
Yu, Zijun 91d2a195b5 change op mappings to list in openvino_supports_op 2026-01-15 10:05:41 -08:00
Yu, Zijun 651b2c06cb * Use find_package in CMake to configure OpenVINO
* Remove OPENVINO_OP_DEBUG
* Simplify set_input_output in decoder
* Fix CPY in set_input_output
* Use params from converted ov model in setting input
2026-01-15 10:05:41 -08:00
zhanmyz 84be5c6f15 1. Delete some comments
2. Process Prompt and predict first token is OK
2026-01-15 10:05:41 -08:00
zhanmyz eac9a99530 1. Solve the AC issue of Permute+VIEW and the MULMAT issue in phase 1, "Process Prompt and predict the first token".
2. There is still an AC issue in phase 2, "Predict the subsequent tokens", and it is being debugged.
   A deviation has been detected in the computation of OpenVINO's CPY node at stage 2, and it is currently being fixed.
2026-01-15 10:05:41 -08:00
zhanmyz 8ae700ae11 Process Prompt and predict first token is OK 2026-01-15 10:05:41 -08:00
zhanmyz 8020138406 add debug info 2026-01-15 10:05:41 -08:00
zhanmyz b02265a507 1. In the Prompt process and predict first token stage, the PERMUTE node needs to be integrated into the OV Frontend
2. In the predict latest token stage, the VIEW, CONT, Reshape need to be integrated into the OV Frontend.
2026-01-15 10:05:41 -08:00
zhanmyz 19ec9b6bf5 Try to add VIEW node to OV Frontend and have some issues that need to be dealt with 2026-01-15 10:05:41 -08:00
zhanmyz b14b49d5f6 Minor Update 2026-01-15 10:05:41 -08:00
zhanmyz 467a5ddf04 1. Update the implementation of CPY node when it's non-contiguous
2. Remove duplicate get node operation function
2026-01-15 10:05:41 -08:00
zhanmyz cff473a9e2 1. All operators implemented using OpenVINO can be successfully executed individually.
2. The VIEW op output tensor shape is not the same as the CONT (non-contiguous) input tensor shape.
3. CPY (non-contiguous) can't be implemented with the original input/output tensor shape and data (the original shape needs to change when creating the input/output tensor).

Currently, the VIEW op is executed in the ggml backend and the others are executed in the OpenVINO Frontend.
2026-01-15 10:05:41 -08:00
zhanmyz e08a7fda33 All adjacent ops can be converted, but the calculation result is wrong and needs debugging 2026-01-15 10:05:41 -08:00
zhanmyz d05c458421 change CONT and MULMAT input node shape 2026-01-15 10:05:41 -08:00
zhanmyz 246a2d1021 Change the input and output node shape of MUL_MAT operator 2026-01-15 10:05:41 -08:00
zhanmyz f37fa21a5c Change the input and output node shape of MUL_MAT operator 2026-01-15 10:05:41 -08:00
zhanmyz f98d215162 Change the input parameter shape of CONT operator 2026-01-15 10:05:41 -08:00
zhanmyz 9a7b7d8d6d OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT/ROPE/SCALE/SOFTMAX/ADD adjacent op graph conversion 2026-01-15 10:05:41 -08:00
zhanmyz 95ae982d59 OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT graph conversion of consecutive OPs 2026-01-15 10:05:41 -08:00
zhanmyz 901f7347ff Execute CONT & VIEW operators in OV Frontend is OK 2026-01-15 10:05:41 -08:00
zhanmyz 081b52667b Execute single CONT operator is OK 2026-01-15 10:05:41 -08:00
zhanmyz afb8594194 add tmp source code files 2026-01-15 10:05:41 -08:00
zhanmyz 57582fda39 add implementation of CPY when the output tensor is non-contiguous 2026-01-15 10:05:41 -08:00
zhanmyz 8484769981 add implementation of MUL_MAT, CPY, CONT of GGML ops using OV ops 2026-01-15 10:05:41 -08:00
zhanmyz cb2729bc4a Move CPY from GGML OV Backend to OV Frontend 2026-01-15 10:05:41 -08:00
zhanmyz 2b04bd43be Add MUL_MAT,CPY,CONT as operators implemented in OpenVINO for GGML backend 2026-01-15 10:05:41 -08:00
zhanmyz 0f7d07de7d Add support for RMS_NORM OP 2026-01-15 10:05:41 -08:00
yumengbo 2353c73f53 Support ROPE op. 2026-01-15 10:05:41 -08:00
yumengbo 8aba03bac6 Support Softmax op 2026-01-15 10:05:41 -08:00
yumengbo d218c61e6d Support Softmax op 2026-01-15 10:05:41 -08:00
yumengbo 590f587b27 Add support for UNARY SILU op. Fix pytorch impl bugs. 2026-01-15 10:05:41 -08:00
yumengbo b100f89bad Change to implementation following pytorch frontend 2026-01-15 10:05:41 -08:00
yumengbo e95f29cbc0 Fix issue for output memory copy of infer request 2026-01-15 10:05:41 -08:00
zhanmyz 8c5a609f8d add the rms_norm operator implemented using OpenVINO to the GGML backend of llama.cpp 2026-01-15 10:05:41 -08:00
zhanmyz 80c330a469 Update build.md and add operation mapping(GGML to OpenVINO) 2026-01-15 10:05:41 -08:00
zhanmyz 49804f43fc add GET_ROWS operator of OpenVINO to GGML of llama.cpp 2026-01-15 10:05:41 -08:00
yumengbo 5b46dc23be Change output for infer request to set output tensor. Support scale, view op. 2026-01-15 10:05:41 -08:00
yumengbo 31bd816426 Add GGML_OV_FRONTEND option. Add readme. 2026-01-15 10:05:41 -08:00
yumengbo 9b7b63d12c Convert subgraph with add, sub, mul, div op to ov model and do infer on openvino device 2026-01-15 10:05:41 -08:00
yumengbo 34e826ac14 Implement GgmlOvDecoder. Add dump functions. 2026-01-15 10:05:41 -08:00
yumengbo 171c4681f4 Add PoC of integration of openvino frontend. Main changes: ggml-ov-frontend-utils, GraphIterator, Decoder 2026-01-15 10:05:41 -08:00
zhanmyz ee31dc1c1b add get openvino available ops function 2026-01-15 10:05:41 -08:00
zhanmyz 77d68146a8 add OpenVINO frontend convert process steps 2026-01-15 10:05:41 -08:00
zhanmyz 0a81aa19f7 Add compile options 2026-01-15 10:05:40 -08:00
zhanmyz adc2c70f44 Add OpenVINO MUL operator to GGML of Llama.cpp. 2026-01-15 10:05:40 -08:00
zhanmyz faa4a7de76 Solve the issue of abnormal model output caused by using OpenVINO ADD operator 2026-01-15 10:05:40 -08:00
zhanmyz 9b9d51dddf * Configure the device(default CPU) that uses OpenVINO to compile the model
* Add OpenVINO ADD operator to Llama.cpp. The output is somewhat abnormal and needs further debugging.
2026-01-15 10:05:40 -08:00
zhanmyz 5294402b50 add openvino as optional backend for Llama.cpp ggml 2026-01-15 10:05:40 -08:00
Yanglei Zou fe5720e684 Add ggml-openvino base files 2026-01-15 10:05:40 -08:00