Commit Graph

7938 Commits

Author SHA1 Message Date
Xuejun Zhai 8ff73e5d53 Removed API m_outputs 2026-01-15 11:39:08 -08:00
Xuejun Zhai 111c96c266 Removed API get_output_ggml_tensor(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai ba852f2a60 Removed API GgmlOvDecoder::get_output_op_params(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 6d7a0d6047 Modified API GgmlOvDecoder::get_output_type(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai f516db1db5 remove unused API get_output_shape(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 497964afbb remove unused API GgmlOvDecoder::get_output_names() 2026-01-15 11:39:08 -08:00
Yu, Zijun 8f4ee4eee2 minor update due to ov 2025.4 2026-01-15 11:39:08 -08:00
Xuejun Zhai 0ea8238ad0 remove unused API GgmlOvDecoder::get_output_stride(const std::string & name) 2026-01-15 11:39:08 -08:00
Yu, Zijun 2a9d4ca836 Refactor: split ov_graph_compute for dynamic and static 2026-01-15 11:39:08 -08:00
Yu, Zijun 808619e274 NPU support llama-perplexity -b 512 --no-warmup 2026-01-15 11:39:08 -08:00
Yu, Zijun 65348b5d20 fallback naive run with accuracy issue 2026-01-15 11:39:08 -08:00
Yu, Zijun 59e7e7c47d NPU fix llama-bench 2026-01-15 11:39:08 -08:00
Yu, Zijun 38254cf592 NPU prefill chunking 2026-01-15 11:39:08 -08:00
XuejunZhai 992dea73fd Fix error for naive 2026-01-15 11:39:08 -08:00
XuejunZhai ae936519d2 Remove the second decoder for node. Moving the function into the model decoder 2026-01-15 11:39:05 -08:00
Arshath 4400b5cb4b Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 98396b275a Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 4a57b37d4d Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath bed495226d Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 11b4cc5a67 Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 047bfb5c90 Update ggml-decoder.cpp
Hitting an error while compiling on Windows:

error C3861: 'unsetenv': identifier not found

Reason: unsetenv() is a POSIX function; it doesn’t exist on Windows. Visual Studio (MSVC) won’t recognize it.

Proposed fix: on Windows, call _putenv_s() with an empty string as the value.
MSVC supports this, and it has the same effect as unsetenv(): the variable is removed from the process environment.

This keeps the code building on both Windows and POSIX platforms.
2026-01-15 11:38:07 -08:00
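The fix described in the commit message above can be sketched as a small wrapper. This is a minimal illustration, not the code from the commit: the helper names portable_setenv and portable_unsetenv are hypothetical, and the sketch assumes the platform switch is made with the _WIN32 macro.

```cpp
#include <cstdlib>

// Hypothetical helper (not from the commit): set an environment variable
// on both Windows and POSIX platforms. Returns 0 on success.
static int portable_setenv(const char * name, const char * value) {
#ifdef _WIN32
    return _putenv_s(name, value);
#else
    return setenv(name, value, 1);  // 1 = overwrite an existing value
#endif
}

// Hypothetical helper (not from the commit): remove an environment variable.
// On Windows, passing an empty string to _putenv_s() removes the variable;
// on POSIX, unsetenv() does so directly. Returns 0 on success.
static int portable_unsetenv(const char * name) {
#ifdef _WIN32
    return _putenv_s(name, "");
#else
    return unsetenv(name);
#endif
}
```

After portable_unsetenv("SOME_VAR") succeeds, std::getenv("SOME_VAR") is expected to return a null pointer on either platform.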
Yu, Zijun 531941b348 Fix NPU 2026-01-15 11:28:48 -08:00
Yu, Zijun ae404f7cbb Fix llama-bench 2026-01-15 11:28:48 -08:00
Yu, Zijun 072dde0b2b change graph to 4d, support multi sequences 2026-01-15 11:28:48 -08:00
Yu, Zijun ea2c99be1c NPU unify PD (handled internally) 2026-01-15 11:28:48 -08:00
Yu, Zijun 303923aba7 Clean placeholders in ggml-openvino.cpp 2026-01-15 11:27:30 -08:00
Zijun Yu b8690bc055 NPU Unify PD (#14)
* Stateless. Fix llama-cli llama-server

* Simplify broadcast op in attention

* Replace get_output_tensor+memcpy with set_output_tensor

* NPU unify PD. Unify dynamic and static dims
2026-01-15 11:27:30 -08:00
Yu, Zijun eba8113dc4 Style: middle ptr and ref align, omit optional struct keyword 2026-01-15 11:27:30 -08:00
Yu, Zijun bd3093f90c Style: use switch in supports_ops 2026-01-15 11:27:30 -08:00
Ravi Panchumarthy 3a1129e073 Update OV dockerfile to use OV2025.3 and update build docs 2026-01-15 11:27:30 -08:00
Ravi Panchumarthy 45af912b48 Update CI to run OV dep install before build 2026-01-15 11:27:30 -08:00
Ravi Panchumarthy 38e8a19f50 Apply CISC review and update CI to OV2025.3 2026-01-15 11:27:28 -08:00
Yu, Zijun 4c8406eb70 Add OV CI cache 2026-01-15 11:26:00 -08:00
Ravi Panchumarthy 841d673bd0 Update to OV-2025.3 and CMakeLists.txt 2026-01-15 11:26:00 -08:00
Yu, Zijun 2d2f00a41f Fix llama-3-8b and phi3-mini q4_0 NPU 2026-01-15 11:26:00 -08:00
Yu, Zijun 299f4923bb fix after rebasing 2026-01-15 11:26:00 -08:00
Yu, Zijun 8b82d1153b Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_size, iSWA model not working 2026-01-15 11:26:00 -08:00
Yu, Zijun a9371ea646 Fix llama-cli (need to run with --no-warmup) 2026-01-15 11:26:00 -08:00
cavusmustafa 05d7abae8c Fix for Phi3 2026-01-15 11:26:00 -08:00
cavusmustafa e7252920e1 env variable GGML_OPENVINO_DISABLE_SDPA_OPTIMIZATION added 2026-01-15 11:26:00 -08:00
cavusmustafa c112bc4e73 kvcachefusion support 2026-01-15 11:26:00 -08:00
Yu, Zijun 973a80fd02 Always apply Eliminate_ZP to fix GPU compile issue on some platforms 2026-01-15 11:26:00 -08:00
Yu, Zijun fdadca1e89 Fix after rebasing 2026-01-15 11:26:00 -08:00
Yu, Zijun f3afa7b914 Requantize Q6_K (gs16) to gs32 on GPU 2026-01-15 11:26:00 -08:00
Yu, Zijun e4bfe5a20d Add Q5_K to support phi-3-q4_k_m 2026-01-15 11:26:00 -08:00
Yu, Zijun 2f1d50fb07 Minor refactor 2026-01-15 11:26:00 -08:00
Yu, Zijun 67e178a2f6 Minor: not add attention_size_swa for non-swa model 2026-01-15 11:26:00 -08:00
Yu, Zijun 1a38339cea Fix ROPE accuracy when freq_scale != 1 2026-01-15 11:26:00 -08:00
Yu, Zijun 602f9ca4af Fix NPU accuracy 2026-01-15 11:26:00 -08:00
Yu, Zijun 9de874cb7b Support iSWA 2026-01-15 11:25:58 -08:00