Commit Graph

7921 Commits

Author SHA1 Message Date
Arshath 4a57b37d4d Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath bed495226d Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 11b4cc5a67 Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 047bfb5c90 Update ggml-decoder.cpp
Hitting an error while compiling on Windows:

error C3861: 'unsetenv': identifier not found

Reason: unsetenv() is a POSIX function; it doesn’t exist on Windows, so Visual Studio (MSVC) won’t recognize it.

Proposed fix: use _putenv_s() (the Windows equivalent).
MSVC supports it, and calling it with an empty value achieves the same effect: it removes the environment variable from the process environment.

This keeps the code cross-platform.
2026-01-15 11:38:07 -08:00
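The portability fix described in the commit above can be sketched as a pair of guarded wrappers. This is a minimal sketch, not the actual code in ggml-decoder.cpp. The helper names `portable_setenv` and `portable_unsetenv` are hypothetical, and the sketch assumes `_WIN32` is the macro used to detect Windows/MSVC builds.

```cpp
#include <cstdlib>

// Hypothetical helper (illustrative name, not from ggml-decoder.cpp):
// set an environment variable in the current process, overwriting any old value.
// Returns 0 on success, -1 on failure.
inline int portable_setenv(const char * name, const char * value) {
#ifdef _WIN32
    // MSVC: _putenv_s returns 0 (errno_t) on success.
    return _putenv_s(name, value) == 0 ? 0 : -1;
#else
    // POSIX: the third argument 1 means "overwrite an existing value".
    return setenv(name, value, 1);
#endif
}

// Hypothetical helper: remove an environment variable from the current process.
// Returns 0 on success, -1 on failure.
inline int portable_unsetenv(const char * name) {
#ifdef _WIN32
    // MSVC has no unsetenv(); passing an empty value to _putenv_s removes the variable.
    return _putenv_s(name, "") == 0 ? 0 : -1;
#else
    // POSIX: unsetenv() removes the variable directly.
    return unsetenv(name);
#endif
}
```

Call sites would then use `portable_unsetenv(...)` in place of `unsetenv(...)`, so the same source compiles under both MSVC and POSIX toolchains.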
Yu, Zijun 531941b348 Fix NPU 2026-01-15 11:28:48 -08:00
Yu, Zijun ae404f7cbb Fix llama-bench 2026-01-15 11:28:48 -08:00
Yu, Zijun 072dde0b2b change graph to 4d, support multi sequences 2026-01-15 11:28:48 -08:00
Yu, Zijun ea2c99be1c NPU unify PD (handled internally) 2026-01-15 11:28:48 -08:00
Yu, Zijun 303923aba7 Clean placeholders in ggml-openvino.cpp 2026-01-15 11:27:30 -08:00
Zijun Yu b8690bc055 NPU Unify PD (#14)
* Stateless. Fix llama-cli llama-server

* Simplify broadcast op in attention

* Replace get_output_tensor+memcpy with set_output_tensor

* NPU unify PD. Unify dynamic and static dims
2026-01-15 11:27:30 -08:00
Yu, Zijun eba8113dc4 Style: middle ptr and ref align, omit optional struct keyword 2026-01-15 11:27:30 -08:00
Yu, Zijun bd3093f90c Style: use switch in supports_ops 2026-01-15 11:27:30 -08:00
Ravi Panchumarthy 3a1129e073 Update OV dockerfile to use OV2025.3 and update build docs 2026-01-15 11:27:30 -08:00
Ravi Panchumarthy 45af912b48 Update CI to run OV dep install before build 2026-01-15 11:27:30 -08:00
Ravi Panchumarthy 38e8a19f50 Apply CISC review and update CI to OV2025.3 2026-01-15 11:27:28 -08:00
Yu, Zijun 4c8406eb70 Add OV CI cache 2026-01-15 11:26:00 -08:00
Ravi Panchumarthy 841d673bd0 Update to OV-2025.3 and CMakeLists.txt 2026-01-15 11:26:00 -08:00
Yu, Zijun 2d2f00a41f Fix llama-3-8b and phi3-mini q4_0 NPU 2026-01-15 11:26:00 -08:00
Yu, Zijun 299f4923bb fix after rebasing 2026-01-15 11:26:00 -08:00
Yu, Zijun 8b82d1153b Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_size, iSWA model not working 2026-01-15 11:26:00 -08:00
Yu, Zijun a9371ea646 Fix llama-cli (need to run with --no-warmup) 2026-01-15 11:26:00 -08:00
cavusmustafa 05d7abae8c Fix for Phi3 2026-01-15 11:26:00 -08:00
cavusmustafa e7252920e1 env variable GGML_OPENVINO_DISABLE_SDPA_OPTIMIZATION added 2026-01-15 11:26:00 -08:00
cavusmustafa c112bc4e73 kvcachefusion support 2026-01-15 11:26:00 -08:00
Yu, Zijun 973a80fd02 Always apply Eliminate_ZP to fix GPU compile issue on some platforms 2026-01-15 11:26:00 -08:00
Yu, Zijun fdadca1e89 Fix after rebasing 2026-01-15 11:26:00 -08:00
Yu, Zijun f3afa7b914 Requantize Q6_K (gs16) to gs32 on GPU 2026-01-15 11:26:00 -08:00
Yu, Zijun e4bfe5a20d Add Q5_K to support phi-3-q4_k_m 2026-01-15 11:26:00 -08:00
Yu, Zijun 2f1d50fb07 Minor refactor 2026-01-15 11:26:00 -08:00
Yu, Zijun 67e178a2f6 Minor: not add attention_size_swa for non-swa model 2026-01-15 11:26:00 -08:00
Yu, Zijun 1a38339cea Fix ROPE accuracy when freq_scale != 1 2026-01-15 11:26:00 -08:00
Yu, Zijun 602f9ca4af Fix NPU accuracy 2026-01-15 11:26:00 -08:00
Yu, Zijun 9de874cb7b Support iSWA 2026-01-15 11:25:58 -08:00
Yu, Zijun 7d81861a18 Fix Hunyuan 2026-01-15 11:20:31 -08:00
Yu, Zijun 597561242f Add GeGLU 2026-01-15 11:20:31 -08:00
Yu, Zijun be07073e0e Apply EliminateZP only for npu 2026-01-15 11:20:31 -08:00
Yu, Zijun da2cc993bc WA for npu 1st token acc issue 2026-01-15 11:20:31 -08:00
Yu, Zijun 434059aef7 Fix NPU compile 2026-01-15 11:20:31 -08:00
Yu, Zijun bcc343af00 Support BF16 model 2026-01-15 11:20:31 -08:00
Yu, Zijun dc77cbb3f6 STYLE: make get_types_to_requant a function 2026-01-15 11:20:31 -08:00
Yu, Zijun 2ad1147b9b Improve debug util; Eliminate nop ReshapeReshape 2026-01-15 11:20:31 -08:00
Yu, Zijun 0f7b253cb3 Fix after rebasing 2026-01-15 11:20:31 -08:00
Yu, Zijun 810eb480f5 Simplify translation of get_rows 2026-01-15 11:20:31 -08:00
Yu, Zijun c5231a2448 Set m_is_static=false as default in decoder 2026-01-15 11:20:31 -08:00
Yu, Zijun 6926655f5b Add custom quant type: q8_1_c, q4_0_128 2026-01-15 11:20:31 -08:00
Yu, Zijun b593428eb3 Dequantize q4_1 q4_k q6_k for NPU 2026-01-15 11:20:31 -08:00
Yu, Zijun 82c98335d3 NPU perf: eliminate zp 2026-01-15 11:20:31 -08:00
Yu, Zijun 9ca53c7991 Add NPU Q4_0 support 2026-01-15 11:20:31 -08:00
Yu, Zijun 9900245e0b Fix test-backend-ops: Treat quantized tensors as weights 2026-01-15 11:20:31 -08:00
Yu, Zijun a1ce428004 Fix Q4_1 2026-01-15 11:19:15 -08:00