Commit Graph

7961 Commits

Author SHA1 Message Date
Yu, Zijun cfc471353d FIX: use remote tensor from singleton 2026-01-15 11:39:08 -08:00
Yu, Zijun a356b44477 only use remote tensor for kvcache for GPU 2026-01-15 11:39:08 -08:00
Yu, Zijun 88d1d17eac only use remote tensor for kvcache 2026-01-15 11:39:08 -08:00
Yu, Zijun 8273a7c2f4 Use ggml_aligned_malloc 2026-01-15 11:39:08 -08:00
Yu, Zijun d757849741 Put kvcache on GPU 2026-01-15 11:39:08 -08:00
Yu, Zijun 3fdcb6ab72 Add ov_backend_host_buffer; Use cached remote context 2026-01-15 11:39:08 -08:00
Yu, Zijun 72bba828df Use shared_buffer for GPU NPU; Refactor 2026-01-15 11:39:08 -08:00
Yu, Zijun 22d9c17a6f backend buffer: allocate on host 2026-01-15 11:39:08 -08:00
Arshath ae5336386f Update build.md for Windows 2026-01-15 11:39:08 -08:00
Yu, Zijun 0ef2e5e4d4 Fix decoder can_reuse for llama-bench 2026-01-15 11:39:08 -08:00
Xuejun Zhai 9e3163e846 Remove unused variable nodes 2026-01-15 11:39:08 -08:00
Yu, Zijun c9234b44cc NPU fix q4 perf regression 2026-01-15 11:39:08 -08:00
Yu, Zijun ae01322dbd NPU fix wrong model output shape 2026-01-15 11:39:08 -08:00
Yu, Zijun 469325c6da GPU remove Q6_K requantization 2026-01-15 11:39:08 -08:00
Yu, Zijun 28da9a9adc Reuse cached decoder 2026-01-15 11:39:08 -08:00
Xuejun Zhai 91a1b20c82 Fix error for decoder cache 2026-01-15 11:39:08 -08:00
Xuejun Zhai 47c91db31f Removed API GgmlOvDecoder::get_input_op_params(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai acb8a01d0e Removed API GgmlOvDecoder::get_input_shape(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 42ca27f714 Removed API get_input_type 2026-01-15 11:39:08 -08:00
Xuejun Zhai 891a3beb2d Removed API get_input_type 2026-01-15 11:39:08 -08:00
Xuejun Zhai cd611782ef Removed API GgmlOvDecoder::get_input_stride(const std::string& name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 95c3071906 Removed API GgmlOvDecoder::get_input_names() 2026-01-15 11:39:08 -08:00
Xuejun Zhai 197ed992c0 Removed m_output_names 2026-01-15 11:39:08 -08:00
Xuejun Zhai 8ff73e5d53 Removed API m_outputs 2026-01-15 11:39:08 -08:00
Xuejun Zhai 111c96c266 Removed API get_output_ggml_tensor(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai ba852f2a60 Removed API GgmlOvDecoder::get_output_op_params(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 6d7a0d6047 Modified API GgmlOvDecoder::get_output_type(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai f516db1db5 remove unused API get_output_shape(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 497964afbb remove unused API GgmlOvDecoder::get_output_names() 2026-01-15 11:39:08 -08:00
Yu, Zijun 8f4ee4eee2 minor update due to ov 2025.4 2026-01-15 11:39:08 -08:00
Xuejun Zhai 0ea8238ad0 remove unused API GgmlOvDecoder::get_output_stride(const std::string & name) 2026-01-15 11:39:08 -08:00
Yu, Zijun 2a9d4ca836 Refactor: split ov_graph_compute for dynamic and static 2026-01-15 11:39:08 -08:00
Yu, Zijun 808619e274 NPU support llama-perplexity -b 512 --no-warmup 2026-01-15 11:39:08 -08:00
Yu, Zijun 65348b5d20 fallback naive run with accuracy issue 2026-01-15 11:39:08 -08:00
Yu, Zijun 59e7e7c47d NPU fix llama-bench 2026-01-15 11:39:08 -08:00
Yu, Zijun 38254cf592 NPU prefill chunking 2026-01-15 11:39:08 -08:00
XuejunZhai 992dea73fd Fix error for naive 2026-01-15 11:39:08 -08:00
XuejunZhai ae936519d2 Remove the second decoder for node. Moving the function into the model decoder 2026-01-15 11:39:05 -08:00
Arshath 4400b5cb4b Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 98396b275a Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 4a57b37d4d Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath bed495226d Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 11b4cc5a67 Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 047bfb5c90 Update ggml-decoder.cpp
Compilation fails on Windows (MSVC) with:

error C3861: 'unsetenv': identifier not found

Reason: unsetenv() is a POSIX function; it doesn’t exist on Windows. Visual Studio (MSVC) won’t recognize it.

Proposed fix: on Windows, call _putenv_s() with an empty value instead.
MSVC supports this, and it has the same effect: the variable is removed from the process environment.

This keeps cross-platform compatibility.
2026-01-15 11:38:07 -08:00
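
The fix described in commit 047bfb5c90 can be sketched roughly as follows. This is a minimal illustration, not the code from ggml-decoder.cpp: the helper name unset_env_var is hypothetical, and the actual change may guard the call differently. It assumes only that _putenv_s() with an empty value removes a variable on Windows and that unsetenv() is available on POSIX systems.

```cpp
#include <cstdlib>

// Hypothetical helper (not from the repository): remove `name` from the
// process environment. Returns true on success.
static bool unset_env_var(const char * name) {
#ifdef _WIN32
    // MSVC has no unsetenv(); _putenv_s() with an empty value removes the variable.
    return _putenv_s(name, "") == 0;
#else
    // POSIX: unsetenv() returns 0 on success, including when the variable is not set.
    return unsetenv(name) == 0;
#endif
}
```

A caller would replace a direct unsetenv("SOME_VAR") with unset_env_var("SOME_VAR"), so the platform check lives in one place.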
Yu, Zijun 531941b348 Fix NPU 2026-01-15 11:28:48 -08:00
Yu, Zijun ae404f7cbb Fix llama-bench 2026-01-15 11:28:48 -08:00
Yu, Zijun 072dde0b2b change graph to 4d, support multi sequences 2026-01-15 11:28:48 -08:00
Yu, Zijun ea2c99be1c NPU unify PD (handled internally) 2026-01-15 11:28:48 -08:00
Yu, Zijun 303923aba7 Clean placeholders in ggml-openvino.cpp 2026-01-15 11:27:30 -08:00
Zijun Yu b8690bc055 NPU Unify PD (#14)
* Stateless. Fix llama-cli llama-server

* Simplify broadcast op in attention

* Replace get_output_tensor+memcpy with set_output_tensor

* NPU unify PD. Unify dynamic and static dims
2026-01-15 11:27:30 -08:00