Yu, Zijun
d757849741
Put kvcache on GPU
2026-01-15 11:39:08 -08:00
Yu, Zijun
3fdcb6ab72
Add ov_backend_host_buffer; Use cached remote context
2026-01-15 11:39:08 -08:00
Yu, Zijun
72bba828df
Use shared_buffer for GPU NPU; Refactor
2026-01-15 11:39:08 -08:00
Yu, Zijun
22d9c17a6f
backend buffer: allocate on host
2026-01-15 11:39:08 -08:00
Arshath
ae5336386f
Update build.md for Windows
2026-01-15 11:39:08 -08:00
Yu, Zijun
0ef2e5e4d4
Fix decoder can_reuse for llama-bench
2026-01-15 11:39:08 -08:00
Xuejun Zhai
9e3163e846
Remove unused variable nodes
2026-01-15 11:39:08 -08:00
Yu, Zijun
c9234b44cc
NPU fix q4 perf regression
2026-01-15 11:39:08 -08:00
Yu, Zijun
ae01322dbd
NPU fix wrong model output shape
2026-01-15 11:39:08 -08:00
Yu, Zijun
469325c6da
GPU remove Q6_K requantization
2026-01-15 11:39:08 -08:00
Yu, Zijun
28da9a9adc
Reuse cached decoder
2026-01-15 11:39:08 -08:00
Xuejun Zhai
91a1b20c82
Fix error for decoder cache
2026-01-15 11:39:08 -08:00
Xuejun Zhai
47c91db31f
Removed API GgmlOvDecoder::get_input_op_params(const std::string & name)
2026-01-15 11:39:08 -08:00
Xuejun Zhai
acb8a01d0e
Removed API GgmlOvDecoder::get_input_shape(const std::string & name)
2026-01-15 11:39:08 -08:00
Xuejun Zhai
42ca27f714
Removed API get_input_type
2026-01-15 11:39:08 -08:00
Xuejun Zhai
891a3beb2d
Removed API get_input_type
2026-01-15 11:39:08 -08:00
Xuejun Zhai
cd611782ef
Removed API GgmlOvDecoder::get_input_stride(const std::string& name)
2026-01-15 11:39:08 -08:00
Xuejun Zhai
95c3071906
Removed API GgmlOvDecoder::get_input_names()
2026-01-15 11:39:08 -08:00
Xuejun Zhai
197ed992c0
Removed m_output_names
2026-01-15 11:39:08 -08:00
Xuejun Zhai
8ff73e5d53
Removed API m_outputs
2026-01-15 11:39:08 -08:00
Xuejun Zhai
111c96c266
Removed API get_output_ggml_tensor(const std::string & name)
2026-01-15 11:39:08 -08:00
Xuejun Zhai
ba852f2a60
Removed API GgmlOvDecoder::get_output_op_params(const std::string & name)
2026-01-15 11:39:08 -08:00
Xuejun Zhai
6d7a0d6047
Modified API GgmlOvDecoder::get_output_type(const std::string & name)
2026-01-15 11:39:08 -08:00
Xuejun Zhai
f516db1db5
remove unused API get_output_shape(const std::string & name)
2026-01-15 11:39:08 -08:00
Xuejun Zhai
497964afbb
remove unused API GgmlOvDecoder::get_output_names()
2026-01-15 11:39:08 -08:00
Yu, Zijun
8f4ee4eee2
minor update due to ov 2025.4
2026-01-15 11:39:08 -08:00
Xuejun Zhai
0ea8238ad0
remove unused API GgmlOvDecoder::get_output_stride(const std::string & name)
2026-01-15 11:39:08 -08:00
Yu, Zijun
2a9d4ca836
Refactor: split ov_graph_compute for dynamic and static
2026-01-15 11:39:08 -08:00
Yu, Zijun
808619e274
NPU support llama-perplexity -b 512 --no-warmup
2026-01-15 11:39:08 -08:00
Yu, Zijun
65348b5d20
fallback naive run with accuracy issue
2026-01-15 11:39:08 -08:00
Yu, Zijun
59e7e7c47d
NPU fix llama-bench
2026-01-15 11:39:08 -08:00
Yu, Zijun
38254cf592
NPU prefill chunking
2026-01-15 11:39:08 -08:00
XuejunZhai
992dea73fd
Fix error for naive
2026-01-15 11:39:08 -08:00
XuejunZhai
ae936519d2
Remove the second decoder for node; move the function into the model decoder
2026-01-15 11:39:05 -08:00
Arshath
4400b5cb4b
Update ggml-decoder.cpp
2026-01-15 11:38:13 -08:00
Arshath
98396b275a
Update ggml-decoder.cpp
2026-01-15 11:38:13 -08:00
Arshath
4a57b37d4d
Update ggml-decoder.cpp
2026-01-15 11:38:13 -08:00
Arshath
bed495226d
Update ggml-decoder.cpp
2026-01-15 11:38:13 -08:00
Arshath
11b4cc5a67
Update ggml-decoder.cpp
2026-01-15 11:38:13 -08:00
Arshath
047bfb5c90
Update ggml-decoder.cpp
Hit an error while compiling on Windows:
error C3861: 'unsetenv': identifier not found
Reason: unsetenv() is a POSIX function and does not exist on Windows, so Visual Studio (MSVC) does not recognize it.
Proposed fix: use _putenv_s() (the Windows equivalent) instead.
MSVC supports it, and it achieves the same effect: passing an empty string as the value removes the variable from the process environment.
This keeps cross-platform compatibility.
2026-01-15 11:38:07 -08:00
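The fix described in the commit above can be sketched as a small portability shim. This is a minimal illustration, not the code from commit 047bfb5c90: the helper names `portable_unsetenv` and `portable_setenv` are hypothetical. It relies only on the documented behavior that MSVC's `_putenv_s(name, "")` removes a variable, while POSIX systems provide `setenv`/`unsetenv`.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not necessarily what the commit added):
// remove an environment variable from the current process.
// Returns 0 on success, -1 on failure.
inline int portable_unsetenv(const char * name) {
    assert(name != nullptr && *name != '\0');  // precondition: non-empty name
#ifdef _WIN32
    // MSVC has no unsetenv(); _putenv_s with an empty value removes the variable.
    return _putenv_s(name, "") == 0 ? 0 : -1;
#else
    // POSIX: unsetenv() is declared in <stdlib.h>.
    return unsetenv(name);
#endif
}

// Hypothetical companion helper: set (and overwrite) an environment variable.
inline int portable_setenv(const char * name, const char * value) {
    assert(name != nullptr && *name != '\0' && value != nullptr);
#ifdef _WIN32
    return _putenv_s(name, value) == 0 ? 0 : -1;
#else
    return setenv(name, value, /*overwrite=*/1);
#endif
}
```

Keeping the `#ifdef _WIN32` branch inside one helper means call sites stay identical on every platform, and the POSIX build keeps using `unsetenv()` unchanged.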
Yu, Zijun
531941b348
Fix NPU
2026-01-15 11:28:48 -08:00
Yu, Zijun
ae404f7cbb
Fix llama-bench
2026-01-15 11:28:48 -08:00
Yu, Zijun
072dde0b2b
change graph to 4d, support multi sequences
2026-01-15 11:28:48 -08:00
Yu, Zijun
ea2c99be1c
NPU unify PD (handled internally)
2026-01-15 11:28:48 -08:00
Yu, Zijun
303923aba7
Clean placeholders in ggml-openvino.cpp
2026-01-15 11:27:30 -08:00
Zijun Yu
b8690bc055
NPU Unify PD (#14)
* Stateless. Fix llama-cli llama-server
* Simplify broadcast op in attention
* Replace get_output_tensor+memcpy with set_output_tensor
* NPU unify PD. Unify dynamic and static dims
2026-01-15 11:27:30 -08:00
Yu, Zijun
eba8113dc4
Style: middle ptr and ref align, omit optional struct keyword
2026-01-15 11:27:30 -08:00
Yu, Zijun
bd3093f90c
Style: use switch in supports_ops
2026-01-15 11:27:30 -08:00
Ravi Panchumarthy
3a1129e073
Update OV dockerfile to use OV2025.3 and update build docs
2026-01-15 11:27:30 -08:00
Ravi Panchumarthy
45af912b48
Update CI to run OV dep install before build
2026-01-15 11:27:30 -08:00