Commit Graph

7979 Commits

Author SHA1 Message Date
Yamini Nimmagadda 63eed0d9f3 Update build.md 2026-01-15 11:39:08 -08:00
Yamini Nimmagadda 61552e4450 Update OPENVINO.md 2026-01-15 11:39:08 -08:00
Yamini Nimmagadda 9ba324726a Update OPENVINO.md 2026-01-15 11:39:08 -08:00
Yamini Nimmagadda 25e652569b Update OPENVINO.md 2026-01-15 11:39:08 -08:00
Yamini Nimmagadda 416556a87d Create OPENVINO.md in llama.cpp backend docs 2026-01-15 11:39:08 -08:00
Mustafa Cavus 599335c633 Update ggml/src/ggml-openvino/ggml-openvino-extra.cpp 2026-01-15 11:39:08 -08:00
Mustafa Cavus a92eceecd9 Update ggml/src/ggml-openvino/ggml-decoder.cpp 2026-01-15 11:39:08 -08:00
Mustafa Cavus a81b202f57 requant to f16 for Q6 embed on NPU 2026-01-15 11:39:08 -08:00
Mustafa Cavus a40a5dfc60 npu perf fix 2026-01-15 11:39:08 -08:00
Mustafa Cavus 981ec6571d code cleanup 2026-01-15 11:39:08 -08:00
Mustafa Cavus d2fc15226b Update ggml/src/ggml-openvino/ggml-decoder.cpp (Co-authored-by: Yamini Nimmagadda <yamini.nimmagadda@intel.com>) 2026-01-15 11:39:08 -08:00
Mustafa Cavus 5f30eacdb4 Initial stateful graph support 2026-01-15 11:39:08 -08:00
Yu, Zijun 0d6f253e48 Support -ctk f32 2026-01-15 11:39:08 -08:00
Yu, Zijun f5c71e3cf4 Update build.md 2026-01-15 11:39:08 -08:00
Yu, Zijun 4e451778d3 Use Q8_0_C in token embd, lm_head, and for 5 and 6 bits quant 2026-01-15 11:39:08 -08:00
Yu, Zijun 67c9720e49 Optimize symmetric quant weight extraction: use single zp 2026-01-15 11:39:08 -08:00
Yu, Zijun c1142ddb7c NPU always requant to q4_0_128 2026-01-15 11:39:08 -08:00
Yu, Zijun 52a44012c0 Update build.md to include OpenCL 2026-01-15 11:39:08 -08:00
Yu, Zijun cfc471353d FIX: use remote tensor from singleton 2026-01-15 11:39:08 -08:00
Yu, Zijun a356b44477 only use remote tensor for kvcache for GPU 2026-01-15 11:39:08 -08:00
Yu, Zijun 88d1d17eac only use remote tensor for kvcache 2026-01-15 11:39:08 -08:00
Yu, Zijun 8273a7c2f4 Use ggml_aligned_malloc 2026-01-15 11:39:08 -08:00
Yu, Zijun d757849741 Put kvcache on GPU 2026-01-15 11:39:08 -08:00
Yu, Zijun 3fdcb6ab72 Add ov_backend_host_buffer; Use cached remote context 2026-01-15 11:39:08 -08:00
Yu, Zijun 72bba828df Use shared_buffer for GPU NPU; Refactor 2026-01-15 11:39:08 -08:00
Yu, Zijun 22d9c17a6f backend buffer: allocate on host 2026-01-15 11:39:08 -08:00
Arshath ae5336386f Update build.md for Windows 2026-01-15 11:39:08 -08:00
Yu, Zijun 0ef2e5e4d4 Fix decoder can_reuse for llama-bench 2026-01-15 11:39:08 -08:00
Xuejun Zhai 9e3163e846 Remove unused variable nodes 2026-01-15 11:39:08 -08:00
Yu, Zijun c9234b44cc NPU fix q4 perf regression 2026-01-15 11:39:08 -08:00
Yu, Zijun ae01322dbd NPU fix wrong model output shape 2026-01-15 11:39:08 -08:00
Yu, Zijun 469325c6da GPU remove Q6_K requantization 2026-01-15 11:39:08 -08:00
Yu, Zijun 28da9a9adc Reuse cached decoder 2026-01-15 11:39:08 -08:00
Xuejun Zhai 91a1b20c82 Fix error for decoder cache 2026-01-15 11:39:08 -08:00
Xuejun Zhai 47c91db31f Removed API GgmlOvDecoder::get_input_op_params(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai acb8a01d0e Removed API GgmlOvDecoder::get_input_shape(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 42ca27f714 Removed API get_input_type 2026-01-15 11:39:08 -08:00
Xuejun Zhai 891a3beb2d Removed API get_input_type 2026-01-15 11:39:08 -08:00
Xuejun Zhai cd611782ef Removed API GgmlOvDecoder::get_input_stride(const std::string& name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 95c3071906 Removed API GgmlOvDecoder::get_input_names() 2026-01-15 11:39:08 -08:00
Xuejun Zhai 197ed992c0 Removed m_output_names 2026-01-15 11:39:08 -08:00
Xuejun Zhai 8ff73e5d53 Removed API m_outputs 2026-01-15 11:39:08 -08:00
Xuejun Zhai 111c96c266 Removed API get_output_ggml_tensor(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai ba852f2a60 Removed API GgmlOvDecoder::get_output_op_params(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 6d7a0d6047 Modified API GgmlOvDecoder::get_output_type(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai f516db1db5 remove unused API get_output_shape(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 497964afbb remove unused API GgmlOvDecoder::get_output_names() 2026-01-15 11:39:08 -08:00
Yu, Zijun 8f4ee4eee2 minor update due to ov 2025.4 2026-01-15 11:39:08 -08:00
Xuejun Zhai 0ea8238ad0 remove unused API GgmlOvDecoder::get_output_stride(const std::string & name) 2026-01-15 11:39:08 -08:00
Yu, Zijun 2a9d4ca836 Refactor: split ov_graph_compute for dynamic and static 2026-01-15 11:39:08 -08:00