Author | Commit | Message | Date
Yamini Nimmagadda | 63eed0d9f3 | Update build.md | 2026-01-15 11:39:08 -08:00
Yamini Nimmagadda | 61552e4450 | Update OPENVINO.md | 2026-01-15 11:39:08 -08:00
Yamini Nimmagadda | 9ba324726a | Update OPENVINO.md | 2026-01-15 11:39:08 -08:00
Yamini Nimmagadda | 25e652569b | Update OPENVINO.md | 2026-01-15 11:39:08 -08:00
Yamini Nimmagadda | 416556a87d | Create OPENVINO.md in llama.cpp backend docs | 2026-01-15 11:39:08 -08:00
Mustafa Cavus | 599335c633 | Update ggml/src/ggml-openvino/ggml-openvino-extra.cpp | 2026-01-15 11:39:08 -08:00
Mustafa Cavus | a92eceecd9 | Update ggml/src/ggml-openvino/ggml-decoder.cpp | 2026-01-15 11:39:08 -08:00
Mustafa Cavus | a81b202f57 | requant to f16 for Q6 embed on NPU | 2026-01-15 11:39:08 -08:00
Mustafa Cavus | a40a5dfc60 | npu perf fix | 2026-01-15 11:39:08 -08:00
Mustafa Cavus | 981ec6571d | code cleanup | 2026-01-15 11:39:08 -08:00
Mustafa Cavus | d2fc15226b | Update ggml/src/ggml-openvino/ggml-decoder.cpp; Co-authored-by: Yamini Nimmagadda <yamini.nimmagadda@intel.com> | 2026-01-15 11:39:08 -08:00
Mustafa Cavus | 5f30eacdb4 | Initial stateful graph support | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 0d6f253e48 | Support -ctk f32 | 2026-01-15 11:39:08 -08:00
Yu, Zijun | f5c71e3cf4 | Update build.md | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 4e451778d3 | Use Q8_0_C in token embd, lm_head, and for 5 and 6 bits quant | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 67c9720e49 | Optimize symmetric quant weight extraction: use single zp | 2026-01-15 11:39:08 -08:00
Yu, Zijun | c1142ddb7c | NPU always requant to q4_0_128 | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 52a44012c0 | Update build.md to include OpenCL | 2026-01-15 11:39:08 -08:00
Yu, Zijun | cfc471353d | FIX: use remote tensor from singleton | 2026-01-15 11:39:08 -08:00
Yu, Zijun | a356b44477 | only use remote tensor for kvcache for GPU | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 88d1d17eac | only use remote tensor for kvcache | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 8273a7c2f4 | Use ggml_aligned_malloc | 2026-01-15 11:39:08 -08:00
Yu, Zijun | d757849741 | Put kvcache on GPU | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 3fdcb6ab72 | Add ov_backend_host_buffer; Use cached remote context | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 72bba828df | Use shared_buffer for GPU NPU; Refactor | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 22d9c17a6f | backend buffer: allocate on host | 2026-01-15 11:39:08 -08:00
Arshath | ae5336386f | Update build.md for Windows | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 0ef2e5e4d4 | Fix decoder can_reuse for llama-bench | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 9e3163e846 | Remove unused variable nodes | 2026-01-15 11:39:08 -08:00
Yu, Zijun | c9234b44cc | NPU fix q4 perf regression | 2026-01-15 11:39:08 -08:00
Yu, Zijun | ae01322dbd | NPU fix wrong model output shape | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 469325c6da | GPU remove Q6_K requantization | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 28da9a9adc | Reuse cached decoder | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 91a1b20c82 | Fix error for decoder cache | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 47c91db31f | Removed API GgmlOvDecoder::get_input_op_params(const std::string & name) | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | acb8a01d0e | Removed API GgmlOvDecoder::get_input_shape(const std::string & name) | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 42ca27f714 | Removed API get_input_type | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 891a3beb2d | Removed API get_input_type | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | cd611782ef | Removed API GgmlOvDecoder::get_input_stride(const std::string& name) | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 95c3071906 | Removed API GgmlOvDecoder::get_input_names() | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 197ed992c0 | Removed m_output_names | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 8ff73e5d53 | Removed API m_outputs | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 111c96c266 | Removed API get_output_ggml_tensor(const std::string & name) | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | ba852f2a60 | Removed API GgmlOvDecoder::get_output_op_params(const std::string & name) | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 6d7a0d6047 | Modified API GgmlOvDecoder::get_output_type(const std::string & name) | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | f516db1db5 | remove unused API get_output_shape(const std::string & name) | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 497964afbb | remove unused API GgmlOvDecoder::get_output_names() | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 8f4ee4eee2 | minor update due to ov 2025.4 | 2026-01-15 11:39:08 -08:00
Xuejun Zhai | 0ea8238ad0 | remove unused API GgmlOvDecoder::get_output_stride(const std::string & name) | 2026-01-15 11:39:08 -08:00
Yu, Zijun | 2a9d4ca836 | Refactor: split ov_graph_compute for dynamic and static | 2026-01-15 11:39:08 -08:00