Yu, Zijun | 8fb20b28b7 | Fix llama-bench -p -n where p<=256 | 2026-02-11 10:15:09 +08:00 |
Yu, Zijun | 9a15c8b0cf | Change ov backend buffer is_host to false | 2026-01-21 15:23:12 +08:00 |
Mustafa Cavus | aa4bc90030 | Syntax correction for workflows build file | 2026-01-16 13:06:43 -08:00 |
Mustafa Cavus | d7dccf887b | kq_mask naming fix | 2026-01-15 14:38:53 -08:00 |
Yamini Nimmagadda | d3649c11cb | Update OPENVINO.md | 2026-01-15 11:39:08 -08:00 |
Yamini Nimmagadda | e9ed5c4cb6 | Update OPENVINO.md | 2026-01-15 11:39:08 -08:00 |
Yamini Nimmagadda | f44c60e995 | Update OPENVINO.md | 2026-01-15 11:39:08 -08:00 |
Yamini Nimmagadda | 63eed0d9f3 | Update build.md | 2026-01-15 11:39:08 -08:00 |
Yamini Nimmagadda | 61552e4450 | Update OPENVINO.md | 2026-01-15 11:39:08 -08:00 |
Yamini Nimmagadda | 9ba324726a | Update OPENVINO.md | 2026-01-15 11:39:08 -08:00 |
Yamini Nimmagadda | 25e652569b | Update OPENVINO.md | 2026-01-15 11:39:08 -08:00 |
Yamini Nimmagadda | 416556a87d | Create OPENVINO.md in llama.cpp backend docs | 2026-01-15 11:39:08 -08:00 |
Mustafa Cavus | 599335c633 | Update ggml/src/ggml-openvino/ggml-openvino-extra.cpp | 2026-01-15 11:39:08 -08:00 |
Mustafa Cavus | a92eceecd9 | Update ggml/src/ggml-openvino/ggml-decoder.cpp | 2026-01-15 11:39:08 -08:00 |
Mustafa Cavus | a81b202f57 | requant to f16 for Q6 embed on NPU | 2026-01-15 11:39:08 -08:00 |
Mustafa Cavus | a40a5dfc60 | npu perf fix | 2026-01-15 11:39:08 -08:00 |
Mustafa Cavus | 981ec6571d | code cleanup | 2026-01-15 11:39:08 -08:00 |
Mustafa Cavus | d2fc15226b | Update ggml/src/ggml-openvino/ggml-decoder.cpp; Co-authored-by: Yamini Nimmagadda <yamini.nimmagadda@intel.com> | 2026-01-15 11:39:08 -08:00 |
Mustafa Cavus | 5f30eacdb4 | Initial stateful graph support | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 0d6f253e48 | Support -ctk f32 | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | f5c71e3cf4 | Update build.md | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 4e451778d3 | Use Q8_0_C in token embd, lm_head, and for 5 and 6 bits quant | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 67c9720e49 | Optimize symmetric quant weight extraction: use single zp | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | c1142ddb7c | NPU always requant to q4_0_128 | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 52a44012c0 | Update build.md to include OpenCL | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | cfc471353d | FIX: use remote tensor from singleton | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | a356b44477 | only use remote tensor for kvcache for GPU | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 88d1d17eac | only use remote tensor for kvcache | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 8273a7c2f4 | Use ggml_aligned_malloc | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | d757849741 | Put kvcache on GPU | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 3fdcb6ab72 | Add ov_backend_host_buffer; Use cached remote context | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 72bba828df | Use shared_buffer for GPU NPU; Refactor | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 22d9c17a6f | backend buffer: allocate on host | 2026-01-15 11:39:08 -08:00 |
Arshath | ae5336386f | Update build.md for Windows | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 0ef2e5e4d4 | Fix decoder can_reuse for llama-bench | 2026-01-15 11:39:08 -08:00 |
Xuejun Zhai | 9e3163e846 | Remove unused variable nodes | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | c9234b44cc | NPU fix q4 perf regression | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | ae01322dbd | NPU fix wrong model output shape | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 469325c6da | GPU remove Q6_K requantization | 2026-01-15 11:39:08 -08:00 |
Yu, Zijun | 28da9a9adc | Reuse cached decoder | 2026-01-15 11:39:08 -08:00 |
Xuejun Zhai | 91a1b20c82 | Fix error for decoder cache | 2026-01-15 11:39:08 -08:00 |
Xuejun Zhai | 47c91db31f | Removed API GgmlOvDecoder::get_input_op_params(const std::string & name) | 2026-01-15 11:39:08 -08:00 |
Xuejun Zhai | acb8a01d0e | Removed API GgmlOvDecoder::get_input_shape(const std::string & name) | 2026-01-15 11:39:08 -08:00 |
Xuejun Zhai | 42ca27f714 | Removed API get_input_type | 2026-01-15 11:39:08 -08:00 |
Xuejun Zhai | 891a3beb2d | Removed API get_input_type | 2026-01-15 11:39:08 -08:00 |
Xuejun Zhai | cd611782ef | Removed API GgmlOvDecoder::get_input_stride(const std::string& name) | 2026-01-15 11:39:08 -08:00 |
Xuejun Zhai | 95c3071906 | Removed API GgmlOvDecoder::get_input_names() | 2026-01-15 11:39:08 -08:00 |
Xuejun Zhai | 197ed992c0 | Removed m_output_names | 2026-01-15 11:39:08 -08:00 |
Xuejun Zhai | 8ff73e5d53 | Removed API m_outputs | 2026-01-15 11:39:08 -08:00 |
Xuejun Zhai | 111c96c266 | Removed API get_output_ggml_tensor(const std::string & name) | 2026-01-15 11:39:08 -08:00 |