llama.cpp

Commit Graph

Author	SHA1	Message	Date
Yu, Zijun	cfc471353d	FIX: use remote tensor from singleton	2026-01-15 11:39:08 -08:00
Yu, Zijun	a356b44477	only use remote tensor for kvcache for GPU	2026-01-15 11:39:08 -08:00
Yu, Zijun	88d1d17eac	only use remote tensor for kvcache	2026-01-15 11:39:08 -08:00
Yu, Zijun	8273a7c2f4	Use ggml_aligned_malloc	2026-01-15 11:39:08 -08:00
Yu, Zijun	d757849741	Put kvcache on GPU	2026-01-15 11:39:08 -08:00
Yu, Zijun	3fdcb6ab72	Add ov_backend_host_buffer; Use cached remote context	2026-01-15 11:39:08 -08:00
Yu, Zijun	72bba828df	Use shared_buffer for GPU NPU; Refactor	2026-01-15 11:39:08 -08:00
Yu, Zijun	22d9c17a6f	backend buffer: allocate on host	2026-01-15 11:39:08 -08:00
Arshath	ae5336386f	Update build.md for Windows	2026-01-15 11:39:08 -08:00
Yu, Zijun	0ef2e5e4d4	Fix decoder can_reuse for llama-bench	2026-01-15 11:39:08 -08:00
Xuejun Zhai	9e3163e846	Remove unused variable nodes	2026-01-15 11:39:08 -08:00
Yu, Zijun	c9234b44cc	NPU fix q4 perf regression	2026-01-15 11:39:08 -08:00
Yu, Zijun	ae01322dbd	NPU fix wrong model output shape	2026-01-15 11:39:08 -08:00
Yu, Zijun	469325c6da	GPU remove Q6_K requantization	2026-01-15 11:39:08 -08:00
Yu, Zijun	28da9a9adc	Reuse cached decoder	2026-01-15 11:39:08 -08:00
Xuejun Zhai	91a1b20c82	Fix error for decoder cache	2026-01-15 11:39:08 -08:00
Xuejun Zhai	47c91db31f	Removed API GgmlOvDecoder::get_input_op_params(const std::string & name)	2026-01-15 11:39:08 -08:00
Xuejun Zhai	acb8a01d0e	Removed API GgmlOvDecoder::get_input_shape(const std::string & name)	2026-01-15 11:39:08 -08:00
Xuejun Zhai	42ca27f714	Removed API get_input_type	2026-01-15 11:39:08 -08:00
Xuejun Zhai	891a3beb2d	Removed API get_input_type	2026-01-15 11:39:08 -08:00
Xuejun Zhai	cd611782ef	Removed API GgmlOvDecoder::get_input_stride(const std::string& name)	2026-01-15 11:39:08 -08:00
Xuejun Zhai	95c3071906	Removed API GgmlOvDecoder::get_input_names()	2026-01-15 11:39:08 -08:00
Xuejun Zhai	197ed992c0	Removed m_output_names	2026-01-15 11:39:08 -08:00
Xuejun Zhai	8ff73e5d53	Removed API m_outputs	2026-01-15 11:39:08 -08:00
Xuejun Zhai	111c96c266	Removed API get_output_ggml_tensor(const std::string & name)	2026-01-15 11:39:08 -08:00
Xuejun Zhai	ba852f2a60	Removed API GgmlOvDecoder::get_output_op_params(const std::string & name)	2026-01-15 11:39:08 -08:00
Xuejun Zhai	6d7a0d6047	Modified API GgmlOvDecoder::get_output_type(const std::string & name)	2026-01-15 11:39:08 -08:00
Xuejun Zhai	f516db1db5	remove unused API get_output_shape(const std::string & name)	2026-01-15 11:39:08 -08:00
Xuejun Zhai	497964afbb	remove unused API GgmlOvDecoder::get_output_names()	2026-01-15 11:39:08 -08:00
Yu, Zijun	8f4ee4eee2	minor update due to ov 2025.4	2026-01-15 11:39:08 -08:00
Xuejun Zhai	0ea8238ad0	remove unused API GgmlOvDecoder::get_output_stride(const std::string & name)	2026-01-15 11:39:08 -08:00
Yu, Zijun	2a9d4ca836	Refactor: split ov_graph_compute for dynamic and static	2026-01-15 11:39:08 -08:00
Yu, Zijun	808619e274	NPU support llama-perplexity -b 512 --no-warmup	2026-01-15 11:39:08 -08:00
Yu, Zijun	65348b5d20	fallback naive run with accuracy issue	2026-01-15 11:39:08 -08:00
Yu, Zijun	59e7e7c47d	NPU fix llama-bench	2026-01-15 11:39:08 -08:00
Yu, Zijun	38254cf592	NPU prefill chunking	2026-01-15 11:39:08 -08:00
XuejunZhai	992dea73fd	Fix error for naive	2026-01-15 11:39:08 -08:00
XuejunZhai	ae936519d2	Remove the second decoder for node. Moving the function into the model decoder	2026-01-15 11:39:05 -08:00
Arshath	4400b5cb4b	Update ggml-decoder.cpp	2026-01-15 11:38:13 -08:00
Arshath	98396b275a	Update ggml-decoder.cpp	2026-01-15 11:38:13 -08:00
Arshath	4a57b37d4d	Update ggml-decoder.cpp	2026-01-15 11:38:13 -08:00
Arshath	bed495226d	Update ggml-decoder.cpp	2026-01-15 11:38:13 -08:00
Arshath	11b4cc5a67	Update ggml-decoder.cpp	2026-01-15 11:38:13 -08:00
Arshath	047bfb5c90	Update ggml-decoder.cpp. Hitting error while compiling on Windows: error C3861: 'unsetenv': identifier not found. Reason: unsetenv() is a POSIX function; it doesn’t exist on Windows, so Visual Studio (MSVC) won’t recognize it. Proposed fix: use _putenv_s() (the Windows equivalent), which is supported by MSVC and achieves the same effect: it removes the environment variable from the process environment. This keeps cross-platform compatibility.	2026-01-15 11:38:07 -08:00
Yu, Zijun	531941b348	Fix NPU	2026-01-15 11:28:48 -08:00
Yu, Zijun	ae404f7cbb	Fix llama-bench	2026-01-15 11:28:48 -08:00
Yu, Zijun	072dde0b2b	change graph to 4d, support multi sequences	2026-01-15 11:28:48 -08:00
Yu, Zijun	ea2c99be1c	NPU unify PD (handled internally)	2026-01-15 11:28:48 -08:00
Yu, Zijun	303923aba7	Clean placeholders in ggml-openvino.cpp	2026-01-15 11:27:30 -08:00
Zijun Yu	b8690bc055	NPU Unify PD (#14): Stateless, fix llama-cli and llama-server; Simplify broadcast op in attention; Replace get_output_tensor+memcpy with set_output_tensor; NPU unify PD, unify dynamic and static dims	2026-01-15 11:27:30 -08:00
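
The message of commit 047bfb5c90 above describes replacing the POSIX-only unsetenv() with _putenv_s() when building with MSVC. The following is a minimal sketch of that idea, not the code from the commit itself: the helper name unset_env_var is hypothetical and does not come from the repository. It relies on the documented behavior that _putenv_s() with an empty value string removes the variable from the process environment.

```cpp
#include <cstdlib>

// Hypothetical helper (name not taken from the repository): remove an
// environment variable from the current process on Windows and POSIX.
// Returns 0 on success, non-zero on failure.
static int unset_env_var(const char * name) {
#ifdef _WIN32
    // MSVC has no unsetenv(); passing an empty value to _putenv_s()
    // removes the variable from the process environment.
    return _putenv_s(name, "");
#else
    // POSIX: unsetenv() returns 0 on success and -1 on error.
    return unsetenv(name);
#endif
}
```

A caller would invoke unset_env_var("SOME_VARIABLE") wherever the original code called unsetenv() directly, where SOME_VARIABLE is a placeholder name. Keeping the platform check inside one helper avoids scattering #ifdef blocks across call sites.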

7961 Commits, All Branches