Commit Graph

109 Commits

Author SHA1 Message Date
Xuejun Zhai 95c3071906 Removed API GgmlOvDecoder::get_input_names() 2026-01-15 11:39:08 -08:00
Xuejun Zhai 8ff73e5d53 Removed API m_outputs 2026-01-15 11:39:08 -08:00
Xuejun Zhai ba852f2a60 Removed API GgmlOvDecoder::get_output_op_params(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 6d7a0d6047 Modified API GgmlOvDecoder::get_output_type(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai f516db1db5 remove unused API get_output_shape(const std::string & name) 2026-01-15 11:39:08 -08:00
Xuejun Zhai 497964afbb remove unused API GgmlOvDecoder::get_output_names() 2026-01-15 11:39:08 -08:00
Xuejun Zhai 0ea8238ad0 remove unused API GgmlOvDecoder::get_output_stride(const std::string & name) 2026-01-15 11:39:08 -08:00
Yu, Zijun 808619e274 NPU support llama-perplexity -b 512 --no-warmup 2026-01-15 11:39:08 -08:00
Yu, Zijun 65348b5d20 Fall back to naive run with accuracy issue 2026-01-15 11:39:08 -08:00
Yu, Zijun 59e7e7c47d NPU fix llama-bench 2026-01-15 11:39:08 -08:00
Yu, Zijun 38254cf592 NPU prefill chunking 2026-01-15 11:39:08 -08:00
XuejunZhai 992dea73fd Fix error for naive 2026-01-15 11:39:08 -08:00
XuejunZhai ae936519d2 Remove the second decoder for node; move the function into the model decoder 2026-01-15 11:39:05 -08:00
Arshath 4400b5cb4b Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 98396b275a Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 4a57b37d4d Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath bed495226d Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 11b4cc5a67 Update ggml-decoder.cpp 2026-01-15 11:38:13 -08:00
Arshath 047bfb5c90 Update ggml-decoder.cpp
Compiling on Windows fails with:

error C3861: 'unsetenv': identifier not found

Reason: unsetenv() is a POSIX function; it does not exist on Windows, so MSVC does not recognize it.

Proposed fix: use _putenv_s(), the MSVC equivalent. It is supported by MSVC and achieves the same effect, removing the environment variable from the process environment.

This keeps the code cross-platform.
2026-01-15 11:38:07 -08:00
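The fix described in the commit above can be sketched as a small cross-platform helper. This is a minimal illustration of the `_putenv_s()` approach, not the repository's actual code; the helper name `unset_env` is assumed:

```cpp
#include <cstdlib>

// Remove a variable from the process environment on both
// POSIX and Windows. MSVC has no unsetenv(); passing an empty
// string to _putenv_s() deletes the variable instead.
static void unset_env(const char * name) {
#ifdef _WIN32
    _putenv_s(name, "");
#else
    unsetenv(name);
#endif
}
```

The same `#ifdef _WIN32` guard keeps the POSIX path untouched while compiling cleanly under MSVC.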
Yu, Zijun 531941b348 Fix NPU 2026-01-15 11:28:48 -08:00
Yu, Zijun ae404f7cbb Fix llama-bench 2026-01-15 11:28:48 -08:00
Yu, Zijun 072dde0b2b change graph to 4d, support multi sequences 2026-01-15 11:28:48 -08:00
Yu, Zijun ea2c99be1c NPU unify PD (handled internally) 2026-01-15 11:28:48 -08:00
Zijun Yu b8690bc055 NPU Unify PD (#14)
* Stateless. Fix llama-cli llama-server

* Simplify broadcast op in attention

* Replace get_output_tensor+memcpy with set_output_tensor

* NPU unify PD. Unify dynamic and static dims
2026-01-15 11:27:30 -08:00
Yu, Zijun eba8113dc4 Style: middle ptr and ref align, omit optional struct keyword 2026-01-15 11:27:30 -08:00
Yu, Zijun 8b82d1153b Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_size, iSWA model not working 2026-01-15 11:26:00 -08:00
cavusmustafa c112bc4e73 kvcachefusion support 2026-01-15 11:26:00 -08:00
Yu, Zijun fdadca1e89 Fix after rebasing 2026-01-15 11:26:00 -08:00
Yu, Zijun e4bfe5a20d Add Q5_K to support phi-3-q4_k_m 2026-01-15 11:26:00 -08:00
Yu, Zijun 2f1d50fb07 Minor refactor 2026-01-15 11:26:00 -08:00
Yu, Zijun 67e178a2f6 Minor: not add attention_size_swa for non-swa model 2026-01-15 11:26:00 -08:00
Yu, Zijun 9de874cb7b Support iSWA 2026-01-15 11:25:58 -08:00
Yu, Zijun 7d81861a18 Fix Hunyuan 2026-01-15 11:20:31 -08:00
Yu, Zijun bcc343af00 Support BF16 model 2026-01-15 11:20:31 -08:00
Yu, Zijun 2ad1147b9b Improve debug util; Eliminate nop ReshapeReshape 2026-01-15 11:20:31 -08:00
Yu, Zijun 6926655f5b Add custom quant type: q8_1_c, q4_0_128 2026-01-15 11:20:31 -08:00
Yu, Zijun b593428eb3 Dequantize q4_1 q4_k q6_k for NPU 2026-01-15 11:20:31 -08:00
Yu, Zijun 9900245e0b Fix test-backend-ops: Treat quantized tensors as weights 2026-01-15 11:20:31 -08:00
Yu, Zijun dd80b04235 Fix CI; Disable test-backend-ops 2026-01-15 11:19:15 -08:00
Yu, Zijun 6ab76ed10a Fix accuracy: disable cpu_repack 2026-01-15 11:19:15 -08:00
Yu, Zijun 663a0b8cce Quant models run with accuracy issue 2026-01-15 11:19:15 -08:00
Yu, Zijun d4ca760da8 Add quant weight conversion functions from genai gguf reader 2026-01-15 11:19:15 -08:00
Yu, Zijun 56d596775d Change openvino device_type to GPU; Enable flash_attn 2026-01-15 11:19:15 -08:00
Yu, Zijun 65e1b1af6d Fix after rebasing
- Layouts of cache k and cache v are unified: [seq, n_head, head_size]
- Add CPY and FLASH_ATTN_EXT, flash attn is not used yet
- Skip test-backend-ops due to flash attn test crash
- Add mutex around graph conversion to avoid test-thread-safety failures in the future
- Update NPU config
- Update GPU config to disable SDPA opt to make phi-3 run
2026-01-15 11:19:15 -08:00
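The unified [seq, n_head, head_size] layout mentioned in the commit above implies a simple row-major flat offset into the cache tensor. A minimal sketch, with illustrative names (not taken from the repository):

```cpp
#include <cstddef>

// Row-major flat offset into a [seq, n_head, head_size] cache tensor:
// s = sequence position, h = head index, d = element within the head.
static size_t kv_offset(size_t s, size_t h, size_t d,
                        size_t n_head, size_t head_size) {
    return (s * n_head + h) * head_size + d;
}
```

With both k and v caches sharing this layout, the same offset computation serves reads and writes on either tensor.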
Yu, Zijun a7b611bc93 Minor updates for raising PR 2026-01-15 11:19:15 -08:00
Yu, Zijun f4123be967 Fix test-backend-ops 2026-01-15 11:19:15 -08:00
Yu, Zijun 839f8c66a0 Remove CPY 2026-01-15 11:19:15 -08:00
Yu, Zijun 7bda5021f9 Fix NPU 2026-01-15 11:19:15 -08:00
Yu, Zijun 63d000ba40 Support op SET_ROWS 2026-01-15 11:19:15 -08:00
Yu, Zijun 9a91ca6ef9 Optimize tensor conversion, improve TTFT 2026-01-15 11:19:15 -08:00