llama.cpp

Commit Graph

Author	SHA1	Message	Date
Yu, Zijun	65348b5d20	fallback naive run with accuracy issue	2026-01-15 11:39:08 -08:00
Yu, Zijun	59e7e7c47d	NPU fix llama-bench	2026-01-15 11:39:08 -08:00
Yu, Zijun	38254cf592	NPU prefill chunking	2026-01-15 11:39:08 -08:00
XuejunZhai	992dea73fd	Fix error for naive	2026-01-15 11:39:08 -08:00
XuejunZhai	ae936519d2	Remove the second decoder for node. Moving the function into the model decoder	2026-01-15 11:39:05 -08:00
Arshath	4400b5cb4b	Update ggml-decoder.cpp	2026-01-15 11:38:13 -08:00
Arshath	98396b275a	Update ggml-decoder.cpp	2026-01-15 11:38:13 -08:00
Arshath	4a57b37d4d	Update ggml-decoder.cpp	2026-01-15 11:38:13 -08:00
Arshath	bed495226d	Update ggml-decoder.cpp	2026-01-15 11:38:13 -08:00
Arshath	11b4cc5a67	Update ggml-decoder.cpp	2026-01-15 11:38:13 -08:00
Arshath	047bfb5c90	Update ggml-decoder.cpp. Hitting error while compiling on Windows: error C3861: 'unsetenv': identifier not found. Reason: unsetenv() is a POSIX function; it doesn’t exist on Windows, so Visual Studio (MSVC) won’t recognize it. Proposed fix: use _putenv_s() (the Windows equivalent), which is supported by MSVC and achieves the same effect: it removes the environment variable from the process environment. This keeps cross-platform compatibility.	2026-01-15 11:38:07 -08:00
Yu, Zijun	531941b348	Fix NPU	2026-01-15 11:28:48 -08:00
Yu, Zijun	ae404f7cbb	Fix llama-bench	2026-01-15 11:28:48 -08:00
Yu, Zijun	072dde0b2b	change graph to 4d, support multi sequences	2026-01-15 11:28:48 -08:00
Yu, Zijun	ea2c99be1c	NPU unify PD (handled internally)	2026-01-15 11:28:48 -08:00
Yu, Zijun	303923aba7	Clean placeholders in ggml-openvino.cpp	2026-01-15 11:27:30 -08:00
Zijun Yu	b8690bc055	NPU Unify PD (#14) * Stateless. Fix llama-cli llama-server * Simplify broadcast op in attention * Replace get_output_tensor+memcpy with set_output_tensor * NPU unify PD. Unify dynamic and static dims	2026-01-15 11:27:30 -08:00
Yu, Zijun	eba8113dc4	Style: middle ptr and ref align, omit optional struct keyword	2026-01-15 11:27:30 -08:00
Yu, Zijun	bd3093f90c	Style: use switch in supports_ops	2026-01-15 11:27:30 -08:00
Ravi Panchumarthy	841d673bd0	Update to OV-2025.3 and CMakeLists.txt	2026-01-15 11:26:00 -08:00
Yu, Zijun	2d2f00a41f	Fix llama-3-8b and phi3-mini q4_0 NPU	2026-01-15 11:26:00 -08:00
Yu, Zijun	299f4923bb	fix after rebasing	2026-01-15 11:26:00 -08:00
Yu, Zijun	8b82d1153b	Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_size, iSWA model not working	2026-01-15 11:26:00 -08:00
Yu, Zijun	a9371ea646	Fix llama-cli (need to run with --no-warmup)	2026-01-15 11:26:00 -08:00
cavusmustafa	05d7abae8c	Fix for Phi3	2026-01-15 11:26:00 -08:00
cavusmustafa	e7252920e1	env variable GGML_OPENVINO_DISABLE_SDPA_OPTIMIZATION added	2026-01-15 11:26:00 -08:00
cavusmustafa	c112bc4e73	kvcachefusion support	2026-01-15 11:26:00 -08:00
Yu, Zijun	973a80fd02	Always apply Eliminate_ZP to fix GPU compile issue on some platforms	2026-01-15 11:26:00 -08:00
Yu, Zijun	fdadca1e89	Fix after rebasing	2026-01-15 11:26:00 -08:00
Yu, Zijun	f3afa7b914	Requantize Q6_K (gs16) to gs32 on GPU	2026-01-15 11:26:00 -08:00
Yu, Zijun	e4bfe5a20d	Add Q5_K to support phi-3-q4_k_m	2026-01-15 11:26:00 -08:00
Yu, Zijun	2f1d50fb07	Minor refactor	2026-01-15 11:26:00 -08:00
Yu, Zijun	67e178a2f6	Minor: not add attention_size_swa for non-swa model	2026-01-15 11:26:00 -08:00
Yu, Zijun	1a38339cea	Fix ROPE accuracy when freq_scale != 1	2026-01-15 11:26:00 -08:00
Yu, Zijun	602f9ca4af	Fix NPU accuracy	2026-01-15 11:26:00 -08:00
Yu, Zijun	9de874cb7b	Support iSWA	2026-01-15 11:25:58 -08:00
Yu, Zijun	7d81861a18	Fix Hunyuan	2026-01-15 11:20:31 -08:00
Yu, Zijun	597561242f	Add GeGLU	2026-01-15 11:20:31 -08:00
Yu, Zijun	be07073e0e	Apply EliminateZP only for npu	2026-01-15 11:20:31 -08:00
Yu, Zijun	da2cc993bc	WA for npu 1st token acc issue	2026-01-15 11:20:31 -08:00
Yu, Zijun	434059aef7	Fix NPU compile	2026-01-15 11:20:31 -08:00
Yu, Zijun	bcc343af00	Support BF16 model	2026-01-15 11:20:31 -08:00
Yu, Zijun	dc77cbb3f6	STYLE: make get_types_to_requant a function	2026-01-15 11:20:31 -08:00
Yu, Zijun	2ad1147b9b	Improve debug util; Eliminate nop ReshapeReshape	2026-01-15 11:20:31 -08:00
Yu, Zijun	0f7b253cb3	Fix after rebasing	2026-01-15 11:20:31 -08:00
Yu, Zijun	810eb480f5	Simplify translation of get_rows	2026-01-15 11:20:31 -08:00
Yu, Zijun	c5231a2448	Set m_is_static=false as default in decoder	2026-01-15 11:20:31 -08:00
Yu, Zijun	6926655f5b	Add custom quant type: q8_1_c, q4_0_128	2026-01-15 11:20:31 -08:00
Yu, Zijun	b593428eb3	Dequantize q4_1 q4_k q6_k for NPU	2026-01-15 11:20:31 -08:00
Yu, Zijun	82c98335d3	NPU perf: eliminate zp	2026-01-15 11:20:31 -08:00
Yu, Zijun	9ca53c7991	Add NPU Q4_0 support	2026-01-15 11:20:31 -08:00
Yu, Zijun	9900245e0b	Fix test-backend-ops: Treat quantized tensors as weights	2026-01-15 11:20:31 -08:00
Yu, Zijun	a1ce428004	Fix Q4_1	2026-01-15 11:19:15 -08:00
Yu, Zijun	dd80b04235	Fix CI; Disable test-backend-ops	2026-01-15 11:19:15 -08:00
Yu, Zijun	6ab76ed10a	Fix accuracy: disable cpu_repack	2026-01-15 11:19:15 -08:00
Yu, Zijun	663a0b8cce	Quant models run with accuracy issue	2026-01-15 11:19:15 -08:00
Yu, Zijun	d4ca760da8	Add quant weight conversion functions from genai gguf reader	2026-01-15 11:19:15 -08:00
Yu, Zijun	3e897df51c	Update supports_buft and supports_op for quantized models	2026-01-15 11:19:15 -08:00
Yu, Zijun	56d596775d	Change openvino device_type to GPU; Enable flash_attn	2026-01-15 11:19:15 -08:00
Yu, Zijun	65e1b1af6d	Fix after rebasing - Layout of cache k and cache v are unified: [seq, n_head, head_size] - Add CPY and FLASH_ATTN_EXT, flash attn is not used yet - Skip test-backend-ops due to flash attn test crash - Add mutex around graph conversion to avoid test-thread-safety failures in the future - Update NPU config - Update GPU config to disable SDPA opt to make phi-3 run	2026-01-15 11:19:15 -08:00
Yu, Zijun	14c8a85c32	Perf: RMS fused to OV internal RMS op	2026-01-15 11:19:15 -08:00
Yu, Zijun	a7b611bc93	Minor updates for raising PR	2026-01-15 11:19:15 -08:00
Yu, Zijun	f4123be967	Fix test-backend-ops	2026-01-15 11:19:15 -08:00
Yu, Zijun	839f8c66a0	Remove CPY	2026-01-15 11:19:15 -08:00
Yu, Zijun	7bda5021f9	Fix NPU	2026-01-15 11:19:15 -08:00
Yu, Zijun	63d000ba40	Support op SET_ROWS	2026-01-15 11:19:15 -08:00
Yu, Zijun	9a91ca6ef9	Optimize tensor conversion, improve TTFT	2026-01-15 11:19:15 -08:00
Yu, Zijun	37ff226bb6	Use CiD for NPU	2026-01-15 11:19:15 -08:00
Yu, Zijun	fc865340d5	Fix test-backend-ops	2026-01-15 10:26:28 -08:00
Yu, Zijun	43489bbfaa	Revert changes in fuse_to_sdpa	2026-01-15 10:26:28 -08:00
Cavus Mustafa	1a19566b23	add mark decomp pass	2026-01-15 10:26:28 -08:00
Cavus Mustafa	93b2d09a2d	mulmat type conversion update	2026-01-15 10:26:28 -08:00
Cavus Mustafa	e2fdc1b988	mulmat input conversion fix	2026-01-15 10:26:28 -08:00
Yu, Zijun	01cdf4a9cc	matmul in fp32	2026-01-15 10:26:28 -08:00
Cavus Mustafa	9cf56d6837	temp. changes for mark decomp	2026-01-15 10:26:28 -08:00
Yu, Zijun	4e7f04a307	Fix llama-perplexity	2026-01-15 10:26:28 -08:00
Yu, Zijun	75eec6265f	Fix llama-bench; Clang-format	2026-01-15 10:26:28 -08:00
Yu, Zijun	6dc4b90635	Fix NPU	2026-01-15 10:26:28 -08:00
Yu, Zijun	44f4cf34b1	Fix Phi3 ROPE; Add test-backend-ops	2026-01-15 10:26:28 -08:00
Yu, Zijun	1ed49bbfaf	Fix llama-cli	2026-01-15 10:26:28 -08:00
Yu, Zijun	d61f83c9b7	Fix CPY due to cgraph change	2026-01-15 10:23:35 -08:00
Yu, Zijun	f3c0519096	Reduce memory: free ov weights node after graph conversion	2026-01-15 10:20:18 -08:00
Yu, Zijun	a80da69448	Pull out sin cos from rope	2026-01-15 10:20:18 -08:00
Yu, Zijun	3533c14cf6	Fix Phi3 SwiGLU and SoftMax	2026-01-15 10:20:18 -08:00
Yu, Zijun	0fa7a5efef	Refactor: remove past_token_len from extra_inputs	2026-01-15 10:20:18 -08:00
Yu, Zijun	acf358d1ce	Pull out indices creation for kv cache update	2026-01-15 10:20:18 -08:00
Yu, Zijun	bf5414c95e	Replace Concat with Broadcast in MulMat for GQA	2026-01-15 10:20:18 -08:00
Yu, Zijun	ebc4fc9f95	Fuse to SDPA	2026-01-15 10:20:18 -08:00
Yu, Zijun	73ee84fffe	Add SwiGLU	2026-01-15 10:20:18 -08:00
Yu, Zijun	4c582ac7a3	Stateful transformation for CPU GPU	2026-01-15 10:20:18 -08:00
Yu, Zijun	8afee795ad	Update clang-format	2026-01-15 10:20:18 -08:00
Yu, Zijun	593484ce5f	Refactor: clean, fix warning	2026-01-15 10:20:18 -08:00
Yu, Zijun	42d4240937	Change due to ggml cgraph changes, all device work	2026-01-15 10:20:18 -08:00
Yu, Zijun	e27738a987	Add AMD64 to CMakeLists	2026-01-15 10:20:18 -08:00
Yu, Zijun	592d7f8bbb	Change due to ggml cgraph changes, llama-3.2 CPU work	2026-01-15 10:20:18 -08:00
Yu, Zijun	f7ad77930e	Change due to ggml cgraph changes, not correct yet	2026-01-15 10:20:18 -08:00
Yu, Zijun	d9ca8f5dbe	NPU support version 2: prefill + kvcache	2026-01-15 10:20:18 -08:00
Yu, Zijun	34531abce4	draft NPU support version 2: prefill + kvcache	2026-01-15 10:20:18 -08:00
Yu, Zijun	7fec223334	Add initial NPU support	2026-01-15 10:20:18 -08:00
Yu, Zijun	8ce5cc597a	Add cgraph tensor output name to OV op name	2026-01-15 10:20:18 -08:00
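
Note on commit 047bfb5c90: its message describes replacing the POSIX-only unsetenv() with _putenv_s() when building with MSVC. The sketch below shows one way such a portable wrapper could look. It is not the code from ggml-decoder.cpp; the helper names portable_setenv and portable_unsetenv are hypothetical and chosen only for illustration. It relies on the documented MSVC behavior that passing an empty string as the value to _putenv_s() removes the variable.

```cpp
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // setenv / unsetenv (POSIX), _putenv_s (MSVC)

// Hypothetical helper (name is an assumption, not from the repository):
// set an environment variable, overwriting any existing value.
inline bool portable_setenv(const char * name, const char * value) {
#ifdef _WIN32
    return _putenv_s(name, value) == 0;
#else
    return setenv(name, value, /*overwrite=*/1) == 0;
#endif
}

// Hypothetical helper (name is an assumption, not from the repository):
// remove a variable from the process environment.
inline bool portable_unsetenv(const char * name) {
#ifdef _WIN32
    // On MSVC, an empty value string removes the variable.
    return _putenv_s(name, "") == 0;
#else
    return unsetenv(name) == 0;
#endif
}
```

A caller would use portable_unsetenv("SOME_VAR") wherever the original code called unsetenv("SOME_VAR"); keeping the #ifdef inside one small helper avoids scattering platform checks through the decoder.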

2117 Commits