llama.cpp

Commit Graph

Author	SHA1	Message	Date
Yu, Zijun	531941b348	Fix NPU	2026-01-15 11:28:48 -08:00
Yu, Zijun	ae404f7cbb	Fix llama-bench	2026-01-15 11:28:48 -08:00
Yu, Zijun	072dde0b2b	change graph to 4d, support multi sequences	2026-01-15 11:28:48 -08:00
Yu, Zijun	ea2c99be1c	NPU unify PD (handled internally)	2026-01-15 11:28:48 -08:00
Yu, Zijun	303923aba7	Clean placeholders in ggml-openvino.cpp	2026-01-15 11:27:30 -08:00
Zijun Yu	b8690bc055	NPU Unify PD (#14) * Stateless. Fix llama-cli llama-server * Simplify broadcast op in attention * Replace get_output_tensor+memcpy with set_output_tensor * NPU unify PD. Unify dynamic and static dims	2026-01-15 11:27:30 -08:00
Yu, Zijun	eba8113dc4	Style: middle ptr and ref align, omit optional struct keyword	2026-01-15 11:27:30 -08:00
Yu, Zijun	bd3093f90c	Style: use switch in supports_ops	2026-01-15 11:27:30 -08:00
Ravi Panchumarthy	3a1129e073	Update OV dockerfile to use OV2025.3 and update build docs	2026-01-15 11:27:30 -08:00
Ravi Panchumarthy	45af912b48	Update CI to run OV dep install before build	2026-01-15 11:27:30 -08:00
Ravi Panchumarthy	38e8a19f50	Apply CISC review and update CI to OV2025.3	2026-01-15 11:27:28 -08:00
Yu, Zijun	4c8406eb70	Add OV CI cache	2026-01-15 11:26:00 -08:00
Ravi Panchumarthy	841d673bd0	Update to OV-2025.3 and CMakeLists.txt	2026-01-15 11:26:00 -08:00
Yu, Zijun	2d2f00a41f	Fix llama-3-8b and phi3-mini q4_0 NPU	2026-01-15 11:26:00 -08:00
Yu, Zijun	299f4923bb	fix after rebasing	2026-01-15 11:26:00 -08:00
Yu, Zijun	8b82d1153b	Fix add_sliced_mask; Revert mulmat, softmax; Remove input attention_size, iSWA model not working	2026-01-15 11:26:00 -08:00
Yu, Zijun	a9371ea646	Fix llama-cli (need to run with --no-warmup)	2026-01-15 11:26:00 -08:00
cavusmustafa	05d7abae8c	Fix for Phi3	2026-01-15 11:26:00 -08:00
cavusmustafa	e7252920e1	env variable GGML_OPENVINO_DISABLE_SDPA_OPTIMIZATION added	2026-01-15 11:26:00 -08:00
cavusmustafa	c112bc4e73	kvcachefusion support	2026-01-15 11:26:00 -08:00
Yu, Zijun	973a80fd02	Always apply Eliminate_ZP to fix GPU compile issue on some platforms	2026-01-15 11:26:00 -08:00
Yu, Zijun	fdadca1e89	Fix after rebasing	2026-01-15 11:26:00 -08:00
Yu, Zijun	f3afa7b914	Requantize Q6_K (gs16) to gs32 on GPU	2026-01-15 11:26:00 -08:00
Yu, Zijun	e4bfe5a20d	Add Q5_K to support phi-3-q4_k_m	2026-01-15 11:26:00 -08:00
Yu, Zijun	2f1d50fb07	Minor refactor	2026-01-15 11:26:00 -08:00
Yu, Zijun	67e178a2f6	Minor: not add attention_size_swa for non-swa model	2026-01-15 11:26:00 -08:00
Yu, Zijun	1a38339cea	Fix ROPE accuracy when freq_scale != 1	2026-01-15 11:26:00 -08:00
Yu, Zijun	602f9ca4af	Fix NPU accuracy	2026-01-15 11:26:00 -08:00
Yu, Zijun	9de874cb7b	Support iSWA	2026-01-15 11:25:58 -08:00
Yu, Zijun	7d81861a18	Fix Hunyuan	2026-01-15 11:20:31 -08:00
Yu, Zijun	597561242f	Add GeGLU	2026-01-15 11:20:31 -08:00
Yu, Zijun	be07073e0e	Apply EliminateZP only for npu	2026-01-15 11:20:31 -08:00
Yu, Zijun	da2cc993bc	WA for npu 1st token acc issue	2026-01-15 11:20:31 -08:00
Yu, Zijun	434059aef7	Fix NPU compile	2026-01-15 11:20:31 -08:00
Yu, Zijun	bcc343af00	Support BF16 model	2026-01-15 11:20:31 -08:00
Yu, Zijun	dc77cbb3f6	STYLE: make get_types_to_requant a function	2026-01-15 11:20:31 -08:00
Yu, Zijun	2ad1147b9b	Improve debug util; Eliminate nop ReshapeReshape	2026-01-15 11:20:31 -08:00
Yu, Zijun	0f7b253cb3	Fix after rebasing	2026-01-15 11:20:31 -08:00
Yu, Zijun	810eb480f5	Simpilfy translation of get_rows	2026-01-15 11:20:31 -08:00
Yu, Zijun	c5231a2448	Set m_is_static=false as default in decoder	2026-01-15 11:20:31 -08:00
Yu, Zijun	6926655f5b	Add custom quant type: q8_1_c, q4_0_128	2026-01-15 11:20:31 -08:00
Yu, Zijun	b593428eb3	Dequantize q4_1 q4_k q6_k for NPU	2026-01-15 11:20:31 -08:00
Yu, Zijun	82c98335d3	NPU perf: eliminate zp	2026-01-15 11:20:31 -08:00
Yu, Zijun	9ca53c7991	Add NPU Q4_0 support	2026-01-15 11:20:31 -08:00
Yu, Zijun	9900245e0b	Fix test-backend-ops: Treat quantized tensors as weights	2026-01-15 11:20:31 -08:00
Yu, Zijun	a1ce428004	Fix Q4_1	2026-01-15 11:19:15 -08:00
Yu, Zijun	dd80b04235	Fix CI; Disable test-backend-ops	2026-01-15 11:19:15 -08:00
Yu, Zijun	6ab76ed10a	Fix accuracy: disable cpu_repack	2026-01-15 11:19:15 -08:00
Yu, Zijun	663a0b8cce	Quant models run with accuracy issue	2026-01-15 11:19:15 -08:00
Yu, Zijun	d4ca760da8	Add quant weight conversion functions from genai gguf reader	2026-01-15 11:19:15 -08:00


7917 Commits

All Branches