llama.cpp

Author	SHA1	Message	Date
Yu, Zijun	072dde0b2b	change graph to 4d, support multi sequences	2026-01-15 11:28:48 -08:00
Zijun Yu	b8690bc055	NPU Unify PD (#14 ) * Stateless. Fix llama-cli llama-server * Simplify broadcast op in attention * Replace get_output_tensor+memcpy with set_output_tensor * NPU unify PD. Unify dynamic and static dims	2026-01-15 11:27:30 -08:00
Yu, Zijun	eba8113dc4	Style: middle ptr and ref align, omit optional struct keyword	2026-01-15 11:27:30 -08:00
cavusmustafa	c112bc4e73	kvcachefusion support	2026-01-15 11:26:00 -08:00
Yu, Zijun	65e1b1af6d	Fix after rebasing - Layout of cache k and cache v are unified: [seq, n_head, head_size] - Add CPY and FLASH_ATTN_EXT, flash attn is not used yet - Skip test-backend-ops due to flash attn test crash - Add mutex around graph conversion to avoid test-thread-safety fali in the future - Update NPU config - Update GPU config to disable SDPA opt to make phi-3 run	2026-01-15 11:19:15 -08:00
Yu, Zijun	44f4cf34b1	Fix Phi3 ROPE; Add test-backend-ops	2026-01-15 10:26:28 -08:00
Yu, Zijun	a80da69448	Pull out sin cos from rope	2026-01-15 10:20:18 -08:00
Yu, Zijun	593484ce5f	Refactor: clean, fix warning	2026-01-15 10:20:18 -08:00
Yu, Zijun	7fec223334	Add initial NPU support	2026-01-15 10:20:18 -08:00
Yu, Zijun	8ce5cc597a	Add cgraph tensor output name to OV op name	2026-01-15 10:20:18 -08:00
Yu, Zijun	0d505b4e56	STYLE and minor REFACTOR	2026-01-15 10:10:00 -08:00
Yu, Zijun	0d009fe61a	FEAT: Add all conversion code from ov side	2026-01-15 10:10:00 -08:00