llama.cpp

Commit Graph

Author	SHA1	Message	Date
Yu, Zijun	dc77cbb3f6	STYLE: make get_types_to_requant a function	2026-01-15 11:20:31 -08:00
Yu, Zijun	2ad1147b9b	Improve debug util; Eliminate nop ReshapeReshape	2026-01-15 11:20:31 -08:00
Yu, Zijun	6926655f5b	Add custom quant type: q8_1_c, q4_0_128	2026-01-15 11:20:31 -08:00
Yu, Zijun	b593428eb3	Dequantize q4_1 q4_k q6_k for NPU	2026-01-15 11:20:31 -08:00
Yu, Zijun	9900245e0b	Fix test-backend-ops: Treat quantized tensors as weights	2026-01-15 11:20:31 -08:00
Yu, Zijun	65e1b1af6d	Fix after rebasing - Layout of cache k and cache v are unified: [seq, n_head, head_size] - Add CPY and FLASH_ATTN_EXT, flash attn is not used yet - Skip test-backend-ops due to flash attn test crash - Add mutex around graph conversion to avoid test-thread-safety fail in the future - Update NPU config - Update GPU config to disable SDPA opt to make phi-3 run	2026-01-15 11:19:15 -08:00
Yu, Zijun	f4123be967	Fix test-backend-ops	2026-01-15 11:19:15 -08:00
Yu, Zijun	7bda5021f9	Fix NPU	2026-01-15 11:19:15 -08:00
Yu, Zijun	37ff226bb6	Use CiD for NPU	2026-01-15 11:19:15 -08:00
Yu, Zijun	fc865340d5	Fix test-backend-ops	2026-01-15 10:26:28 -08:00
Yu, Zijun	4e7f04a307	Fix llama-perplexity	2026-01-15 10:26:28 -08:00
Yu, Zijun	6dc4b90635	Fix NPU	2026-01-15 10:26:28 -08:00
Yu, Zijun	44f4cf34b1	Fix Phi3 ROPE; Add test-backend-ops	2026-01-15 10:26:28 -08:00
Yu, Zijun	f3c0519096	Reduce memory: free ov weights node after graph conversion	2026-01-15 10:20:18 -08:00
Yu, Zijun	ebc4fc9f95	Fuse to SDPA	2026-01-15 10:20:18 -08:00
Yu, Zijun	4c582ac7a3	Stateful transformation for CPU GPU	2026-01-15 10:20:18 -08:00
Yu, Zijun	8afee795ad	Update clang-format	2026-01-15 10:20:18 -08:00
Yu, Zijun	593484ce5f	Refactor: clean, fix warning	2026-01-15 10:20:18 -08:00
Yu, Zijun	592d7f8bbb	Change due to ggml cgraph changes, llama-3.2 CPU work	2026-01-15 10:20:18 -08:00
Yu, Zijun	d9ca8f5dbe	NPU support version 2: prefill + kvcache	2026-01-15 10:20:18 -08:00
Yu, Zijun	34531abce4	draft NPU support version 2: prefill + kvcache	2026-01-15 10:20:18 -08:00
Yu, Zijun	7fec223334	Add initial NPU support	2026-01-15 10:20:18 -08:00
Yu, Zijun	8ac5c225aa	FIX: set_max_token_len	2026-01-15 10:20:18 -08:00
Yu, Zijun	0d505b4e56	STYLE and minor REFACTOR	2026-01-15 10:10:00 -08:00
Yu, Zijun	0d009fe61a	FEAT: Add all conversion code from ov side	2026-01-15 10:10:00 -08:00
Viraj Wadhwa	ffabe95e2a	Rebase - Bring up to date and fix build process	2026-01-15 10:09:23 -08:00
Yu, Zijun	a8e5efa44e	PERF: compile once (dynamic graph + cache)	2026-01-15 10:05:41 -08:00
Yu, Zijun	7d5e234254	FEAT: improve debug capability	2026-01-15 10:05:41 -08:00
Yu, Zijun	0a8cc9ab03	BUILD: update build doc, add cmake preset, add CACHE_DIR env var	2026-01-15 10:05:41 -08:00
Yu, Zijun	c04966cda6	REFACTOR: support weights as constant	2026-01-15 10:05:41 -08:00
Yu, Zijun	91d2a195b5	change op mappings to list in openvino_supports_op	2026-01-15 10:05:41 -08:00
Yu, Zijun	651b2c06cb	* Use find_package in CMake to configure OpenVINO * Remove OPENVINO_OP_DEBUG * Simplify set_input_output in decoder * Fix CPY in set_input_output * Use params from converted ov model in setting input	2026-01-15 10:05:41 -08:00
zhanmyz	84be5c6f15	1. Delete some comments 2. Process Prompt and predict first token is OK	2026-01-15 10:05:41 -08:00
zhanmyz	eac9a99530	1. Solve the AC issue of Permute+VIEW and MULMAL issue in the phase of “1. Process Prompt and predict the first token”. 2. There is still an AC issue in the "2. Predict the subsequent tokens phase" and it is being debugged. A deviation has been detected in the computation of OpenVINO's CPY Node at stage 2, and it is currently being fixed.	2026-01-15 10:05:41 -08:00
zhanmyz	8ae700ae11	Process Prompt and predict first token is OK	2026-01-15 10:05:41 -08:00
zhanmyz	8020138406	add debug info	2026-01-15 10:05:41 -08:00
zhanmyz	b02265a507	1. In the Prompt process and predict first token stage, the PERMUTE node needs to be integrated into the OV Frontend 2. In the predict latest token stage, the VIEW, CONT, Reshape need to be integrated into the OV Frontend.	2026-01-15 10:05:41 -08:00
zhanmyz	467a5ddf04	1. Update the implementation of CPY node when it's non-contiguous 2. Remove duplicate get node operation function	2026-01-15 10:05:41 -08:00
zhanmyz	cff473a9e2	1. All operators implemented using OpenVINO can be successfully executed individually. 2. VIEW op output tensor shape is not same with CONT(non-contiguous) input tensor shape 3. CPY(non-contiguous) can't be implemented with original input/output tensor shape and data(need change the original shape when create input/output tensor) Currently. VIEW op executed in the ggml backend and others executed in the OpenVINO Frontend.	2026-01-15 10:05:41 -08:00
zhanmyz	e08a7fda33	All adjacent ops can be converted, but the calculation result is wrong and needs debugging	2026-01-15 10:05:41 -08:00
zhanmyz	9a7b7d8d6d	OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT/ROPE/SCALE/SOFTMAX/ADD adjacent op graph conversion	2026-01-15 10:05:41 -08:00
zhanmyz	95ae982d59	OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT graph conversion of consecutive OPs	2026-01-15 10:05:41 -08:00
zhanmyz	901f7347ff	Execute CONT & VIEW operators in OV Frontend is OK	2026-01-15 10:05:41 -08:00
zhanmyz	afb8594194	add tmp source code files	2026-01-15 10:05:41 -08:00
zhanmyz	8484769981	add implementation of MUL_MAT, CPY, CONT of GGML ops using OV ops	2026-01-15 10:05:41 -08:00
zhanmyz	2b04bd43be	Add MUL_MAT,CPY,CONT as operators implemented in OpenVINO for GGML backend	2026-01-15 10:05:41 -08:00
yumengbo	590f587b27	Add support for UNARY SILU op . Fix pytorch impl bugs.	2026-01-15 10:05:41 -08:00
yumengbo	b100f89bad	Change to implementation following pytorch frontend	2026-01-15 10:05:41 -08:00
yumengbo	e95f29cbc0	Fix issue for output memory copy of infer request	2026-01-15 10:05:41 -08:00
yumengbo	5b46dc23be	Change output for infer request to set output tensor. Support scale, view op.	2026-01-15 10:05:41 -08:00

51 Commits