llama.cpp

Commit Graph

Author	SHA1	Message	Date
Yu, Zijun	651b2c06cb	* Use find_package in CMake to configure OpenVINO * Remove OPENVINO_OP_DEBUG * Simplify set_input_output in decoder * Fix CPY in set_input_output * Use params from converted ov model in setting input	2026-01-15 10:05:41 -08:00
zhanmyz	84be5c6f15	1. Delete some comments 2. Process Prompt and predict first token is OK	2026-01-15 10:05:41 -08:00
zhanmyz	eac9a99530	1. Solve the AC issue of Permute+VIEW and the MULMAT issue in the phase "1. Process Prompt and predict the first token". 2. There is still an AC issue in the phase "2. Predict the subsequent tokens", and it is being debugged. A deviation has been detected in the computation of OpenVINO's CPY node at stage 2, and it is currently being fixed.	2026-01-15 10:05:41 -08:00
zhanmyz	8ae700ae11	Process Prompt and predict first token is OK	2026-01-15 10:05:41 -08:00
zhanmyz	8020138406	add debug info	2026-01-15 10:05:41 -08:00
zhanmyz	b02265a507	1. In the Prompt process and predict first token stage, the PERMUTE node needs to be integrated into the OV Frontend 2. In the predict latest token stage, the VIEW, CONT, Reshape need to be integrated into the OV Frontend.	2026-01-15 10:05:41 -08:00
zhanmyz	19ec9b6bf5	Try to add VIEW node to OV Frontend and have some issues that need to be dealt with	2026-01-15 10:05:41 -08:00
zhanmyz	b14b49d5f6	Minor Update	2026-01-15 10:05:41 -08:00
zhanmyz	467a5ddf04	1. Update the implementation of CPY node when it's non-contiguous 2. Remove duplicate get node operation function	2026-01-15 10:05:41 -08:00
zhanmyz	cff473a9e2	1. All operators implemented using OpenVINO can be successfully executed individually. 2. The VIEW op output tensor shape differs from the CONT (non-contiguous) input tensor shape. 3. CPY (non-contiguous) can't be implemented with the original input/output tensor shape and data (the original shape must be changed when creating the input/output tensor). Currently, the VIEW op is executed in the ggml backend and the others are executed in the OpenVINO Frontend.	2026-01-15 10:05:41 -08:00
zhanmyz	e08a7fda33	All adjacent ops can be converted, but the calculation result is wrong and needs debugging	2026-01-15 10:05:41 -08:00
zhanmyz	d05c458421	change CONT and MULMAT input node shape	2026-01-15 10:05:41 -08:00
zhanmyz	246a2d1021	Change the input and output node shape of MUL_MAT operator	2026-01-15 10:05:41 -08:00
zhanmyz	f37fa21a5c	Change the input and output node shape of MUL_MAT operator	2026-01-15 10:05:41 -08:00
zhanmyz	f98d215162	Change the input parameter shape of CONT operator	2026-01-15 10:05:41 -08:00
zhanmyz	9a7b7d8d6d	OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT/ROPE/SCALE/SOFTMAX/ADD adjacent op graph conversion	2026-01-15 10:05:41 -08:00
zhanmyz	95ae982d59	OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT graph conversion of consecutive OPs	2026-01-15 10:05:41 -08:00
zhanmyz	901f7347ff	Execute CONT & VIEW operators in OV Frontend is OK	2026-01-15 10:05:41 -08:00
zhanmyz	081b52667b	Execute single CONT operator is OK	2026-01-15 10:05:41 -08:00
zhanmyz	afb8594194	add tmp source code files	2026-01-15 10:05:41 -08:00
zhanmyz	57582fda39	add implementation of CPY when the output tensor is non-contiguous	2026-01-15 10:05:41 -08:00
zhanmyz	8484769981	add implementation of MUL_MAT, CPY, CONT of GGML ops using OV ops	2026-01-15 10:05:41 -08:00
zhanmyz	cb2729bc4a	Move CPY from GGML OV Backend to OV Frontend	2026-01-15 10:05:41 -08:00
zhanmyz	2b04bd43be	Add MUL_MAT,CPY,CONT as operators implemented in OpenVINO for GGML backend	2026-01-15 10:05:41 -08:00
zhanmyz	0f7d07de7d	Add support for RMS_NORM OP	2026-01-15 10:05:41 -08:00
yumengbo	2353c73f53	Support ROPE op.	2026-01-15 10:05:41 -08:00
yumengbo	8aba03bac6	Support Softmax op	2026-01-15 10:05:41 -08:00
yumengbo	d218c61e6d	Support Softmax op	2026-01-15 10:05:41 -08:00
yumengbo	590f587b27	Add support for UNARY SILU op. Fix pytorch impl bugs.	2026-01-15 10:05:41 -08:00
yumengbo	b100f89bad	Change to implementation following pytorch frontend	2026-01-15 10:05:41 -08:00
yumengbo	e95f29cbc0	Fix issue for output memory copy of infer request	2026-01-15 10:05:41 -08:00
zhanmyz	8c5a609f8d	add the rms_norm operator implemented using OpenVINO to the GGML backend of llama.cpp	2026-01-15 10:05:41 -08:00
zhanmyz	80c330a469	Update build.md and add operation mapping(GGML to OpenVINO)	2026-01-15 10:05:41 -08:00
zhanmyz	49804f43fc	add GET_ROWS operator of OpenVINO to GGML of llama.cpp	2026-01-15 10:05:41 -08:00
yumengbo	5b46dc23be	Change output for infer request to set output tensor. Support scale, view op.	2026-01-15 10:05:41 -08:00
yumengbo	31bd816426	Add GGML_OV_FRONTEND option. Add readme.	2026-01-15 10:05:41 -08:00
yumengbo	9b7b63d12c	Convert subgraph with add, sub, mul, div op to ov model and do infer on openvino device	2026-01-15 10:05:41 -08:00
yumengbo	34e826ac14	Implement GgmlOvDecoder. Add dump functions.	2026-01-15 10:05:41 -08:00
yumengbo	171c4681f4	Add PoC of integration of openvino frontend. Main changes: ggml-ov-frontend-utils, GraphIterator, Decoder	2026-01-15 10:05:41 -08:00
zhanmyz	ee31dc1c1b	add get openvino available ops function	2026-01-15 10:05:41 -08:00
zhanmyz	77d68146a8	add OpenVINO frontend convert process steps	2026-01-15 10:05:41 -08:00
zhanmyz	0a81aa19f7	Add compile options	2026-01-15 10:05:40 -08:00
zhanmyz	adc2c70f44	Add OpenVINO MUL operator to GGML of Llama.cpp.	2026-01-15 10:05:40 -08:00
zhanmyz	faa4a7de76	Solve the issue of abnormal model output caused by using OpenVINO ADD operator	2026-01-15 10:05:40 -08:00
zhanmyz	9b9d51dddf	* Configure the device(default CPU) that uses OpenVINO to compile the model * Add OpenVINO ADD operator to Llama.cpp. The output is somewhat abnormal and needs further debugging.	2026-01-15 10:05:40 -08:00
zhanmyz	5294402b50	add openvino as optional backend for Llama.cpp ggml	2026-01-15 10:05:40 -08:00
Yanglei Zou	fe5720e684	Add ggml-openvino base files	2026-01-15 10:05:40 -08:00
Johannes Gäßler	5c662d21a3	CUDA: fix alignment on register spill for FA (#18815)	2026-01-15 15:14:50 +01:00
shalinib-ibm	8cc0ba957b	ggml-cpu: optimize ggml_vec_dot_bf16 for Power9 (#18837)	2026-01-15 17:31:18 +08:00
Max Krasnyansky	cff777f226	hexagon: support for OP_CPY, host buffers now optional, hvx-utils refactoring and optimizations (#18822) * hexagon: disable repack buffers if host buffers are disabled, improved handling of env vars * hexagon: add support for OP_CPY fp16/fp32 -> fp16/fp32 Factored out all hvx_copy functions into hvx-copy.h header and reduced code duplication. Update HTP ops infra to support OP_CPY * hexagon: cleanup and refactor hex/hvx/htp headers and helper libs hex is basically all scalar/core platform stuff (L2, DMA, basic utils) hvx is all hvx related utils, helpers, etc htp is higher level stuff like Ops, etc hvx-utils library got a nice round of cleanup and refactoring to reduce duplication use hvx_vec_store_a where possible * hexagon: refactor HVX sigmoid functions to hvx-sigmoid.h Moved sigmoid and tanh vector functions from hvx-utils.h to a new header hvx-sigmoid.h. Implemented aligned and unaligned variants for sigmoid array processing using a macro pattern similar to hvx-copy.h. Updated act-ops.c to use the new aligned variant hvx_sigmoid_f32_aa. Removed unused hvx-sigmoid.c. * hexagon: factor out hvx-sqrt.h * hexagon: minor update to hvx-utils.h * hexagon: remove spurious log * hexagon: factor out and optimize hvx_add/sub/mul * hexagon: remove _opt variants of add/sub/mul as they are simply fully aligned versions * hexagon: refactor reduction functions to hvx-reduce.h Moved `hvx_self_max_f32` and `hvx_self_sum_f32` from `hvx-utils.h`/`.c` to `hvx-reduce.h`. Renamed them to `hvx_reduce_max_f32` and `hvx_reduce_sum_f32`. Added aligned (`_a`) and unaligned (`_u`) variants and used macros to unify logic. Updated `softmax-ops.c` to use the new functions. * hexagon: refactor the rest of arithmetic functions to hvx-arith.h Moved `hvx_sum_of_squares_f32`, `hvx_min_scalar_f32`, and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` to use `dst, src, ..., n` argument order. Updated call sites in `act-ops.c`. Refactor Hexagon HVX arithmetic functions (min, clamp) to hvx-arith.h Moved `hvx_min_scalar_f32` and `hvx_clamp_scalar_f32` from `hvx-utils.c/h` to `hvx-arith.h`. Implemented aligned/unaligned variants (`_aa`, `_au`, etc.) and used macros to reduce code duplication. Updated these functions to use `dst, src, ..., n` argument order and updated call sites in `act-ops.c`. `hvx_sum_of_squares_f32` remains in `hvx-utils.c` as requested. * hexagon: refactor hvx_sum_of_squares_f32 - Modify `hvx_sum_of_squares_f32` in `ggml/src/ggml-hexagon/htp/hvx-reduce.h` to use `dst, src` signature. - Implement `_a` (aligned) and `_u` (unaligned) variants for `hvx_sum_of_squares_f32`. - Update `hvx_reduce_loop_body` macro to support both returning and storing results via `finalize_op`. - Update existing reduction functions in `hvx-reduce.h` to use the updated macro. - Update `rms_norm_htp_f32` in `ggml/src/ggml-hexagon/htp/unary-ops.c` to match the new signature. * hexagon: use hvx_splat instead of memset * hexagon: consistent use of f32/f16 in all function names to match the rest of GGML * hexagon: fix hvx_copy_f16_f32 on v75 and older * hexagon: update readme to include GGML_HEXAGON_EXPERIMENTAL * scripts: update snapdragon/adb scripts to enable host param	2026-01-14 21:46:12 -08:00


1944 Commits