Commit Graph

7794 Commits

Author SHA1 Message Date
zhanmyz eac9a99530 1. Solve the AC issue of Permute+VIEW and MULMAL issue in the phase of “1. Process Prompt and predict the first token”.
2. There is still an AC issue in the "2. Predict the subsequent tokens phase" and it is being debugged.
   A deviation has been detected in the computation of OpenVINO's CPY Node at stage 2, and it is currently being fixed.
2026-01-15 10:05:41 -08:00
zhanmyz 8ae700ae11 Process Prompt and predict first token is OK 2026-01-15 10:05:41 -08:00
zhanmyz 8020138406 add debug info 2026-01-15 10:05:41 -08:00
zhanmyz b02265a507 1. In the Prompt process and predict first token stage, the PERMUTE node needs to be integrated into the OV Frontend
2. In the predict latest token stage, the VIEW, CONT, Reshape need to be integrated into the OV Frontend.
2026-01-15 10:05:41 -08:00
zhanmyz 19ec9b6bf5 Try to add VIEW node to OV Frontend and have some issues that need to be dealt with 2026-01-15 10:05:41 -08:00
zhanmyz b14b49d5f6 Minor Update 2026-01-15 10:05:41 -08:00
zhanmyz 467a5ddf04 1. Update the implementation of CPY node when it's non-contiguous
2. Remove duplicate get node operation function
2026-01-15 10:05:41 -08:00
zhanmyz cff473a9e2 1. All operators implemented using OpenVINO can be successfully executed individually.
2. VIEW op output tensor shape is not same with CONT(non-contiguous) input tensor shape
3. CPY(non-contiguous) can't be implemented with original input/output tensor shape and data(need change the original shape when create input/output tensor)

Currently. VIEW op executed in the ggml backend and others executed in the OpenVINO Frontend.
2026-01-15 10:05:41 -08:00
zhanmyz e08a7fda33 All adjacent ops can conversion but calculation result is wrong and need debugging 2026-01-15 10:05:41 -08:00
zhanmyz d05c458421 change CONT and MULMAT input node shape 2026-01-15 10:05:41 -08:00
zhanmyz 246a2d1021 Change the input and ouput node shape of MUL_MAT operator 2026-01-15 10:05:41 -08:00
zhanmyz f37fa21a5c Change the input and ouput node shape of MUL_MAT operator 2026-01-15 10:05:41 -08:00
zhanmyz f98d215162 Change the input parameter shape of CONT operator 2026-01-15 10:05:41 -08:00
zhanmyz 9a7b7d8d6d OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT/ROPE/SCALE/SOFTMAX/ADD adjacent op graph conversion 2026-01-15 10:05:41 -08:00
zhanmyz 95ae982d59 OV Frontend supports GET_ROWS/RMS_NORM/MUL/MUL_MAT graph conversion of consecutive OPs 2026-01-15 10:05:41 -08:00
zhanmyz 901f7347ff Execute CONT & VIEW operators in OV Frontend is OK 2026-01-15 10:05:41 -08:00
zhanmyz 081b52667b Execute singel CONT operator is OK 2026-01-15 10:05:41 -08:00
zhanmyz afb8594194 add tmp source code files 2026-01-15 10:05:41 -08:00
zhanmyz 57582fda39 add implementation of CPY when the output tensor is non-contiguous 2026-01-15 10:05:41 -08:00
zhanmyz 8484769981 add implementation of MUL_MAT, CPY, CONT of GGML ops using OV ops 2026-01-15 10:05:41 -08:00
zhanmyz cb2729bc4a Move CPY from GGML OV Backend to OV Frontend 2026-01-15 10:05:41 -08:00
zhanmyz 2b04bd43be Add MUL_MAT,CPY,CONT as operators implemented in OpenVINO for GGML backend 2026-01-15 10:05:41 -08:00
zhanmyz 0f7d07de7d Add support for RMS_NORM OP 2026-01-15 10:05:41 -08:00
yumengbo 2353c73f53 Support ROPE op. 2026-01-15 10:05:41 -08:00
yumengbo 8aba03bac6 Support Softmax op 2026-01-15 10:05:41 -08:00
yumengbo d218c61e6d Support Softmax op 2026-01-15 10:05:41 -08:00
yumengbo 590f587b27 Add support for UNARY SILU op . Fix pytorch impl bugs. 2026-01-15 10:05:41 -08:00
yumengbo b100f89bad Change to implementation following pytorch frontend 2026-01-15 10:05:41 -08:00
yumengbo e95f29cbc0 Fix issue for output memory copy of infer request 2026-01-15 10:05:41 -08:00
zhanmyz 8c5a609f8d add the rms_norm operator implemented using OpenVINO to the GGML backend of llama.cpp 2026-01-15 10:05:41 -08:00
zhanmyz 80c330a469 Update build.md and add operation mapping(GGML to OpenVINO) 2026-01-15 10:05:41 -08:00
zhanmyz 49804f43fc add GET_ROWS operator of OpenVINO to GGML of llama.cpp 2026-01-15 10:05:41 -08:00
yumengbo 5b46dc23be Change output for infer request to set output tensor. Support scale, view op. 2026-01-15 10:05:41 -08:00
yumengbo 31bd816426 Add GGML_OV_FRONTEND option. Add readme. 2026-01-15 10:05:41 -08:00
yumengbo 9b7b63d12c Convert subgraph with add, sub, mul, div op to ov model and do infer on openvino device 2026-01-15 10:05:41 -08:00
yumengbo 34e826ac14 Implement GgmlOvDecoder. Add dump functions. 2026-01-15 10:05:41 -08:00
yumengbo 171c4681f4 Add PoC of integration of openvino frontend. Main changes: ggml-ov-frontend-utils, GraphIterator, Decoder 2026-01-15 10:05:41 -08:00
zhanmyz ee31dc1c1b add get openvino available ops function 2026-01-15 10:05:41 -08:00
zhanmyz 77d68146a8 add OpenVINO frontend convert process steps 2026-01-15 10:05:41 -08:00
zhanmyz 0a81aa19f7 Add compile options 2026-01-15 10:05:40 -08:00
zhanmyz adc2c70f44 Add OpenVINO MUL operator to GGML of Llama.cpp. 2026-01-15 10:05:40 -08:00
zhanmyz faa4a7de76 Solve the issue of abnormal model output caused by using OpenVINO ADD operator 2026-01-15 10:05:40 -08:00
zhanmyz 9b9d51dddf * Configure the device(default CPU) that uses OpenVINO to compile the model
* Add OpenVINO ADD operator to Llama.cpp. The output is somewhat abnormal and needs further debugging.
2026-01-15 10:05:40 -08:00
zhanmyz 5294402b50 add openvino as optional backend for Llama.cpp ggml 2026-01-15 10:05:40 -08:00
Yanglei Zou fe5720e684 Add ggml-openvino base files 2026-01-15 10:05:40 -08:00
Georgi Gerganov be8e3d9515
context : do not reserve scheduler for warmups (#18867) 2026-01-15 19:35:57 +02:00
ddh0 13f1e4a9ca
llama : add adaptive-p sampler (#17927)
* initial commit for branch

* simplify constants

* add params to `struct common_params_sampling`, add reference to PR

* explicitly clamp `min_target` and `max_target` to `[0.0, 1.0]`

* add args, rename `queue_size` -> `window_size`

* improved comments

* minor

* remove old unused code from algorithm

* minor

* add power law case to `common_sampler_init`, add sampler name mappings

* clarify behaviour when `window_size = 0`

* add missing enums

* remove `target_range` param, make `target == 1` no-op, cleanup code

* oops, straggler

* add missing parameters in `server-task.cpp`

* copy from author

ref:
https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069

* remove old debug log, style nit

* fix compiler warning, add commented-out logging per token

* re-write + change parameters + simplify

* oops forgot args.cpp

* fix leftover `window_size`

* add missing values to `common_params_sampling::print()`

* with logging

* does this fix it?

* no, but does this?

* update default decay

* optimize

* fix bad merge

my git skills are lacking

* silence `missing initializer for member`

* update default decay to 0.9

* fix logging

* format (double)

* add power law to the new `samplers` vector

* log sampler init values

* improve logging messages in llama_sampler_power_law

* remove extraneous logging

* simplify target computation

last commit with debug logging!

* remove debug logging, explicitly clamp params at init

* add `use_power_law` flag + logic, minor cleanup

* update `power-law` -> `adaptive-p`

* fix cold start EMA

- `ctx->weighted_sum` is now initialized and reset to `target / (1.0f -
clamped_decay)`
- `ctx->total_weight` is now initialized and reset to `1.0f / (1.0f -
clamped_decay)`

this fixes a "cold start" problem with the moving average

* update `SHARPNESS` constant to `10.0f`

* minor style fixes

no functional changes

* minor style fixes cont.

* update `llama_sampler_adaptive_p_i` for backend sampling (ref: #17004)

* separate into `apply` + `accept` functions

* `pending_token_idx`: switch from `llama_token` to `int32`

functionally identical (`llama.h` has `typedef int32_t llama_token;`),
but its more correct now

* don't transform logits <= -1e9f

* fix masking in backend top-p, min-p

* address review comments

* typo in comments `RND` -> `RNG`

* add docs

* add recommended values in completion docs

* address PR feedback

* remove trailing whitespace (for CI `editorconfig`)

* add to adaptive-p to `common_sampler_types_from_chars`
2026-01-15 19:16:29 +02:00
Xuan-Son Nguyen a04c2b06a3
server: improve slots scheduling for n_cmpl (#18789)
* server : make sure children tasks are scheduled to launch with parent

* fix

* add comment pointing to this PR

* fix

* clean up

* more debug messages

* add pop_deferred_task with specific ID version

* improve the logic

* simple approach

* no double move

* correct return type of launch_slots_with_parent_task
2026-01-15 17:10:28 +01:00
Georgi Gerganov 39173bcacb
context : reserve new scheduler when graph topology changes (#18547)
* context : reserve new scheduler when graph topology changes

* cont : fix

* cont : fix reserve

* cont : reserve only when changes occur + timing

* context : add comments

* llama : reserve on sampler changes

* common : allow null common_sampler

* server : task declares needs (embd, logits, sampling)

* server : do not init sampler if not needed

* llama : fix need_reserve when unsetting a sampler

* server : consolidate slot reset/clear logic
2026-01-15 16:39:17 +02:00
Johannes Gäßler 5c662d21a3
CUDA: fix allignment on register spill for FA (#18815) 2026-01-15 15:14:50 +01:00