Commit Graph

374 Commits

Author SHA1 Message Date
Leszek Hanusz fd3cb9bbdd Merge branch 'master' into notebook 2026-02-17 01:57:31 +01:00
Leszek Hanusz 2377b8c81e Merge branch 'master' into notebook 2026-02-16 02:22:25 +01:00
Adrien Gallouët 9e118b97c4
build : remove LLAMA_HTTPLIB option (#19623)
This option was introduced as a workaround because cpp-httplib could not
build on visionOS. Since it has been fixed and now compiles on all platforms,
we can remove it and simplify many things.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-15 15:38:50 +01:00
Aleksander Grygier baa12f3831
webui: Architecture and UI improvements (#19596) 2026-02-14 09:06:41 +01:00
Aleksander Grygier 5174d7206f
webui: UI and routing fixes (#19586)
* chore: update webui build output

* chore: update webui build output

* fix: Scroll issues in DropdownMenuSearchable

* webui: fix redirect to root ignoring base path

* fix: Word wrapping

* fix: remove obsolete modality UI tests causing CI failures

- Remove VisionModality/AudioModality test stories
- Remove mockServerProps usage and imports
- Simplify Default test (remove dropdown interaction checks)
- Simplify FileAttachments test (remove mocks)

* feat: Improve formatting performance time

---------

Co-authored-by: Pascal <admin@serveurperso.com>
2026-02-13 12:31:00 +01:00
Aleksander Grygier 4c61875bf8
webui: Add switcher to Chat Message UI to show raw LLM output (#19571) 2026-02-12 19:55:51 +01:00
Aleksander Grygier 4d688f9ebb
(webui) FEATURE: Enable adding or injecting System Message into chat (#19556)
* feat: Enable adding System Prompt per-chat

* fix: Save draft message in Chat Form when adding System Prompt from new chat view

* fix: Proper system message deletion logic

* chore: Formatting

* chore: update webui build output
2026-02-12 13:56:08 +01:00
Aleksander Grygier f486ce9f30
(webui) REFACTOR: UI primitives and polish (#19551)
* webui: UI primitives and polish (non-MCP)

* chore: update webui build output
2026-02-12 12:21:00 +01:00
Aleksander Grygier 38adc7d469
WebUI Architecture Cleanup (#19541)
* webui: architecture foundation (non-MCP core refactors)

* chore: update webui build output
2026-02-12 11:22:27 +01:00
RichardScottOZ fa16e517a3
server : fix typo in README.md for features list (#19510)
extra 'l' in 'full'
2026-02-12 08:56:25 +01:00
Leszek Hanusz 8a6843aac1 Fix ApiChatCompletionRequest 2026-02-10 03:14:14 +01:00
Leszek Hanusz 8e125febc9 Don't use ChatService.notifyTimings 2026-02-10 01:54:05 +01:00
Leszek Hanusz a35e4c4d81 Use a separate callbacks argument for sendCompletion 2026-02-10 01:20:14 +01:00
Leszek Hanusz 8f79f1fccb Removing non-stream /completion implementation + fix api 2026-02-10 00:39:26 +01:00
손희준 820ebfa6f4
Server: log when converting requests to chat completions format (#19457)
* Log converting requests

* Print as debug instead of info [no ci]

---------

Co-authored-by: openingnow <>
2026-02-09 16:22:57 +01:00
Sascha Rogmann 292f6908cd
spec : remove check rate (#19377)
* spec: remove parameter spec-ngram-check-rate

* spec : renamed statistics vars

* spec : add n_call_begin, n_call_accept

* spec : don't enable key-map-stats
2026-02-09 15:30:50 +02:00
Georgi Gerganov eb449cdfa4
server : improve context checkpoint logic (#19408) 2026-02-08 09:40:04 +02:00
Georgi Gerganov dfde5993ea
common : add common_speculative_is_compat() (#19270)
* llama : add llama_memory_can_rm_suffix()

* Revert "llama : add llama_memory_can_rm_suffix()"

This reverts commit d30e59b62a.

* spec : check if the target context is compatible for spec decoding
2026-02-06 16:47:22 +02:00
Leszek Hanusz a0c5c26fb9 Fix calculation of total tokens after undo/redo 2026-02-05 02:33:39 +01:00
Leszek Hanusz 4659a36ffd Add 42px min height to the statistics to avoid flickering height problems + remove unused imports 2026-02-04 18:44:22 +01:00
Leszek Hanusz 77dc99cd9a Remove [DONE] check 2026-02-04 18:11:27 +01:00
Leszek Hanusz 031e426005 Run npm run format 2026-02-04 16:31:44 +01:00
Leszek Hanusz 393faf0166 Put completion api service in separate file 2026-02-04 16:29:53 +01:00
Leszek Hanusz 251ba9d72a Put tokenize in a separate file 2026-02-04 15:58:54 +01:00
Leszek Hanusz efd274ab3d chore: update webui build output 2026-02-04 14:25:20 +01:00
Leszek Hanusz ad3b8df38f Remove currentConfig.model 2026-02-04 02:03:59 +01:00
Leszek Hanusz f20b17a087 Remove inputContent var and use tokenize only when needed 2026-02-04 01:23:24 +01:00
Leszek Hanusz 9cf4742adb Fix tokenize with router on 2026-02-04 00:21:56 +01:00
Leszek Hanusz 03077cf297 Merge branch 'master' into notebook 2026-02-03 03:04:31 +01:00
Leszek Hanusz 210dc6a2c0 Running npm run format 2026-02-03 02:27:10 +01:00
Leszek Hanusz 9dc75f2664 Fix npm run check errors 2026-02-03 02:22:32 +01:00
Leszek Hanusz f42d889a47 Fix vertical alignment of Generate tooltip shortcut info 2026-02-03 02:14:28 +01:00
Leszek Hanusz fb2095e815 Show total number of tokens by using tokenizer 2026-02-03 01:50:52 +01:00
Leszek Hanusz 3657a8a7ad Implement shortcuts for the notebook page 2026-02-02 23:59:36 +01:00
Leszek Hanusz 7892b259cb Add last undo/redo for notebook page 2026-02-02 22:39:07 +01:00
Leszek Hanusz f041a864ed Use same dialog for server errors on notebook page 2026-02-02 21:29:48 +01:00
Leszek Hanusz 11e3cd81ce Protect the window from accidental closure if the notebook is not empty, since its content is not saved 2026-02-02 21:15:24 +01:00
Leszek Hanusz 301c3fec7e Add generation statistics to notebook page 2026-02-02 18:39:46 +01:00
Matthieu Coudron a3fa035822
server: print actual model name in 'model not found' error (#19117)
When experimenting with AI, my environment gets messy fast, and it's not
always easy to know which model my software is trying to load. This helps
with troubleshooting.

Before:

Error: {
  code = 400,
  message = "model not found",
  type = "invalid_request_error"
}

After:

Error: {
  code = 400,
  message = "model 'toto' not found",
  type = "invalid_request_error"
}
2026-02-02 16:55:27 +01:00
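
As a rough illustration of the change described above, here is a minimal sketch of how a server might embed the requested model name in such an error payload. The struct and helper names (error_response, make_model_not_found_error) are illustrative assumptions, not the actual llama.cpp server code.

#include <cstdio>
#include <string>

// Illustrative error payload mirroring the fields shown in the commit message.
struct error_response {
    int         code;
    std::string message;
    std::string type;
};

// Hypothetical helper: include the requested model name in the message so
// troubleshooting shows exactly what the client asked for.
static error_response make_model_not_found_error(const std::string & requested_model) {
    return {
        400,
        "model '" + requested_model + "' not found",
        "invalid_request_error",
    };
}

int main() {
    const error_response err = make_model_not_found_error("toto");
    std::printf("code = %d, message = \"%s\", type = \"%s\"\n",
                err.code, err.message.c_str(), err.type.c_str());
    return 0;
}
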
Leszek Hanusz 8a71126e5b Autoscroll the notebook textarea depending on config parameter 2026-02-02 16:19:53 +01:00
Leszek Hanusz e80ba11778 Fix sidebar behavior same as chat pages 2026-02-02 15:46:12 +01:00
Leszek Hanusz ff2f0bba4a Remove console logs 2026-02-02 15:06:51 +01:00
Christian Kastner 7a4ca3cbd9
docs : Minor cleanups (#19252)
* Update old URLs to github.com/ggml-org/

* Bump copyrights
2026-02-02 08:38:55 +02:00
Leszek Hanusz c9f9863268 Add .agent/ to gitignore
Fix buttons
Fix model loading with router enabled
remove stats for now
lint
2026-02-01 23:20:34 +01:00
Leszek Hanusz 3af9b34aa2 Refine Notebook UI: improved layout, added stats and model info 2026-01-31 23:59:45 +01:00
Leszek Hanusz 6d96745375 Implement Notebook interface 2026-01-31 22:14:28 +01:00
Georgi Gerganov bbada8bfb9
server : wrap around the "id_slot" parameter (#19207)
* server : wrap around the "id_slot" parameter

* cont : minor
2026-01-30 19:46:10 +02:00
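
The title of the commit above does not spell out the wrapping behavior; a plausible reading, offered purely as an assumption rather than the actual server logic, is that an out-of-range id_slot is mapped onto an existing slot instead of being rejected:

#include <cstdio>

// Hypothetical helper (assumption, not the actual llama-server code):
// wrap a requested slot id onto the range of available slots.
static int resolve_slot_id(int id_slot, int n_slots) {
    if (id_slot < 0 || n_slots <= 0) {
        return -1; // negative id: let the server pick any free slot
    }
    return id_slot % n_slots; // wrap out-of-range ids onto existing slots
}

int main() {
    std::printf("%d\n", resolve_slot_id(5, 4)); // prints 1
    return 0;
}
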
Georgi Gerganov dabaa2e77a
spec : add ngram-mod (#19164)
* spec : add ngram-mod

* cont : simplify + keep track of occupancy

* cont : cleanup

* cont : move initialization to common/speculative

* cont : cleanup

* cont : cleanup

* cont : fix
2026-01-30 18:21:48 +02:00
Andrew Marshall 84b0a98319
webui: Update Svelte to fix effect_update_depth_exceeded errors (#19144)
The upstream fix is first available in 5.38.2, so constrain to at least
that version.

Rebuild pre-compiled webui index.html.gz based on these changes.

See also:
https://github.com/ggml-org/llama.cpp/issues/16347
https://github.com/huntabyte/bits-ui/issues/1687
https://github.com/sveltejs/svelte/issues/16548
2026-01-29 15:56:39 +01:00
Sascha Rogmann 72d3b1898a
spec : add self-speculative decoding (no draft model required) + refactor (#18471)
* server: introduce self-speculative decoding

* server: moved self-call into speculative.cpp

* can_speculate() includes self-speculation

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* server: can_speculate() tests self-spec

* server: replace can_speculate() with slot.can_speculate()

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* common: use %zu format specifier for size_t in logging

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* server: can_speculate() requires a task instance

* common: ngram map, config self-speculative decoding

* common: add enum common_speculative_type

* common: add vector of speculative states

* common: add option --spec-draftless

* server: cleanup (remove slot.batch_spec, rename)

* common: moved self-spec impl to ngram-map

* common: cleanup (use common_speculative_state_draft)

* spec : refactor

* cont : naming

* spec: remove --spec-config

* doc: (draftless) speculative decoding

* common: print performance in spec decoding

* minor : cleanup

* common : better names

* minor : cleanup + fix build

* minor: comments

* CODEOWNERS: add common/ngram-map.* (#18471)

* common : rename speculative.draftless_type -> speculative.type

* ngram-map : fix uninitialized values

* ngram-map : take into account the input can become shorter

* ngram-map : revert len check for now

* arg : change `--spec-draftless` -> `--spec-type`

* spec : add common_speculative_state::accept()

* spec : refactor + add common_speculative_begin()

* spec : fix begin() call with mtmd

* spec : additional refactor + remove common_speculative_params

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-28 19:42:42 +02:00
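
The commit list above tracks the refactoring steps rather than the mechanism itself. As a hedged sketch of what draftless, ngram-based speculation generally looks like (an assumption based on the ngram-map naming, in the spirit of prompt-lookup decoding, not the actual common/ngram-map implementation), a draft can be proposed by matching the most recent n-gram against an earlier occurrence in the context:

#include <cstdint>
#include <cstdio>
#include <vector>

using llama_token = int32_t;

// Sketch of prompt-lookup style drafting: find an earlier occurrence of the
// last n_gram tokens and propose the tokens that followed it as the draft.
// Names and structure are illustrative, not the actual ngram-map code.
static std::vector<llama_token> propose_ngram_draft(
        const std::vector<llama_token> & ctx, // tokens seen so far
        size_t n_gram,                        // length of the lookup key
        size_t n_draft) {                     // max tokens to propose
    std::vector<llama_token> draft;
    if (ctx.size() < n_gram + 1) {
        return draft;
    }
    const size_t key_pos = ctx.size() - n_gram; // key = last n_gram tokens
    for (size_t i = key_pos; i-- > 0; ) {       // scan earlier positions, newest first
        bool match = true;
        for (size_t j = 0; j < n_gram; ++j) {
            if (ctx[i + j] != ctx[key_pos + j]) { match = false; break; }
        }
        if (!match) {
            continue;
        }
        // Propose what followed the earlier occurrence; the target model then
        // verifies these tokens in a single batch, as in regular speculation.
        for (size_t j = i + n_gram; j < ctx.size() && draft.size() < n_draft; ++j) {
            draft.push_back(ctx[j]);
        }
        break;
    }
    return draft;
}

int main() {
    const std::vector<llama_token> ctx = {1, 2, 3, 4, 1, 2, 3};
    const std::vector<llama_token> draft = propose_ngram_draft(ctx, /*n_gram=*/3, /*n_draft=*/4);
    for (llama_token t : draft) {
        std::printf("%d ", t); // prints: 4 1 2 3
    }
    std::printf("\n");
    return 0;
}
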