Leszek Hanusz
fd3cb9bbdd
Merge branch 'master' into notebook
2026-02-17 01:57:31 +01:00
Leszek Hanusz
2377b8c81e
Merge branch 'master' into notebook
2026-02-16 02:22:25 +01:00
Adrien Gallouët
9e118b97c4
build : remove LLAMA_HTTPLIB option ( #19623 )
This option was introduced as a workaround because cpp-httplib could not
build on visionOS. Since it has been fixed and now compiles on all platforms,
we can remove it and simplify many things.
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2026-02-15 15:38:50 +01:00
Aleksander Grygier
baa12f3831
webui: Architecture and UI improvements ( #19596 )
2026-02-14 09:06:41 +01:00
Aleksander Grygier
5174d7206f
webui: UI and routing fixes ( #19586 )
* chore: update webui build output
* chore: update webui build output
* fix: Scroll issues in DropdownMenuSearchable
* webui: fix redirect to root ignoring base path
* fix: Word wrapping
* fix: remove obsolete modality UI tests causing CI failures
- Remove VisionModality/AudioModality test stories
- Remove mockServerProps usage and imports
- Simplify Default test (remove dropdown interaction checks)
- Simplify FileAttachments test (remove mocks)
* feat: Improve formatting performance time
---------
Co-authored-by: Pascal <admin@serveurperso.com>
2026-02-13 12:31:00 +01:00
Aleksander Grygier
4c61875bf8
webui: Add switcher to Chat Message UI to show raw LLM output ( #19571 )
2026-02-12 19:55:51 +01:00
Aleksander Grygier
4d688f9ebb
(webui) FEATURE: Enable adding or injecting System Message into chat ( #19556 )
* feat: Enable adding System Prompt per-chat
* fix: Save draft message in Chat Form when adding System Prompt from new chat view
* fix: Proper system message deletion logic
* chore: Formatting
* chore: update webui build output
2026-02-12 13:56:08 +01:00
Aleksander Grygier
f486ce9f30
(webui) REFACTOR: UI primitives and polish ( #19551 )
* webui: UI primitives and polish (non-MCP)
* chore: update webui build output
2026-02-12 12:21:00 +01:00
Aleksander Grygier
38adc7d469
WebUI Architecture Cleanup ( #19541 )
* webui: architecture foundation (non-MCP core refactors)
* chore: update webui build output
2026-02-12 11:22:27 +01:00
RichardScottOZ
fa16e517a3
server : fix typo in README.md for features list ( #19510 )
Removed an extra 'l' from 'full'
2026-02-12 08:56:25 +01:00
Leszek Hanusz
8a6843aac1
Fix ApiChatCompletionRequest
2026-02-10 03:14:14 +01:00
Leszek Hanusz
8e125febc9
Don't use ChatService.notifyTimings
2026-02-10 01:54:05 +01:00
Leszek Hanusz
a35e4c4d81
Use a separate callbacks argument for sendCompletion
2026-02-10 01:20:14 +01:00
Leszek Hanusz
8f79f1fccb
Remove non-stream /completion implementation + fix API
2026-02-10 00:39:26 +01:00
손희준
820ebfa6f4
Server: log when converting requests to chat completions format ( #19457 )
* Log converting requests
* Print as debug instead of info [no ci]
---------
Co-authored-by: openingnow <>
2026-02-09 16:22:57 +01:00
Sascha Rogmann
292f6908cd
spec : remove check rate ( #19377 )
* spec: remove parameter spec-ngram-check-rate
* spec : renamed statistics vars
* spec : add n_call_begin, n_call_accept
* spec : don't enable key-map-stats
2026-02-09 15:30:50 +02:00
Georgi Gerganov
eb449cdfa4
server : improve context checkpoint logic ( #19408 )
2026-02-08 09:40:04 +02:00
Georgi Gerganov
dfde5993ea
common : add common_speculative_is_compat() ( #19270 )
* llama : add llama_memory_can_rm_suffix()
* Revert "llama : add llama_memory_can_rm_suffix()"
This reverts commit d30e59b62a.
* spec : check if the target context is compatible for spec decoding
2026-02-06 16:47:22 +02:00
Leszek Hanusz
a0c5c26fb9
Fix calculation of total tokens after undo/redo
2026-02-05 02:33:39 +01:00
Leszek Hanusz
4659a36ffd
Add 42px min height to the statistics to avoid flickering height problems + remove unused imports
2026-02-04 18:44:22 +01:00
Leszek Hanusz
77dc99cd9a
Remove [DONE] check
2026-02-04 18:11:27 +01:00
Leszek Hanusz
031e426005
Run npm run format
2026-02-04 16:31:44 +01:00
Leszek Hanusz
393faf0166
Put completion api service in separate file
2026-02-04 16:29:53 +01:00
Leszek Hanusz
251ba9d72a
Put tokenize in a separate file
2026-02-04 15:58:54 +01:00
Leszek Hanusz
efd274ab3d
chore: update webui build output
2026-02-04 14:25:20 +01:00
Leszek Hanusz
ad3b8df38f
Remove currentConfig.model
2026-02-04 02:03:59 +01:00
Leszek Hanusz
f20b17a087
Remove inputContent var and use tokenize only when needed
2026-02-04 01:23:24 +01:00
Leszek Hanusz
9cf4742adb
Fix tokenize with router on
2026-02-04 00:21:56 +01:00
Leszek Hanusz
03077cf297
Merge branch 'master' into notebook
2026-02-03 03:04:31 +01:00
Leszek Hanusz
210dc6a2c0
Running npm run format
2026-02-03 02:27:10 +01:00
Leszek Hanusz
9dc75f2664
Fix npm run check errors
2026-02-03 02:22:32 +01:00
Leszek Hanusz
f42d889a47
Fix vertical alignment of Generate tooltip shortcut info
2026-02-03 02:14:28 +01:00
Leszek Hanusz
fb2095e815
Show total number of tokens by using tokenizer
2026-02-03 01:50:52 +01:00
Leszek Hanusz
3657a8a7ad
Implement shortcuts for the notebook page
2026-02-02 23:59:36 +01:00
Leszek Hanusz
7892b259cb
Add last undo/redo for notebook page
2026-02-02 22:39:07 +01:00
Leszek Hanusz
f041a864ed
Use same dialog for server errors on notebook page
2026-02-02 21:29:48 +01:00
Leszek Hanusz
11e3cd81ce
Protect window from accidental closure when the notebook has unsaved content
2026-02-02 21:15:24 +01:00
Leszek Hanusz
301c3fec7e
Add generation statistics to notebook page
2026-02-02 18:39:46 +01:00
Matthieu Coudron
a3fa035822
server: print actual model name in 'model not found' error ( #19117 )
While experimenting with AI, my environment gets messy fast, and it's not
always easy to know which model my software is trying to load. This helps
with troubleshooting.
Before:
Error: {
    code = 400,
    message = "model not found",
    type = "invalid_request_error"
}
After:
Error: {
    code = 400,
    message = "model 'toto' not found",
    type = "invalid_request_error"
}
2026-02-02 16:55:27 +01:00
Leszek Hanusz
8a71126e5b
Autoscroll the notebook textarea depending on config parameter
2026-02-02 16:19:53 +01:00
Leszek Hanusz
e80ba11778
Fix sidebar behavior same as chat pages
2026-02-02 15:46:12 +01:00
Leszek Hanusz
ff2f0bba4a
Remove console logs
2026-02-02 15:06:51 +01:00
Christian Kastner
7a4ca3cbd9
docs : Minor cleanups ( #19252 )
* Update old URLs to github.com/ggml-org/
* Bump copyrights
2026-02-02 08:38:55 +02:00
Leszek Hanusz
c9f9863268
Add .agent/ to gitignore
Fix buttons
Fix model loading with router enabled
remove stats for now
lint
2026-02-01 23:20:34 +01:00
Leszek Hanusz
3af9b34aa2
Refine Notebook UI: improved layout, added stats and model info
2026-01-31 23:59:45 +01:00
Leszek Hanusz
6d96745375
Implement Notebook interface
2026-01-31 22:14:28 +01:00
Georgi Gerganov
bbada8bfb9
server : wrap around the "id_slot" parameter ( #19207 )
* server : wrap around the "id_slot" parameter
* cont : minor
2026-01-30 19:46:10 +02:00
Georgi Gerganov
dabaa2e77a
spec : add ngram-mod ( #19164 )
* spec : add ngram-mod
* cont : simplify + keep track of occupancy
* cont : cleanup
* cont : move initialization to common/speculative
* cont : cleanup
* cont : cleanup
* cont : fix
2026-01-30 18:21:48 +02:00
Andrew Marshall
84b0a98319
webui: Update Svelte to fix effect_update_depth_exceeded errors ( #19144 )
The upstream fix is first available in 5.38.2, so constrain to at least
that version.
Rebuild pre-compiled webui index.html.gz based on these changes.
See also:
https://github.com/ggml-org/llama.cpp/issues/16347
https://github.com/huntabyte/bits-ui/issues/1687
https://github.com/sveltejs/svelte/issues/16548
2026-01-29 15:56:39 +01:00
Sascha Rogmann
72d3b1898a
spec : add self-speculative decoding (no draft model required) + refactor ( #18471 )
* server: introduce self-speculative decoding
* server: moved self-call into speculative.cpp
* can_speculate() includes self-speculation
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* server: can_speculate() tests self-spec
* server: replace can_speculate() with slot.can_speculate()
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* common: use %zu format specifier for size_t in logging
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* server: can_speculate() requires a task instance
* common: ngram map, config self-speculative decoding
* common: add enum common_speculative_type
* common: add vector of speculative states
* common: add option --spec-draftless
* server: cleanup (remove slot.batch_spec, rename)
* common: moved self-spec impl to ngram-map
* common: cleanup (use common_speculative_state_draft)
* spec : refactor
* cont : naming
* spec: remove --spec-config
* doc: (draftless) speculative decoding
* common: print performance in spec decoding
* minor : cleanup
* common : better names
* minor : cleanup + fix build
* minor: comments
* CODEOWNERS: add common/ngram-map.* (#18471 )
* common : rename speculative.draftless_type -> speculative.type
* ngram-map : fix uninitialized values
* ngram-map : take into account the input can become shorter
* ngram-map : revert len check for now
* arg : change `--spec-draftless` -> `--spec-type`
* spec : add common_speculative_state::accept()
* spec : refactor + add common_speculative_begin()
* spec : fix begin() call with mtmd
* spec : additional refactor + remove common_speculative_params
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2026-01-28 19:42:42 +02:00