llama.cpp/tools/server/webui
Daniel Bevenius 9e5e09d087
sampling : remove backend-dist option (wip)
This commit removes the `--backend-dist` option and instead uses the
configured --samplers chain to determine which samplers run on the
backend.

Backend sampling is still enabled with `--backend-sampling`, and the
sampler chain, either explicitly specified using `--samplers` or the
default, is automatically analyzed to determine which samplers can run
on the backend. The system finds the longest contiguous chain of
backend-supported samplers from the start of the sampler sequence (see
the sketch after the examples below).
For example:

* If the chain is `top_k -> temperature -> top_p`, and both `top_k` and
  `temperature` are backend-supported but `top_p` is not, then `top_k`
  and `temperature` will run on the backend, while `top_p` and
  subsequent samplers run on the CPU.

* If all configured samplers are supported, the final distribution
  sampling will also happen on the backend, transferring only the
  sampled token IDs back to the host.

* If the sampler chain starts with an unsupported sampler (e.g.,
  `penalties`), all sampling runs on the CPU. Note that this is
  currently the case with the default sampler chain, so to use backend
  sampling an explicit sampler chain must be specified. See below for an
  example.
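
The selection logic can be sketched roughly as follows (TypeScript for
illustration only; `BACKEND_SUPPORTED` and `splitSamplerChain` are
made-up names and this is not the actual llama.cpp implementation):
```ts
// Samplers assumed to be backend-supported for this sketch; the real
// set is determined by the backend implementation.
const BACKEND_SUPPORTED = new Set(['top_k', 'temperature']);

interface SamplerSplit {
    backend: string[]; // longest supported prefix, runs on the backend
    cpu: string[];     // remaining samplers, run on the CPU
}

// Find the longest contiguous run of backend-supported samplers from
// the start of the chain; everything from the first unsupported
// sampler onwards stays on the CPU.
function splitSamplerChain(chain: string[]): SamplerSplit {
    let i = 0;
    while (i < chain.length && BACKEND_SUPPORTED.has(chain[i])) {
        i++;
    }
    return { backend: chain.slice(0, i), cpu: chain.slice(i) };
}

// top_k and temperature run on the backend, top_p runs on the CPU.
console.log(splitSamplerChain(['top_k', 'temperature', 'top_p']));

// The chain starts with an unsupported sampler, so all sampling runs
// on the CPU (currently the case for the default chain).
console.log(splitSamplerChain(['penalties', 'top_k', 'temperature']));

// When cpu is empty, the final distribution sampling also happens on
// the backend and only the sampled token IDs are transferred back.
console.log(splitSamplerChain(['top_k', 'temperature']));
```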

The following shows how llama-cli can be run with backend sampling:
```console
$ llama-cli -m models/Qwen2.5-VL-3B-Instruct-Q8_0.gguf \
    --prompt 'What is the capital of Sweden?' \
    -n 20 \
    -no-cnv \
    --verbose-prompt \
    -ngl 40 \
    --backend-sampling \
    --samplers 'top_k;temperature'
```
In this case all sampling will happen on the backend since both
`top_k` and `temperature` are supported backend samplers.

To enable partial backend sampling (hybrid sampling), for example
running `top_k` and `temperature` on the backend and `top_p` on the CPU,
the following sampler chain could be specified:
```console
$ llama-cli -m models/Qwen2.5-VL-3B-Instruct-Q8_0.gguf \
    --prompt 'What is the capital of Sweden?' \
    -n 20 \
    -no-cnv \
    --verbose-prompt \
    -ngl 40 \
    --backend-sampling \
    --samplers 'top_k;temperature;top_p'
```

If this looks good then I'll follow up with updates to the llama-cli
and llama-server documentation to reflect these changes.
2025-11-25 14:01:23 +01:00
.storybook Update packages + upgrade Storybook to v10 (#17201) 2025-11-12 19:01:48 +01:00
e2e SvelteKit-based WebUI (#14839) 2025-09-17 19:29:13 +02:00
scripts webui: Remove running `llama-server` within WebUI `dev.sh` script (#16363) 2025-10-01 08:40:26 +03:00
src sampling : remove backend-dist option (wip) 2025-11-25 14:01:23 +01:00
static SvelteKit-based WebUI (#14839) 2025-09-17 19:29:13 +02:00
.gitignore webui: Add a "Continue" Action for Assistant Message (#16971) 2025-11-19 14:39:50 +01:00
.npmrc SvelteKit-based WebUI (#14839) 2025-09-17 19:29:13 +02:00
.prettierignore SvelteKit-based WebUI (#14839) 2025-09-17 19:29:13 +02:00
.prettierrc SvelteKit-based WebUI (#14839) 2025-09-17 19:29:13 +02:00
README.md SvelteKit-based WebUI (#14839) 2025-09-17 19:29:13 +02:00
components.json SvelteKit-based WebUI (#14839) 2025-09-17 19:29:13 +02:00
eslint.config.js SvelteKit-based WebUI (#14839) 2025-09-17 19:29:13 +02:00
package-lock.json webui: Add a "Continue" Action for Assistant Message (#16971) 2025-11-19 14:39:50 +01:00
package.json Update packages + upgrade Storybook to v10 (#17201) 2025-11-12 19:01:48 +01:00
playwright.config.ts Enable per-conversation loading states to allow having parallel conversations (#16327) 2025-10-20 12:41:13 +02:00
svelte.config.js feat(webui): improve LaTeX rendering with currency detection (#16508) 2025-11-03 00:41:08 +01:00
tsconfig.json SvelteKit-based WebUI (#14839) 2025-09-17 19:29:13 +02:00
vite.config.ts feat(webui): improve LaTeX rendering with currency detection (#16508) 2025-11-03 00:41:08 +01:00
vitest-setup-client.ts SvelteKit-based WebUI (#14839) 2025-09-17 19:29:13 +02:00

README.md

llama.cpp Web UI

A modern, feature-rich web interface for llama.cpp built with SvelteKit. This UI provides an intuitive chat interface with advanced file handling, conversation management, and comprehensive model interaction capabilities.

Features

  • Modern Chat Interface - Clean, responsive design with dark/light mode
  • File Attachments - Support for images, text files, PDFs, and audio with rich previews and drag-and-drop support
  • Conversation Management - Create, edit, branch, and search conversations
  • Advanced Markdown - Code highlighting, math formulas (KaTeX), and content blocks
  • Reasoning Content - Support for models with thinking blocks
  • Keyboard Shortcuts - Keyboard navigation (Shift+Ctrl/Cmd+O for new chat, Shift+Ctrl/Cmd+E for edit conversation, Shift+Ctrl/Cmd+D for delete conversation, Ctrl/Cmd+K for search, Ctrl/Cmd+V for paste, Ctrl/Cmd+B for opening/collapsing sidebar)
  • Request Tracking - Monitor processing with slots endpoint integration
  • UI Testing - Storybook component library with automated tests

Development

Install dependencies:

npm install

Start the development server + Storybook:

npm run dev

This will start both the SvelteKit dev server and Storybook, with Storybook on port 6006.

Building

Create a production build:

npm run build

The build outputs static files to the ../public directory for deployment with the llama.cpp server.

Testing

Run the test suite:

# E2E tests
npm run test:e2e

# Unit tests
npm run test:unit

# UI tests
npm run test:ui

# All tests
npm run test

Architecture

  • Framework: SvelteKit with Svelte 5 runes
  • Components: ShadCN UI + bits-ui design system
  • Database: IndexedDB with Dexie for local storage (see the sketch below)
  • Build: Static adapter for deployment with llama.cpp server
  • Testing: Playwright (E2E) + Vitest (unit) + Storybook (components)
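
As a minimal sketch of the local-storage layer, conversations could be persisted with Dexie over IndexedDB roughly as follows; the `WebuiDB` class, the `Conversation` shape, and the field names are illustrative assumptions, not the WebUI's actual schema:
```ts
import Dexie, { type Table } from 'dexie';

// Hypothetical conversation record; the real WebUI schema may differ.
interface Conversation {
    id: string;
    title: string;
    updatedAt: number;
}

// Hypothetical Dexie wrapper around IndexedDB.
class WebuiDB extends Dexie {
    conversations!: Table<Conversation, string>;

    constructor() {
        super('webui-sketch');
        // 'id' is the primary key, 'updatedAt' is indexed for sorting.
        this.version(1).stores({
            conversations: 'id, updatedAt'
        });
    }
}

const db = new WebuiDB();

// Store a conversation and list the most recently updated ones.
await db.conversations.put({ id: 'c1', title: 'New chat', updatedAt: Date.now() });
const recent = await db.conversations.orderBy('updatedAt').reverse().toArray();
```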