* metal: use shared buffers on eGPU
With #15906, I noticed on important regression when using metal backend on eGPU.
This commit restore the previous behavior and add an option to force its activation.
* metal: use shared buffers on eGPU
* metal: use shared buffers on eGPU
* webui: add "delete all conversations" button to import/export tab
- Add 'Delete all conversations' functionality with confirmation dialog
- Add Trash icon and destructive styling for clear visual indication
- Redirects to "?new_chat=true#/" by using conversationsStore.deleteAll()
* chore: update webui build output
* llama: automatically fit args to free memory
llama-fit-params tool
* fix CI
* hints for bug reports, ensure no reallocation
* fix segfault with Vulkan
* add llama-fit-params to CI
* fix CI
* fix CI
* fix CI
* minor adjustments
* fix assignment of 1 dense layer
* fix logger not being reset on model load failure
* remove --n-gpu-layer hint on model load failure
* fix llama-fit-params verbosity
* fix edge case
* fix typo [no ci]
* [model] add glm-asr support
* fix format for ci
* fix convert format for ci
* update glm_asr convert script & use build_ffn for glm_asr clip & use build_stack for padding and review
* check root architecture for convert hf script
* fix conficlt with upstream
* fix convert script for glm asr & format clip-impl
* format
* restore hparams text
* improved conversion
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
Some backend depends on CMAKE_RUNTIME_OUTPUT_DIRECTORY to create temporary file like metal backened.
Missing CMAKE_RUNTIME_OUTPUT_DIRECTORY will cause some cmake error like permission denied (try to copy file to root).
This PR wants to setup a default path for CMAKE_RUNTIME_OUTPUT_DIRECTORY when it does not exist.
When the number of cols is large, split each row across multiple workgroups.
There are three phases that communicate partial results through temp buffers:
(1) compute max partials
(2) take max of partials, compute sum(exp(x-max)) partials
(3) sum partials, compute scaled result