* Prevent crash if TTFT >300sec, boosted to 90 days
* server : allow configurable HTTP timeouts for child models
* server : pass needed timeouts from params only
---------
Co-authored-by: Greg Slocum <fromgit@wbtek.slocum.net>
* Fix `msg` typo
* Fix thread safety in destroy() to support generation abortion in lifecycle callbacks.
* UI polish: stack new message change from below; fix GGUF margin not in view port
* Bug fixes: rare racing condition when main thread updating view and and default thread updating messages at the same time; user input not disabled during generation.
* Bump dependencies' versions; Deprecated outdated dsl usage.
This commit refactors the original model embedding script to include a
device selection option. Users can now specify the device (cpu, cuda,
mps, auto) via command-line arguments. It also refactors the code to be
more structured.
This was disabled in #9340 due to compiler crash, but seems to build now as confirmed by the latest comments in #11913.
I've also managed to build the image with `docker build -f .devops/rocm.Dockerfile .` (for all three stages, `full`, `server` and `light`).
A quick attempt at trying to build an arm64 image failed. Since none of the other images are build for arm, I only enabled the amd64 one.
The `runs_on` option was added to match the other entries.
This commit updates the include directive in cumsum.cu to use
cub/cub.cuh instead of cub/block/block_scan.cuh.
The motivation of this change is that without it compilation fails
with the following error:
```console
/llama.cpp/ggml/src/ggml-cuda/cumsum.cu(196): error: name followed by "::" must be a class or namespace name
cub::DeviceScan::InclusiveSum(nullptr,
^
/llama.cpp/ggml/src/ggml-cuda/cumsum.cu(207): error: name followed by "::" must be a class or namespace name
cub::DeviceScan::InclusiveSum((void *) tmp_alloc.get(), tmp_size, src, dst, ne, stream);
^
2 errors detected in the compilation of "/llama.cpp/ggml/src/ggml-cuda/cumsum.cu".
gmake[2]: *** [ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/build.make:317: ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cumsum.cu.o] Error 2
```
Commit 83b3b1c271 ("cuda: optimize
cumsum cub path (#18362)") updated the include directive replacing
device_scan.cuh which is causing this issue.
This commit uses cub/cub.cuh umbrella header which is consistent with
other files in the ggml-cuda directory like mean.cu, sum.cu, etc.
* vulkan: Use BK=32 for coopmat2 mul_mat_id
* vulkan: optimize decodeFuncB in coopmat2 mul_mat_id shader
Disable robustness, remove the OOB check in decodeFuncB, and initialize the
row_ids to zero to avoid OOB access.
Don't slice/offset the B matrix to ic * BN, only to adjust the coord back down
to the range [0, BN) in decodeFuncB. Instead just slice with a row offset of
zero and remove the '& (BN - 1)'. This allows the compiler to common some of
the shared memory loads.
* ggml-cuda: fix blackwell native builds
Replace 12x in native architectures by 12xa
* replace for GGML_NATIVE=OFF too
* only replace for native
* remove 120f-virtual for default compilation
---------
Co-authored-by: Aman Gupta <aman>
* CONV_TRANSPOSE_1D kernel_size>255
* remove condition check
* fix the bug of type conversion
* removing trailing whitespaces
* fix: return true in the switch case