llama.cpp

Commit Graph

Author	SHA1	Message	Date
Xuan Son Nguyen	f2dbe9c087	rm unused fn	2025-12-01 14:20:43 +01:00
Georgi Gerganov	d182544c99	server : minor	2025-12-01 13:35:40 +02:00
Xuan Son Nguyen	4a1c05c383	fix invalid ptr to shutdown_handler	2025-11-30 15:31:05 +01:00
Xuan Son Nguyen	7b28b5e16a	fix duplicated arg	2025-11-30 14:53:47 +01:00
Xuan Son Nguyen	802e77eaf4	Merge remote-tracking branch 'webui/allozaur/server_model_management_v1_2' into xsn/server_model_management_v1_2	2025-11-29 23:54:34 +01:00
Xuan Son Nguyen	23cb411317	also route anthropic endpoints	2025-11-29 23:29:06 +01:00
Aleksander Grygier	e8b9d74b3b	chore: update webui build output	2025-11-29 23:18:45 +01:00
Aleksander Grygier	acd3c58152	refactor: Remove redundant method	2025-11-29 23:18:24 +01:00
Aleksander Grygier	360a5ed62b	test: Move demo test to tests/server	2025-11-29 23:17:34 +01:00
Xuan Son Nguyen	a82dbbfb30	decouple server_models from server_routes	2025-11-29 23:00:35 +01:00
Xuan Son Nguyen	c1dfccd078	Merge branch 'master' into xsn/server_model_management_v1_2	2025-11-29 22:34:16 +01:00
Xuan-Son Nguyen	ab49f094d2	server: move server-context to its own cpp|h (#17595) * git mv * add server-context.h * add server-context.h * clean up headers * cont : cleanup * also expose server_response_reader (to be used by CLI) * fix windows build * decouple server_routes and server_http --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-29 22:04:44 +01:00
Aleksander Grygier	6fd720e742	Merge remote-tracking branch 'origin/allozaur/server_model_management_v1_2' into allozaur/server_model_management_v1_2	2025-11-29 21:59:33 +01:00
Aleksander Grygier	ae8a1e8137	refactor: Tests to separate location	2025-11-29 21:44:57 +01:00
Aleksander Grygier	949b5fd63e	refactor: Tooltip Provider from core layout	2025-11-29 21:41:36 +01:00
Aleksander Grygier	4f39da823f	test: Update Chat Form UI tests	2025-11-29 20:13:11 +01:00
Aleksander Grygier	33b9cc40a1	Merge branch 'master' into allozaur/server_model_management_v1_2	2025-11-29 19:40:46 +01:00
Haiyue Wang	8c32d9d96d	server: explicitly set the function name in lambda (#17538) As [1] explained, the real debug message will be like: "res operator(): operator() : queue result stop" Set the name explicitly, the message is easy for debugging: "res operator(): recv : queue result stop" The left "operator()" is generated by 'RES_DBG() ... __func__' [1]: https://clang.llvm.org/extra/clang-tidy/checks/bugprone/lambda-function-name.html Signed-off-by: Haiyue Wang <haiyuewa@163.com>	2025-11-29 18:43:29 +01:00
Igor Smirnov	0874693b44	common : fix json schema with '\' in literals (#17307) * Fix json schema with '\' in literals * Add "literal string with escapes" test	2025-11-29 17:06:32 +01:00
Neo Zhang	7d2add51d8	sycl : support to malloc memory on device more than 4GB, update the doc and script (#17566) Co-authored-by: Neo Zhang Jianyu <jianyu.zhang@intel.com>	2025-11-29 14:59:44 +02:00
ixgbe	f698a79c63	ggml: replace hwcap with riscv_hwprobe for RVV detection (#17567) Signed-off-by: Wang Yang <yangwang@iscas.ac.cn>	2025-11-29 14:56:31 +02:00
Ruben Ortlam	47a268ea50	Vulkan: MMVQ Integer Dot K-Quant and MUL_MAT_ID support (#16900) * vulkan: split mul_mmq_funcs for mul_mat_vecq use * add mxfp4 mmvq * add q2_k mmvq * add q3_k mmvq * add q4_k and q5_k mmvq * add q6_k mmvq * handle 4x4 quants per mmvq thread * enable MUL_MAT_ID mmvq support * enable subgroup optimizations for mul_mat_vec_id shaders * device tuning * request prealloc_y sync after quantization * fix indentation * fix llvmpipe test failures * fix mul_mat_id mmvq condition * fix unused variable warning	2025-11-29 09:37:22 +01:00
Jeff Bolz	59d8d4e963	vulkan: improve topk perf for large k, fix overflow in unit tests (#17582)	2025-11-29 08:39:57 +01:00
Aleksander Grygier	a568e74c20	chore: update webui build output	2025-11-29 02:40:09 +01:00
Aleksander Grygier	2d556bb93c	test: Fix Storybook mocks	2025-11-29 02:36:41 +01:00
Aleksander Grygier	493ef08723	refactor: Utils imports + move types to `app.d.ts`	2025-11-29 02:33:37 +01:00
Aleksander Grygier	ce9c9afe0d	chore: update webui build output	2025-11-29 01:40:00 +01:00
Aleksander Grygier	2464e06028	feat: Improve UI sidebar background color	2025-11-29 01:39:40 +01:00
Aleksander Grygier	27b152267f	refactor: Constants	2025-11-29 01:38:02 +01:00
Aleksander Grygier	648d2deebc	feat: Attachment logic & UI improvements	2025-11-29 01:36:05 +01:00
Aleksander Grygier	d49d97c642	refactor: Cleanup	2025-11-29 00:51:18 +01:00
Aleksander Grygier	f50ce7b5b4	refactor: Cleanup	2025-11-29 00:50:16 +01:00
Aleksander Grygier	4d16459b4c	re	2025-11-29 00:49:46 +01:00
Aleksander Grygier	c76de5e0ad	refactor: Cleanup	2025-11-29 00:49:20 +01:00
Aleksander Grygier	2f97dbfa65	docs: Add info comment	2025-11-29 00:49:03 +01:00
Aleksei Nikiforov	d82b7a7c1d	gguf-py : fix passing non-native endian tensors (editor-gui and new-metadata) (#17553) gguf_new_metadata.py reads data from reader. Reader doesn't byteswap tensors to native endianness. But writer does expect tensors in native endianness to convert them into requested endianness. There are two ways to fix this: update reader and do conversion to native endianness and back, or skip converting endianness in writer in this particular USE-case. gguf_editor_gui.py doesn't allow editing or viewing tensor data. Let's go with skipping excessive byteswapping. If eventually capability to view or edit tensor data is added, tensor data should be instead byteswapped when reading it.	2025-11-28 20:53:01 +01:00
Aleksander Grygier	1adf173dd6	refactor: Cleanup	2025-11-28 19:36:03 +01:00
Aleksander Grygier	dd30810d0a	fix: Modality detection improvement for text-based PDF attachments	2025-11-28 19:30:32 +01:00
DAN™	03914c7ef8	common : move all common_chat_parse_* to chat-parser.cpp. (#17481)	2025-11-28 19:29:36 +01:00
o7si	3ce7a65c2f	server: fix: /metrics endpoint returning JSON-escaped Prometheus format (#17386) * fix: /metrics endpoint returning JSON-escaped Prometheus format * mod: remove string overload from ok() method	2025-11-28 19:14:00 +01:00
Diego Devesa	e072b2052e	ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched (#17276) * ggml : add GGML_SCHED_NO_REALLOC option to disable reallocations in ggml_backend_sched Enabled in ggml-ci for testing. * llama : update worst-case graph for unified cache * ci : disable op offload in some tests * fix spelling --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-28 17:33:23 +02:00
Aleksander Grygier	171a0926a1	chore: update webui build output	2025-11-28 16:00:44 +01:00
Aleksander Grygier	68b653ef45	refactor: DRY `getAttachmentDisplayItems` function + fix UI	2025-11-28 15:58:52 +01:00
Aleksander Grygier	1cf5daa8c0	refactor: Cleanup	2025-11-28 15:56:41 +01:00
Aleksander Grygier	04ef4a06e2	chore: update webui build output	2025-11-28 15:44:43 +01:00
Aleksander Grygier	5fadd0fe18	refactor: Components naming	2025-11-28 15:39:47 +01:00
Aleksander Grygier	3470b12b76	chore: update webui build output	2025-11-28 15:09:55 +01:00
Aleksander Grygier	eed1bd9b97	refactor: Enhance model info and attachment handling	2025-11-28 15:08:41 +01:00
R0CKSTAR	c6f7a423c8	[MUSA] enable fp16/fast_fp16/bf16_mma on PH1 (#17551) * [MUSA] enable fp16/fast_fp16/bf16_mma on PH1 Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> * Update ggml/src/ggml-cuda/fattn-vec.cuh Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Update ggml/src/ggml-cuda/fattn-vec.cuh Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Update ggml/src/ggml-cuda/fattn-tile.cuh Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * Address review comments Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2025-11-28 14:08:29 +01:00
Aman Gupta	2e7ef98f18	ggml-cuda: add stricter checking for fusion (#17568) * ggml-cuda: make conditions for fusion more explicit * ggml-cuda: remove size check as std::equal already does it	2025-11-28 20:34:51 +08:00

7360 Commits