llama.cpp

Commit Graph

Author	SHA1	Message	Date
Imad Saddik	642a4d68b8	chore: update webui build output	2025-12-20 15:44:02 +01:00
Imad Saddik	9b87eaf898	Updated the disabled text	2025-12-20 15:42:21 +01:00
Imad Saddik	fac6ef71a8	chore: update webui build output	2025-12-20 15:27:35 +01:00
Imad Saddik	37be01ac4c	Applied formatting	2025-12-20 15:26:23 +01:00
Imad Saddik	689b7a5bd6	Removed console log and refactored the code	2025-12-20 15:25:41 +01:00
Imad Saddik	6db71973ab	Applied formatting	2025-12-20 15:18:41 +01:00
Imad Saddik	b519235e0a	Added the ability to set custom pixel values in the combobox	2025-12-20 15:17:52 +01:00
Imad Saddik	3be006dcaf	Display the custom chat width combobox in the settings with just presets	2025-12-20 14:29:15 +01:00
Imad Saddik	14929a77b0	Added the command and popover components	2025-12-20 11:10:06 +01:00
Imad Saddik	416bb35130	Used the new getChatWidth function in ChatProcessingInfo	2025-12-20 09:28:55 +01:00
Imad Saddik	62614a5faa	Used the new getChatWidth function in all chat messages components that need it	2025-12-20 09:27:49 +01:00
Imad Saddik	dccfcc02eb	Used the new getChatWidth function in ChatForm	2025-12-20 09:09:45 +01:00
Imad Saddik	33d8d0f461	Performed formatting	2025-12-20 09:07:54 +01:00
Imad Saddik	61d99bbd88	Used the new chat width logic in ChatScreen and ChatWarning	2025-12-20 09:07:04 +01:00
Imad Saddik	b6dbbcc1fb	Moved and renamed the width-classes.ts file	2025-12-20 09:04:13 +01:00
Imad Saddik	d784cf9bea	Renamed the settings keys and added a new field in the settings	2025-12-20 08:51:27 +01:00
Imad Saddik	fe680a932b	Added support for custom width presets and renamed the constants	2025-12-20 08:49:46 +01:00
Imad Saddik	1cccfaea0f	Added new records to SETTING_CONFIG_DEFAULT	2025-12-20 08:25:01 +01:00
Imad Saddik	67c7ed70b2	chore: update webui build output	2025-12-15 22:03:38 +01:00
Imad Saddik	c638bfeb21	Updated the MAX_WIDTH_CLASSES constant	2025-12-15 22:02:42 +01:00
Imad Saddik	20f5f32d4d	Added custom breakpoints	2025-12-15 22:01:15 +01:00
Imad Saddik	cb2926b904	Fixed indentation	2025-12-14 22:01:12 +01:00
Imad Saddik	ce0a7e3acc	Added new width constants to implement the responsive chat width feature	2025-12-14 21:59:19 +01:00
Imad Saddik	dc0ba44991	Fixed the indentation	2025-12-14 21:11:04 +01:00
Imad Saddik	232eba99b4	Added a new setting to control the width of elements based on the screen size	2025-12-14 20:53:08 +01:00
hipudding	2376b7758c	CANN: Use smart pointers to manage ACL objects (#17238 ) * CANN: Use smart pointers to manage ACL objects Previously, ACL objects were managed via manual destruction, which led to multiple memory-leak issues during runtime. This patch replaces manual memory management with smart pointers so that ACL objects are properly released and ownership is clearly defined. Note that the ownership of an ACL object belongs to the function that creates it. Other internal functions should operate on these ACL objects using raw pointers to avoid unintended ownership transfers. Additionally, since aclTensorList automatically frees its contained aclTensor objects, any aclTensor added to a tensor list must release ownership to avoid double free operations. This PR also removes the asynchronous task submission mechanism. Due to changes in recent CANN versions, tiling time has significantly decreased. Even with a dual-thread submission model, the dispatch overhead still falls on the critical path, making async submission less beneficial. Moreover, aclGraph support provides a much better path to reducing operator dispatch latency. * CANN: resolve review comments	2025-11-17 08:43:59 +08:00
Pavels Zaicenkovs	dbed61294a	vulkan: add LOG operation support for F32 and F16 (#17183 ) * vulkan: add LOG operation support for F32 and F16 Part of #14909. * vulkan: Fix LOG operation types * docs: Update operation support documentation for Vulkan LOG operation * vulkan: fix log_f16 shader * docs: restore missing LOG test cases and regenerate ops.md	2025-11-16 22:50:09 +01:00
Ruben Ortlam	80deff3648	vulkan: fix MMQ quantize_y condition (#17301 )	2025-11-16 19:38:17 +01:00
Eve	8b1c339bd2	ci : revert #16249 (#17303 ) * Delete .github/workflows/build-amd.yml * Update build.yml	2025-11-16 19:09:17 +01:00
Georgi Gerganov	416e7c7f47	metal : remove obosolete asserts (#17295 )	2025-11-16 09:50:26 +02:00
Georgi Gerganov	5b2093becc	server : handle context overflow during decode (#17267 ) * server : handle context overflow during decode * server : minor refactor	2025-11-16 09:23:37 +02:00
lhez	52e5d421f1	opencl: fix rms_norm_mul (#17250 ) * opencl: use subgrroup reduce for reduction in rms_norm_mul * opencl: add comment about workgroup size	2025-11-15 17:40:14 -08:00
shaofeiqi	4db5641210	opencl: add kernel to handle mat mul in attention to improve encoding speed (#17181 ) * Add mul_mm_f16_f32_kq_kqv kernel * Add ggml_cl_mul_mat_kq_kqv_adreno func * fix whitespace * remove unused variable * remove redundant * refactor and clean up * remove trailing whitespace	2025-11-15 17:33:10 -08:00
shani-f	72bd7321a7	sycl : unify unary kernels with a generic implementation and enable wide operator support (#17213 ) * SYCL: add generic unary op implementation for multiple ops (ABS/SGN/…); unify non-contiguous access * SYCL: update documentation and sycl.csv to reflect new unary op support * update ops.md after syncing SYCL.csv changes * Fix SYCL.csv merge conflict * Update ops.md after fixing SYCL.csv conflicts * Fix SYCL.csv tail after merge conflict and regenerate ops.md * Fix line endings and final newline in SYCL.csv * Remove TOPK_MOE entries from SYCL.csv as requested * Update ops.md after removing TOPK_MOE from SYCL.csv * Regenerated SYCL.csv and synced ops.md with upstream * Update ops.md using create_ops_docs.py	2025-11-16 00:52:42 +01:00
Aleksander Grygier	22e1ce2f81	webui: Fix clickability around chat processing statistics UI (#17278 ) * fix: Better pointer events handling in chat processing info elements * chore: update webui build output	2025-11-15 22:41:41 +01:00
Pascal	1411d9275a	webui: add OAI-Compat Harmony tool-call streaming visualization and persistence in chat UI (#16618 ) * webui: add OAI-Compat Harmony tool-call live streaming visualization and persistence in chat UI - Purely visual and diagnostic change, no effect on model context, prompt construction, or inference behavior - Captured assistant tool call payloads during streaming and non-streaming completions, and persisted them in chat state and storage for downstream use - Exposed parsed tool call labels beneath the assistant's model info line with graceful fallback when parsing fails - Added tool call badges beneath assistant responses that expose JSON tooltips and copy their payloads when clicked, matching the existing model badge styling - Added a user-facing setting to toggle tool call visibility to the Developer settings section directly under the model selector option * webui: remove scroll listener causing unnecessary layout updates (model selector) * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * Update tools/server/webui/src/lib/components/app/chat/ChatMessages/ChatMessageAssistant.svelte Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: npm run format & update webui build output * chore: update webui build output --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com>	2025-11-15 21:09:32 +01:00
Sigbjørn Skjæret	662192e1dc	convert : remove unnecessary chat template patching (#17289 )	2025-11-15 20:58:59 +01:00
Jeff Bolz	24dc769f1b	vulkan: Fuse mul_mat_id+add_id+mul and mul_mat+add+add. (#17287 ) These both show up in gpt-oss. Also, cleanup the mul_mat_vec fusion code a bit.	2025-11-15 19:54:23 +01:00
Ruben Ortlam	4dca015b7e	vulkan: Replace 16-bit unpack8 calls to work around legacy Windows AMD driver bug (#17285 )	2025-11-15 15:18:58 +01:00
Sigbjørn Skjæret	9a8860cf5d	convert : use all parts in safetensors index (#17286 )	2025-11-15 14:12:39 +01:00
Sigbjørn Skjæret	9d3ef4809f	convert : set expert gating func in base class (#17279 )	2025-11-15 14:06:24 +01:00
Ankur Verma	c7b7db0445	mtmd-cli: Avoid logging to stdout for model loading messages in mtmd-cli (#17277 )	2025-11-15 12:41:16 +01:00
Giuseppe Scrivano	1568d13c2c	vulkan: implement ABS and NEG (#17245 ) * docs: update Vulkan ops * vulkan: add NEG op * vulkan: add ABS op --------- Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>	2025-11-15 12:00:29 +01:00
Jeff Bolz	439342ea0b	vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths (#17244 ) * vulkan: Use ggml_vk_tensor_subbuffer in mul_mat_vec(id) paths * set allow_misalign	2025-11-15 11:56:15 +01:00
Jeff Bolz	234ae7d7bd	vulkan: skip all-negative-inf blocks in FA (#17186 )	2025-11-15 10:37:25 +01:00
Jeff Bolz	38eaf32af1	vulkan: change graph_compute to be async and enable get_tensor_async (#17158 ) * vulkan: change graph_compute to be async and enable get_tensor_async This allows some additional CPU/GPU overlap for large pp workloads. Also seems to help a bit for token gen, maybe getting rid of a small bubble between graph_compute and get_tensor. Async set and copy functions seem to be very rarely used, so I didn't enable them because I didn't have a good way to test them. The async commands need to be ordered against each other, so put them all on the compute queue. The non-async commands still use the transfer queue. The fence for graph_compute/get_tensor_async is submitted and waited on in ggml_vk_synchronize. * fix thread safety errors * teardown context cleanly * Handle async read to non-pinned dst	2025-11-15 09:06:41 +01:00
Xuan-Son Nguyen	9b17d74ab7	mtmd: add mtmd_log_set (#17268 )	2025-11-14 15:56:19 +01:00
Bartowski	e1fcf8b09b	model : add AfmoeForCausalLM support (#16477 ) * Add AFMOE model support * Update to vocab * Add model sizing * Undo Rope change for ARCEE model * Address review comments * Update modeling code is_sliding -> use_rope, replace hard-coded logic * Fix AFMOE tokenizer * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update convert_hf_to_gguf.py Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> * Update AFMoE tokenizer class identification to be more unique --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>	2025-11-14 13:54:10 +01:00
Marek Hradil jr.	6cd0cf72ce	fix : Dangling pointer for non-empty trigger words in lazy grammar construction (#17048 ) * fix : Dangling pointer for non-empty trigger words in llama_sampler_init_grammar_impl (#17047) * Replace 'static' workaround, with keeping variable in scope for longer * Create std::array directly and pass into llama_grammar_init_impl * Add back the trigger pattern * Missed array include	2025-11-14 14:35:26 +02:00
Georgi Gerganov	d396b43748	server : fix "can batch with" bug (#17263 )	2025-11-14 14:03:45 +02:00
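
The ownership rules described in commit 2376b7758c (the creating function owns the object, internal helpers borrow it through raw pointers, and anything handed to a self-freeing list must give up ownership first) can be sketched with `std::unique_ptr` and custom deleters. This is only an illustrative sketch, not the code from that commit: `fake_tensor`, `fake_tensor_list`, their create/destroy functions, `read_id`, and `run_demo` are all hypothetical stand-ins and are not part of the ACL or CANN APIs.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for an opaque C-style handle API (NOT the real ACL API).
struct fake_tensor { int id; };
static int g_live = 0;  // number of tensor handles currently alive

fake_tensor * fake_tensor_create(int id) { ++g_live; return new fake_tensor{id}; }
void fake_tensor_destroy(fake_tensor * t) { --g_live; delete t; }

// Hypothetical list type that frees its contained tensors when it is destroyed,
// mirroring the behaviour the commit message attributes to aclTensorList.
struct fake_tensor_list { std::vector<fake_tensor *> items; };

fake_tensor_list * fake_tensor_list_create(fake_tensor ** ts, int n) {
    auto * list = new fake_tensor_list;
    list->items.assign(ts, ts + n);
    return list;
}

void fake_tensor_list_destroy(fake_tensor_list * list) {
    for (fake_tensor * t : list->items) {
        fake_tensor_destroy(t);
    }
    delete list;
}

// Custom deleters let unique_ptr call the matching destroy function.
struct tensor_deleter {
    void operator()(fake_tensor * t) const { fake_tensor_destroy(t); }
};
struct list_deleter {
    void operator()(fake_tensor_list * l) const { fake_tensor_list_destroy(l); }
};
using tensor_ptr = std::unique_ptr<fake_tensor, tensor_deleter>;
using list_ptr   = std::unique_ptr<fake_tensor_list, list_deleter>;

// Internal helper: borrows the object through a raw pointer, takes no ownership.
int read_id(const fake_tensor * t) { return t->id; }

int run_demo() {
    int sum = 0;
    {
        // The creating scope owns each handle.
        tensor_ptr a(fake_tensor_create(1));
        tensor_ptr b(fake_tensor_create(2));
        tensor_ptr c(fake_tensor_create(3));

        sum = read_id(a.get()) + read_id(b.get()) + read_id(c.get());

        // b and c go into a list that frees its members itself, so they must
        // release ownership first; otherwise they would be freed twice.
        fake_tensor * raw[2] = { b.release(), c.release() };
        list_ptr list(fake_tensor_list_create(raw, 2));
    }  // list frees b and c; a is freed by its own unique_ptr
    return sum;
}
```

Calling `run_demo()` exercises all three rules; the `g_live` counter is there only so the sketch can check that every handle is destroyed exactly once.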

7108 Commits

All Branches