llama.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	3b3a948134	metal : update sum_rows kernel to support float4 (#19524)	2026-02-12 11:35:28 +02:00
Georgi Gerganov	914dde72ba	ggml : unary ops support non-cont src0 + metal F16 unary ops (#19511) * ggml : unary ops support non-cont src0 * metal : support F16 unary ops + fix ELU	2026-02-11 18:58:43 +02:00
Georgi Gerganov	9ab072ebbe	metal : extend l2_norm support for non-cont src0 (#19502)	2026-02-11 14:53:19 +02:00
Georgi Gerganov	ceaa89b786	metal : consolidate unary ops (#19490)	2026-02-11 07:51:12 +02:00
Georgi Gerganov	8872ad2125	metal : consolidate bin kernels (#19390) * metal : refactor bin kernels * cont * cont : fix cv	2026-02-07 10:35:56 +02:00
Georgi Gerganov	34ba7b5a2f	metal : fix event synchronization in cpy_tensor_async (#19402)	2026-02-07 07:37:15 +02:00
Georgi Gerganov	7fcf1ef45d	metal : skip loading all-zero mask (#19337) * metal : skip loading all-zero mask * cont : minor	2026-02-06 09:25:11 +02:00
Georgi Gerganov	22cae83218	metal : adaptive CPU/GPU interleave based on number of nodes (#19369)	2026-02-05 19:07:22 +02:00
Georgi Gerganov	7a4f97d196	metal : add diag (#19330)	2026-02-05 10:08:45 +02:00
will-lms	af252d0758	metal : add missing includes (#19348)	2026-02-05 08:05:09 +02:00
Georgi Gerganov	44008ce8f9	metal : add solve_tri (#19302)	2026-02-03 23:43:14 +02:00
Georgi Gerganov	c55bce4159	metal : minor cleanup (#19251)	2026-02-03 13:43:29 +02:00
Georgi Gerganov	6fdddb4987	metal : support virtual devices (#18919) * metal : support virtual devices * cont : manage buffer type context memory * metal : add events * cont : implement cpy_tensor_async	2026-02-02 14:29:44 +02:00
Christian Kastner	7a4ca3cbd9	docs : Minor cleanups (#19252) * Update old URLs to github.com/ggml-org/ * Bump copyrights	2026-02-02 08:38:55 +02:00
ccbinn	0440bfd160	metal : fix recommendedMaxWorkingSetSize availability on legacy iOS/macOS (#19088) Co-authored-by: chenbin11 <chenbin11@kuaishou.com>	2026-01-25 20:07:19 +02:00
Georgi Gerganov	271191906c	metal : enable FA for MLA heads (#18950)	2026-01-20 12:21:28 +02:00
Georgi Gerganov	365a3e8c31	ggml : add ggml_build_forward_select (#18550) * ggml : add ggml_build_forward_select * cuda : adapt CUDA graph compat to new feature * vulkan : update logic to handle command buffer closing * ggml : check compute for fusion * ggml : add comment	2026-01-19 20:03:19 +02:00
Thore Koritzius	388ce82241	ggml : extend ggml_pool_1d + metal (#16429) * chore: resolve conflicts * feat: ggml metal impl * fix: ggml_metal_kargs_pool_1d struct * fix: require contiguous input * chore: test pool_1d * chore: limit pool1d test cases to p0=0 and s0=k0 to conform with asserts * chore: add p0 and s0 to testing * fix: allow padding for cpu and metal * Update ggml/src/ggml-metal/ggml-metal.metal * fix: correct single-threaded loop * ggml : cleanup * tests : add ne[1] != 1 tests * fix: ne[1] handling in np * cont : fixes --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2026-01-16 16:59:56 +02:00
Perry Naseck	7d587e5544	ggml-metal: do not copy headers for embedded, use current binary dir for embedded (#18705)	2026-01-14 09:22:25 +02:00
도로로도로또	945bf10627	metal : add MoE kernel specialization for ne20=5 (#18667) Add template specialization for kernel_mul_mm_id_map0 with ne20=5 to support models using 5 active experts (e.g., VAETKI).	2026-01-08 12:37:45 +02:00
Doctor Shotgun	9a5724dee2	ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (#18535) * ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH * makes the min_batch_size for triggering op offload configurable via env var, defaulting to the prior hardcoded value of 32 * ggml: read GGML_OP_OFFLOAD_MIN_BATCH once and store to dev ctx * cann: forward declaration of device context struct * cann: move offload op check after device context declaration * cuda: fix whitespace Co-authored-by: Aman Gupta <amangupta052@gmail.com> --------- Co-authored-by: Aman Gupta <amangupta052@gmail.com>	2026-01-08 11:03:21 +02:00
Georgi Gerganov	f38de16341	metal : adjust extra size for FA buffer to avoid reallocations (#18545)	2026-01-02 19:02:18 +02:00
gatbontonpc	9a6369bb60	metal : add count_equal op (#18314) * add count equal for metal * remove trailing whitespace * updated doc ops table * changed shmem to i32 * added multi tg and templating * removed BLAS support from Metal docs * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add memset to set dst to 0 * metal : cleanup --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-31 10:39:48 +02:00
Georgi Gerganov	01ade96e71	metal : remove BF16 x F16 kernels (#18456)	2025-12-31 09:53:48 +02:00
Jeremy Demeule	165caaf5fb	metal: use shared buffers on eGPU (#17866) * metal: use shared buffers on eGPU With #15906, I noticed on important regression when using metal backend on eGPU. This commit restore the previous behavior and add an option to force its activation. * metal: use shared buffers on eGPU * metal: use shared buffers on eGPU	2025-12-15 16:14:49 +02:00
Gabe Goodhart	086a63e3a5	metal: SSM kernel improvements (#17876 ) * feat: Add a batched version of ssm_conv This was done using Claude Code. It found a number of optimizations around how the threads were organized, resulting in a huge performance boost! Branch: Mamba2SSD Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Optimized SSM_SCAN kernel for metal This used Claude Code and resulted in a modest performance improvement while maintaining correctness. Branch: Mamba2SSD Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * test: Add test-backend-ops perf tests for SSM_CONV Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * test: Real representitive tests for SSM_CONV Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * refactor: Use function constant for ssm_conv batch size Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * test: backend op tests for ssm_scan from granite4 1b-h Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * style: remove commented out templates Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: float4 version of ssm_conv_batched Branch: SSMKernelImprovements Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Add missing ggml_metal_cv_free Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-09 21:30:02 +02:00
Georgi Gerganov	6b82eb7883	metal : print node names for debugging (#17882)	2025-12-09 15:25:49 +02:00
Phylliida Dev	09c7c50e64	ggml : add circular tiling support to pad, for Vulkan, CUDA, and CPU (used for making seamless textures) (#16985 ) * Feat: Added vulkan circular tiling support * Feat: Added cpu circular * Feat: Added cuda kernels * Added tests * Added tests * Removed non-pad operations * Removed unneded changes * removed backend non pad tests * Update test-backend-ops.cpp * Fixed comment on pad test * removed trailing whitespace * Removed unneded test in test-backend-ops * Removed removed test from calls * Update ggml/src/ggml-vulkan/vulkan-shaders/pad.comp Co-authored-by: Ruben Ortlam <picard12@live.de> * Fixed alignment * Formatting Co-authored-by: Aman Gupta <amangupta052@gmail.com> * Format pad * Format * Clang format * format * format * don't change so much stuff * clang format and update to bool * fix duplicates * don't need to fix the padding * make circular bool * duplicate again * rename vulkan to wrap around * Don't need indent * moved to const expr * removed unneded extra line break * More readable method calls * Minor wording changes * Added final newline * Update ggml/include/ggml.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/include/ggml.h Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Added circular pad ext tests * Gate non circular pad devices * Cleaned gating of non-circular pad devices --------- Co-authored-by: Phylliida <phylliidadev@gmail.com> Co-authored-by: Ruben Ortlam <picard12@live.de> Co-authored-by: Aman Gupta <amangupta052@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-06 15:07:02 +01:00
Georgi Gerganov	8ce774a102	metal : fix build (#17799) * metal : fix build * tests : fix context destruction	2025-12-06 09:33:59 +02:00
Georgi Gerganov	c41bde6fbd	metal : add residency sets keep-alive heartbeat (#17766) * examples : add idle * metal : attach residency sets to queue * idle : add link * idle : adjust intervals * metal : add residency sets keep-alive heartbeat * cont : adjust default keep-alive time	2025-12-05 19:38:54 +02:00
Gabe Goodhart	bde188d60f	metal: TRI, FILL, EXPM1, SOFTPLUS (#16623 ) * feat(wip): Port initial TRI impl from pervious work The kernel does not work and is not optimized, but the code compiles and runs, so this will be the starting point now that the core op has been merged. Branch: ggml-cumsum-tri Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Remove argument for constant val override This was added in the original draft, but later removed. With this, the kernel now passes tests. Branch: ggml-cumsum-tri Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Move the ttype conditional to templating to avoid conditional in kernel Branch: ggml-cumsum-tri Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Type fixes Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * feat: Add softplus for metal Branch: ggml-cumsum-tri Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Add EXPM1 for metal Branch: ggml-cumsum-tri Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * feat: Add FILL for metal Branch: ggml-cumsum-tri Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * refactor: Branchless version of tri using _ggml_vec_tri_cmp as a mask Branch: ggml-cumsum-tri Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * fix: Remove unused arguments Branch: ggml-cumsum-tri Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> * refactor: Use select instead of branch for softplus non-vec Branch: ggml-cumsum-tri Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> --------- Signed-off-by: Gabe Goodhart <ghart@us.ibm.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-12-04 19:12:19 +02:00
Georgi Gerganov	0d1324856f	metal : use params per pipeline instance (#17739)	2025-12-04 10:34:11 +02:00
Georgi Gerganov	3d94e967a1	metal : fix data race in pipeline library (#17731)	2025-12-03 14:03:40 +02:00
Georgi Gerganov	649495c9d9	metal : add FA head size 48 (#17619)	2025-12-01 12:49:53 +02:00
Tarek Dakhran	2ba719519d	model: LFM2-VL fixes (#17577) * Adjust to pytorch * Add antialiasing upscale * Increase number of patches to 1024 * Handle default marker insertion for LFM2 * Switch to flag * Reformat * Cuda implementation of antialias kernel * Change placement in ops.cpp * consistent float literals * Pad only for LFM2 * Address PR feedback * Rollback default marker placement changes * Fallback to CPU implementation for antialias implementation of upscale	2025-11-30 21:57:31 +01:00
Georgi Gerganov	583cb83416	ggml : add ggml_top_k (#17365) * ggml : add ggml_top_k * cont : add ggml_argsort_top_k * metal : add top_k support * ggml : cleanup * tests : add virtual err() function for test_case * ggml : add comments	2025-11-25 15:31:43 +02:00
YangLe	1d321e592b	metal : fix compile on macos 11 (whisper/3533)	2025-11-20 14:10:44 +02:00
Georgi Gerganov	7aaeedc098	metal : support I32 -> I32 copy (#17317)	2025-11-17 11:52:00 +02:00
Georgi Gerganov	3347e6d904	metal : faster argsort (#17315) * metal : faster argsort * cont : keep data in registers	2025-11-17 11:51:48 +02:00
Georgi Gerganov	1a139644a8	metal : add cumsum (#17305)	2025-11-17 11:51:13 +02:00
Georgi Gerganov	416e7c7f47	metal : remove obosolete asserts (#17295)	2025-11-16 09:50:26 +02:00
Georgi Gerganov	45c6ef7307	metal : support argsort for ne00 > 1024 (#17247) * metal : refactor argsort * cont : sort chunks * cont : merge sorted buckets * cont : cleanup	2025-11-14 09:36:06 +02:00
Georgi Gerganov	2606b0adab	metal : make the FA extra sizes consistent (#17143)	2025-11-14 09:13:34 +02:00
bagheera	0cfb19166b	metal: accelerated conv2d (#17175) * metal: accelerated conv2d * cont : cleanup --------- Co-authored-by: bghira <bghira@users.github.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-11-13 13:32:44 +02:00
Georgi Gerganov	13730c183b	metal : cap threadgroups size of set_rows (#17146)	2025-11-10 21:33:35 +02:00
Georgi Gerganov	c27efd2bd1	metal : enable tensor API for A19 (#17087)	2025-11-10 15:38:42 +02:00
Georgi Gerganov	0750a59903	metal : retain src and dst buffers during async ops (#17101)	2025-11-09 08:28:51 +02:00
Georgi Gerganov	5b180c3d60	metal : initial Metal4 tensor API support (#16634) * metal : rework mat-mat multiplication * metal : initial Metal4 support * cont * metal : detect tensor support * cont : better ifdefs * metal : support tensors in mul_mm_id * metal : add env for disabling tensor API * tests : restore * metal : remove unused constants * metal : fix check for bfloat tensor support * cont : handle API incompatibilities * cont : handle even more incompatibilities * metal : use tensor API only on M5 and later	2025-11-06 14:45:10 +02:00
Georgi Gerganov	2f966b8ed8	clip : use FA (#16837) * clip : use FA * cont : add warning about unsupported ops * implement "auto" mode for clip flash attn * clip : print more detailed op support info during warmup * cont : remove obsolete comment [no ci] * improve debugging message * trailing space * metal : remove stray return --------- Co-authored-by: Xuan Son Nguyen <son@huggingface.co>	2025-11-02 21:21:48 +01:00
Ruben Ortlam	d2a2673dd1	vulkan: fix shmem overrun in mmq id shader (#16873) * vulkan: fix shmem overrun in mmq id shader * metal : fix mul_mm_id --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2025-10-31 08:14:49 +01:00

174 Commits