Commit Graph

17 Commits

Author SHA1 Message Date
Sam/Samuel f4ce81c45e
metal: optimise `GGML_OP_SUM` (#16559)
* optimise GGML_OP_SUM

* add non-contiguous tests by permuting the input

* change tests to require full contiguity of OP_SUM

* cuda : add check GGML_OP_SUM

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-10-15 17:05:56 +03:00
Georgi Gerganov fa882fd2b1
metal : avoid using Metal's gpuAddress property (#16576)
* metal : avoid using Metal's gpuAddress property

* metal : fix rope kernels buffer check
2025-10-14 20:33:05 +03:00
Sam/Samuel 3f750f8d76
metal: add support for opt_step_sgd (#16539)
* metal: add support for opt_step_sgd

* add newline to pass EditorConfig check
2025-10-13 11:25:02 +03:00
Sam/Samuel a31cf36ad9
metal : add opt_step_adamw and op_sum (#16529)
* scaffold to support opt step adamw on metal (not written so far)

* add opt-step-adamw kernel for metal

* pass op->src[4] as a separate buffer to the pipeline

* add bounds check to opt-step-adamw kernel

* complete scaffold for GGML_OP_SUM

* naive GGML_OP_SUM kernel

* remove unwanted comment

* change OP_SUM capability gate

* Add has_simdgroup_reduction to both ops to pass CI
2025-10-12 21:43:14 +03:00
Georgi Gerganov a3cb04744f
metal : fix mul-mm condition + fix mul-mv permuted kernels (#16494) 2025-10-11 16:54:10 +03:00
Georgi Gerganov b2c08c9ec4
metal : mark FA blocks (#16372)
* metal : better unroll in the FA kernels

* metal : index FA blocks

* tests : restore [no ci]

* metal : prevent division by zero in FA kernels

* metal : fix -INF detection logic
2025-10-08 10:57:53 +03:00
Georgi Gerganov 0a319bb75e
metal : add support for non-padded FA KV (#16148)
* metal : pad K, V and Mask when needed

* cont : simplify

* cuda : add TODO about KV padding requirement

* metal : add comments

* metal : remove mask padding requirement
2025-10-07 08:23:30 +03:00
Georgi Gerganov 8ae32dc9ec
metal : various optimizations + refactoring (#16446)
* metal : ssm_scan minor opts

* metal : get_rows optimize

* metal : cpy optimize

* metal : ssm_conv opt

* metal : ssm_scan simplify

* metal : ssm_Scan opt
2025-10-07 08:21:40 +03:00
Georgi Gerganov 35fb82497e
metal : dynamic simdgroups for MV kernels (#16340)
* metal : dynamic simdgroups for MV kernels

* cont : minor
2025-09-30 11:03:23 +03:00
Georgi Gerganov 6a2c6145a0
metal : extend mat-mat multiplication support (#16225)
* metal : support mul_mm with src1->type == GGML_TYPE_F16

* metal : support mul_mm_id with src1->type == GGML_TYPE_F16

[no ci]

* metal : mul_mm support ne00 % 32 != 0

* metal : support mul_mm_id with ne00 % 32 != 0

* cont : remove unnecessary unrolls

* cont : simplify data loading

* metal : optimize mul_mm when output bounds checks are not needed
2025-09-28 09:34:44 +03:00
Georgi Gerganov 3b53634fe3
metal : fuse non-sequential nodes (#16102)
* metal : fuse non-sequential nodes

* cont : add comment

* cont : simplify bounds checks
2025-09-28 09:34:05 +03:00
Georgi Gerganov dfcd53f7ec
metal : fuse NORM + MUL + ADD, support non-multiples of 4 (#16220)
* metal : fuse NORM + MUL + ADD

* metal : support norms of non-multiple of 4

* cont : fix comment [no ci]
2025-09-25 11:30:16 +03:00
Georgi Gerganov 02a6a82ae7
metal : restore im2col perf (#16219) 2025-09-25 11:29:08 +03:00
Sigbjørn Skjæret 3ecb2f671a
ggml : implement set_rows with i32 index (#16159)
* implement set_rows with i32 index

* template fix

* test quantized path

warnings--

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* forgotten name change

* deduplicate cuda/sycl and test-fix

* indent++

* vulkan: support set_rows with i32 index type (#16162)

* disable i32 index for webgpu for now

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2025-09-22 19:13:00 +02:00
Georgi Gerganov 703f9e32c4
metal : use function constants for mul_mv_ext kernels (#16074)
* metal : use function constants for mul_mv_ext kernels

ggml-ci

* metal : remove NW template argument

ggml-ci

* metal : adjust constants

ggml-ci
2025-09-18 16:28:41 +03:00
Georgi Gerganov b213fce89b
metal : improve F32, F16 and BF16 mat-vec multiplication (#16057)
* metal : improve F32, F16 and BF16 mat-vec multiplication

ggml-ci

* metal : make the NSG a function constant in mul_mv kernels

ggml-ci
2025-09-18 12:33:45 +03:00
Georgi Gerganov 0320ac5264
metal : refactor + optimize v2 (#15995)
* metal : improve naming

* metal : refactor device

ggml-ci

* cont : props

ggml-ci

* metal : apply ggml_mem_ranges_t

ggml-ci

* metal : remove GGML_METAL_USE_BF16

ggml-ci

* metal : refactor device buffer

ggml-ci

* cont : fix naming

* metal : sync before destroying the backend

ggml-ci

* metal : refactor context

ggml-ci

* metal : migrate ggml-metal.m to ggml-metal.cpp

ggml-ci

* metal : adjust ops API

ggml-ci

* metal : use C++ to store piplienes

ggml-ci

* metal : migrate ops to separate functions

ggml-ci

* metal : add ggml_metal_library_t

ggml-ci

* metal : improve naming

ggml-ci

* metal : cleanp

ggml-ci

* metal : add support for GGML_OP_LOG

ggml-ci

* metal : fix error handling

ggml-ci
2025-09-17 20:38:12 +03:00