* redo: add convert nodes
This reverts commit 8448acd5ebf8fe86ab9d25313b64a15c811ef96e.
* align clang format with cann
* rename binary_op -> general_op
casue there're some op that will only tak 1 param
* Revert "rename binary_op -> general_op"
This reverts commit 5be63b1a0dc4614457785367dade62158fe46214.
* wip
* add GGML_OP_PERMUTE
* add GGML_OP_VIEW and GGML_OP_GET_ROWS
* wip
* Revert "wip"
This reverts commit 772462ca6cfa01ea31bde725c2da60076ad9385f.
* llama : fix buffer checks for mamba and rwk
* llama : fix missing worst case flag during reserve
* cuda : fix supports_op for norm
* disable sched SET_CAUSE
* ggml : fix gguf string leak when reading kv pairs fails
* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type
* ggml : avoid crashing on failed memory allocations when loading a gguf file
* ggml: Add POOL2D OP for GPU ACC to the Vulkan.
- The MobileVLM model now supports inference acceleration through GPU by utilizing the Vulkan backend.
- A GGML_OP_POOL_2D shader has been added. (Pooling)
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* [fix] Correct the incorrect order of the parameters.
fix casting to int.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
---------
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* ggml_qnn_op_config now manager the construction of ggml_qnn_tensor
* wip
* add interface ggml_qnn_op_config
* add ggml_qnn_list_op_config
* add create_tensor and move tensor bind to execute
* wip
* rename: ggml_qnn_list_op_config -> ggml_qnn_matmul_op_config
* add tensortype to allow native tensor
* remove ggml_tensor param at ggml_qnn_tensor::create_tensor
* postpone the tensor id allocation to add_node
* add ggml_qnn_op_config_base
* trival change to reduct the param of function
* split bind_tensors into bind_input_tensors and bind_output_tensors
* implement ggml_qnn_single_op_config::create_tensors
next will set the prameter of transpose
* tensor: add bind buffer
* add parameter tensor type
* implement add_tensor_param
* set qnn_instance only at constructor
* set transpose tensor param
* move create_op_constructor into op-config module
* create QNN_OP_MAT_MUL from ggml_qnn_matmul_op_config
* try fix crash
* fix compiling error at older ndk (r23c)
* fix crash
* fix parameter tensor name
* update tensor dimension assignment and add TODO
* fix mat_mul graph creating
* fix MUL_MAT_256x16x10x1_256x1x10x1_16x1x10x1
* append type to graph cache key
* wip
* fix supported op
* update comment
* disable op other than add and mat_mul
* add convert op to adapt multi input/output format
* disable f16 for cpu backend according to official doc
https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/cpu_backend.html#supported-operations
* add supported data types flags in each backend
* remove unused functions
* append output type to graph key
* fix gpu backend by disable the different data type op
* fix cpu backend support ops
* fix duplicated tensor name
* append op name
* suppress warning
* remove unused code
* metal : support permuted matrix multiplicaions
ggml-ci
* cont : use nb01 directly for row steps
ggml-ci
* cont : add comments [no ci]
* metal : minor refactor
* metal : minor
This commit removes the setting of the `used` field of the contexts in
the global state (g_state) in `ggml_init`.
The motivation for this change is that I believe that this additional
initialization might not be required after the changes in Commit
45fc4fed0b9fb5b1af4a8525cbebb95e11208732 ("sync : latest changes from
whisper.cpp"), which changed the initialization of the contexts field
from `{ 0 }` to `{ { 0 } }`:
```console
g_state = (struct ggml_state) {
- /*.contexts =*/ { 0 },
+ /*.contexts =*/ { { 0 } },
};
```
My understanding is that the `{0}` initialization might not have
zero-initialized all the nested fields in every array element because of
compiler differences, and might have been the reason for having the
explicit setting of the `used` fields to false.
* [CANN] Adapt to dynamically loadable backends mechanism
* Fix the Bug: inference running result is garbled in debug running model for LM models who's type is Q4_0 class
* Handle the review comments of this pull request
add intel amx isa detection
add vnni kernel for gemv cases
add vnni and amx kernel support for block_q8_0
code cleanup
fix packing B issue
enable openmp
fine tune amx kernel
switch to aten parallel pattern
add error message for nested parallelism
code cleanup
add f16 support in ggml-amx
add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS
update CMakeList
update README
fix some compilation warning
fix compiler warning when amx is not enabled
minor change
ggml-ci
move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp
ggml-ci
update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16
ggml-ci
add amx as an ggml-backend
update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h
minor change
update CMakeLists.txt
minor change
apply weight prepacking in set_tensor method in ggml-backend
fix compile error
ggml-ci
minor change
ggml-ci
update CMakeLists.txt
ggml-ci
add march dependency
minor change
ggml-ci
change ggml_backend_buffer_is_host to return false for amx backend
ggml-ci
fix supports_op
use device reg for AMX backend
ggml-ci
minor change
ggml-ci
minor change
fix rebase
set .buffer_from_host_ptr to be false for AMX backend
* fix: use `vm_allocate` to allocate CPU backend buffer on macOS
* fix: switch to `posix_memalign` to keep existing `free()` usages work
* feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS
* style: formatting
* fix: move const outside of `#ifndef`
* style: formatting
* fix: unused var
* fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h`
* fix: unused var
* fix: page align to `GGUF_DEFAULT_ALIGNMENT`
* fix: page align to `TENSOR_ALIGNMENT`
* fix: convert `TENSOR_ALIGNMENT` to a macro
* fix: increase page size to `32` on iOS
* fix: iOS page size
* fix: `hbw_posix_memalign` alignment
This commit removes the buffer_id field from the leaf_alloc struct.
The motivation for is that this field is only written to and never
read/used as far as I can tell. Each tensor_alloc has a buffer_id field
and this is what caused me to look into this more closely, to
understand what the buffer_id in leaf_alloc was used for.