* ggml_qnn_op_config now manages the construction of ggml_qnn_tensor
* wip
* add interface ggml_qnn_op_config
* add ggml_qnn_list_op_config
* add create_tensor and move tensor bind to execute
* wip
* rename: ggml_qnn_list_op_config -> ggml_qnn_matmul_op_config
* add tensortype to allow native tensor
* remove ggml_tensor param from ggml_qnn_tensor::create_tensor
* postpone the tensor id allocation to add_node
* add ggml_qnn_op_config_base
* trivial change to reduce the function's parameters
* split bind_tensors into bind_input_tensors and bind_output_tensors
* implement ggml_qnn_single_op_config::create_tensors
next will set the parameter of transpose
* tensor: add bind buffer
* add parameter tensor type
* implement add_tensor_param
* set qnn_instance only at constructor
* set transpose tensor param
* move create_op_constructor into op-config module
* create QNN_OP_MAT_MUL from ggml_qnn_matmul_op_config
* try to fix crash
* fix compile error on older NDK (r23c)
* fix crash
* fix parameter tensor name
* update tensor dimension assignment and add TODO
* fix mat_mul graph creation
* fix MUL_MAT_256x16x10x1_256x1x10x1_16x1x10x1
* append type to graph cache key
* wip
* fix supported op
* update comment
* disable ops other than add and mat_mul
* add convert op to adapt multi input/output format
* disable f16 for cpu backend according to the official doc
https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/cpu_backend.html#supported-operations
* add supported data types flags in each backend
* remove unused functions
* append output type to graph key
* fix gpu backend by disabling ops with different data types
* fix cpu backend supported ops
* fix duplicated tensor name
* append op name
* suppress warning
* remove unused code
* ggml : do not use BLAS with types without to_float
* ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies
* ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits
it's not really internal if everybody uses it
* docs : clarify building Android on Termux
* docs : update building Android on Termux
* docs : add cross-compiling for Android
* cmake : link dl explicitly for Android
* ggml : add metal backend registry / device
ggml-ci
* metal : fix names [no ci]
* metal : global registry and device instances
ggml-ci
* cont : alternative initialization of global objects
ggml-ci
* llama : adapt to backend changes
ggml-ci
* fixes
* metal : fix indent
* metal : fix build when MTLGPUFamilyApple3 is not available
ggml-ci
* fix merge
* metal : avoid unnecessary singleton accesses
ggml-ci
* metal : minor fix [no ci]
* metal : g_state -> g_ggml_ctx_dev_main [no ci]
* metal : avoid reference of device context in the backend context
ggml-ci
* metal : minor [no ci]
* metal : fix maxTransferRate check
* metal : remove transfer rate stuff
---------
Co-authored-by: slaren <slarengh@gmail.com>
* Single allocation of encode_async block with non-ARC capture in ggml-metal.m
* Moving Block_release to the deallocation code
* Release encode block when re-setting encoding buffer count if needed
* Update ggml/src/ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* rerank : use [SEP] token instead of [BOS]
ggml-ci
* common : sanity check for non-NULL tokens
ggml-ci
* ci : adjust rank score interval
ggml-ci
* ci : add shebang to run.sh
ggml-ci
* Add scaffolding for ggml logging macros
* Metal backend now uses GGML logging
* CUDA backend now uses GGML logging
* CANN backend now uses GGML logging
* Add enum tag to parameters
* Use C memory allocation funcs
* Fix compile error
* Use GGML_LOG instead of GGML_PRINT
* Rename llama_state to llama_logger_state
* Prevent null format string
* Fix whitespace
* Remove log callbacks from ggml backends
* Remove CUDA log statement
* vulkan : do not use tensor->extra
This patch allows using the Vulkan backend with the RPC backend as
tensor->extra is no longer used.
Ref: #8536
* Adapt GGML_VULKAN_CHECK_RESULTS to extra removal (#2)
---------
Co-authored-by: 0cc4m <picard12@live.de>
* make sure params --split and --merge are not specified at the same time
* update gguf-split param parsing logic
* Update examples/gguf-split/gguf-split.cpp
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>