* move op key generate function to kOpCaps
* fix op desc print
* try fix rms_norm
* Revert "try fix rms_norm"
This reverts commit 33b296098012909cb482fc29b52b28098dc971cd.
* add quantization type support by converting them to float
* enable quantization tensor for mulmat in gpu/npu
* fix asan error
* add log and assert
* insert output convert operator after mulmat
* add log
* fix some errors when running
* disable permute again
* add log
* add error function
* Revert "add error function"
This reverts commit f92ff47798ac8053fb776c55efbb1a98469c7af1.
* add log
* more log
* disable convert op in graph
* wip
* add f16 config for graph
* set f16 precision for f16 graph
* fix override data type
* add comment
* add config flag to enable quantize type
* add log
* more quantized type for cpu and gpu backend
* enable all quant types for cpu and gpu backend
* rename
* wip
* add log
* remove unused functions
* skip permute
* remove get_qnn_op_input_param_count
* fallback to generic_get_op_desc if no op_desc
* revert 'skip permute'
* Revert "revert 'skip permute'"
This reverts commit 5761e31fd23c69c4cabf6fd9fac1a0d3e5a74968.
* wip
* add log
* print qnn tensor type
* add log
* limit the max size of tensor
* add log
* fix tensor size limiter
* small improvement to the tensor info printer
* disable sqrt and div to pass test-backend-ops for 8 gen 2
* remove debug log in release build
* add log
* skip permute in src
* wip
* disable reshape
* skip mul at decoder start
* wip
* add log
* add qnn_scoped_timer (see the sketch after this list)
* add perf tracker in graph
* add cmake options GGML_QNN_ENABLE_PERFORMANCE_TRACKING
* fix flag name
* use milliseconds
* wip
* fix comment string
* add file for profiler
* change qnn-cpu to GGML_BACKEND_DEVICE_TYPE_ACCEL, so that we can run tests on cpu
* wip
* profiler: refactoring
* wip
* add implementation of print_profile_events
* set-up profiler for graph
* set profiler to graph execute
* pretty print events
* unified log print prefix
* print event count
* enable optrace
* print duration at event end
* wip
* add more detailed SoC information
* wip
* move device caps array into qnn-lib.cpp
* remove lib_name in device_context
* move get_graph_key_from_cgraph to graph.cpp
* add override type for tensor key
* use override_type instead of original data type for graph key
* append op type to tensor name to fix error in qwen
* remove todo
* wip
* Fix DOS index bug
* Remove new APIs
* remove extra line
* Remove from API
* Add extra newline
* Update examples/server/server.cpp
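The qnn_scoped_timer / performance-tracking commits above amount to a small RAII timer gated behind the
GGML_QNN_ENABLE_PERFORMANCE_TRACKING cmake option. A minimal sketch of the idea, with illustrative names
and output format rather than the actual implementation:
```cpp
#include <chrono>
#include <cstdio>
#include <string>

// Minimal sketch only, assuming the cmake option defines a macro with the same
// name; this is not the actual qnn_scoped_timer implementation.
#ifdef GGML_QNN_ENABLE_PERFORMANCE_TRACKING
class qnn_scoped_timer {
  public:
    explicit qnn_scoped_timer(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {}

    ~qnn_scoped_timer() {
        const auto   end = std::chrono::steady_clock::now();
        const double ms  = std::chrono::duration<double, std::milli>(end - start_).count();
        // durations are reported in milliseconds, per the "use milliseconds" commit
        std::fprintf(stderr, "[qnn][profiler] %s: %.3f ms\n", name_.c_str(), ms);
    }

  private:
    std::string                           name_;
    std::chrono::steady_clock::time_point start_;
};
#    define QNN_SCOPED_TIMER(name) qnn_scoped_timer qnn_timer_local(name)
#else
#    define QNN_SCOPED_TIMER(name)
#endif

// usage: time one graph execution
// void graph_execute() { QNN_SCOPED_TIMER("graph_execute"); /* ... */ }
```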
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to
a selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we
need to avoid launching them with parameters for warp64.
Refactor mmvq to unify the calculation of nwarps and rows per block between host and device code.
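A minimal sketch of the unification idea, assuming a shared constexpr helper; names and numbers are
illustrative, not the actual mmvq code:
```cpp
// One constexpr helper computes rows per block and nwarps, and both the host
// launch code and the device kernel call it, so the two sides can never
// disagree about the launch configuration. The values here are made up.
constexpr int calc_rows_per_block(int ncols_dst) {
    return ncols_dst == 1 ? 1 : 2;
}

constexpr int calc_nwarps(int ncols_dst, int warp_size) {
    // aim for roughly 256 threads per block; warp_size must be 32 here because
    // the fattn-vec kernels do not support 64-wide warps yet
    return 256 / (calc_rows_per_block(ncols_dst) * warp_size);
}

// host:   dim3 block_dims(warp_size, calc_nwarps(ncols_dst, warp_size), 1);
// device: __launch_bounds__(calc_nwarps(ncols_dst, warp_size) * warp_size, 1)
```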
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
This patch nudges llama.cpp a bit so it can run on PoCL, which
doesn't support OpenCL C 2.0. The issue is solved by querying the
device for the supported OpenCL C version and using the highest
one available.
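A rough sketch of such a query using the standard clGetDeviceInfo / CL_DEVICE_OPENCL_C_VERSION path
(not the exact code in the OpenCL backend):
```cpp
#include <CL/cl.h>
#include <string>
#include <vector>

// Sketch only: ask the device which OpenCL C version it supports and pick the
// matching -cl-std build option instead of hard-coding CL2.0, so drivers like
// PoCL that lack OpenCL C 2.0 still work.
static std::string pick_cl_std(cl_device_id device) {
    size_t len = 0;
    clGetDeviceInfo(device, CL_DEVICE_OPENCL_C_VERSION, 0, nullptr, &len);
    std::vector<char> buf(len + 1, 0);
    clGetDeviceInfo(device, CL_DEVICE_OPENCL_C_VERSION, len, buf.data(), nullptr);
    const std::string ver(buf.data()); // e.g. "OpenCL C 1.2 pocl"

    if (ver.find("OpenCL C 3.0") != std::string::npos) return "-cl-std=CL3.0";
    if (ver.find("OpenCL C 2.0") != std::string::npos) return "-cl-std=CL2.0";
    return "-cl-std=CL1.2"; // assumed lowest version the kernels still build with
}

// usage: clBuildProgram(program, 1, &device, pick_cl_std(device).c_str(), nullptr, nullptr);
```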
As discussed in PR 'llama-tts : add -o option' (#12042):
* common_params : 'out_file' string is the only output file name parameter left in common_params. It's intended to be used in all example programs implementing an '-o' option.
* cvector-generator, export-lora, imatrix : default output filenames moved from 'common_params' to the 'main()' of each example program (a sketch of the pattern follows below).
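One possible shape of that pattern, purely illustrative (the struct, helper, and default filename
below are placeholders, not the actual common_params code):
```cpp
#include <string>

// Illustrative only: a single out_file string is filled in by the shared "-o"
// argument parser, and each example program picks its own default in main().
struct common_params_sketch {
    std::string out_file;
};

static std::string resolve_out_file(const common_params_sketch & params,
                                    const char * example_default) {
    // the default output filename now lives in the example, not in common_params
    return params.out_file.empty() ? example_default : params.out_file;
}

// e.g. in an imatrix-like main():
//   const std::string fname = resolve_out_file(params, "imatrix.dat");
```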
This commit updates the compilation of default.metallib to skip the
intermediate .air (Apple Intermediate Representation) file.
The motivation for this change is to simplify the custom command a
little and avoid generating and then removing the .air file.
* ggml_compute_forward_concat() for arbitrary tensor type
* Check that the tensors' types match
* ggml-cpu.c: check type of source tensors
* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()
* ggml.c: check concatenated tensor type
* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c, as it was moved to ggml.c (see the sketch after this list).
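A small usage sketch of the resulting behaviour; the exact assertion wording in ggml.c may differ:
```cpp
#include "ggml.h"

// After this change ggml_concat() accepts any tensor type, but both inputs
// must share that type; mixing types trips the check that now lives in ggml.c.
static struct ggml_tensor * concat_i32_example(struct ggml_context * ctx) {
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_I32, 4, 2);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_I32, 4, 3);
    // both inputs are GGML_TYPE_I32, concatenated along dim 1 -> shape [4, 5];
    // passing one F32 and one I32 tensor here would assert instead
    return ggml_concat(ctx, a, b, 1);
}
```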
* metal : refactor im2col parameters into a struct (a sketch of the pattern is shown after this list)
* metal : change im2col offset types from int32_t to uint64_t to support larger memory offsets
* metal : refactor sum_rows parameters into a struct
* metal : refactor soft_max parameters into a struct
* metal : refactor diag_mask_inf parameters into a struct
* metal : refactor ssm_conv parameters into a struct
* metal : refactor ssm_scan parameters into a struct
* metal : refactor get_rows parameters into a struct
* metal : refactor group_norm parameters into a struct
* metal : refactor conv_transpose_1d parameters into a struct
* metal : refactor upscale parameters into a struct
* metal : refactor pad parameters into a struct
* metal : refactor pad_reflect_1d parameters into a struct
* metal : refactor arange parameters into a struct
* metal : refactor timestep_embedding parameters into a struct
* metal : refactor argsort parameters into a struct
* metal : refactor leaky_relu parameters into a struct
* metal : refactor pool_2d parameters into a struct
* metal : fix trailing whitespace
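All of the refactors above follow the same shape. A sketch of the pattern for im2col, with
illustrative struct and field names rather than the exact ones in the Metal backend:
```cpp
#include <cstdint>

// Illustrative only: instead of passing a long list of scalar kernel
// arguments, each op gets one plain struct that is shared between the host
// code and the .metal kernels.
typedef struct {
    // im2col offsets widened from int32_t to uint64_t so that views far into
    // a large buffer do not overflow the offset
    uint64_t ofs0;
    uint64_t ofs1;
    int32_t  IW;
    int32_t  IH;
    int32_t  KW;
    int32_t  KH;
    int32_t  s0;
    int32_t  s1;
    int32_t  p0;
    int32_t  p1;
    int32_t  d0;
    int32_t  d1;
} kargs_im2col_example;

// host side:   [encoder setBytes:&args length:sizeof(args) atIndex:0];
// kernel side: kernel void kernel_im2col(constant kargs_im2col_example & args, ...)
```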
---------
Co-authored-by: alexju <alexju@tencent.com>
This commit updates the custom command to build the default.metallib
file to use the correct path to ../ggml-common.h by using the variable
METALLIB_COMMON.
The motivation for this change is that currently, when building with
GGML_METAL_EMBED_LIBRARY=OFF, the following error is generated:
```console
[ 11%] Linking CXX shared library ../../bin/libggml.dylib
[ 11%] Built target ggml
make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'. Stop.
make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2
```
With the above change the build could progress, but there was a
follow-on error about not being able to find the ggml-common.h file in
ggml-metal.metal, where it was included as a relative path:
```console
[ 11%] Compiling Metal kernels
/Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'?
^~~~~~~~~~~~~~~~~~
"ggml-common.h"
1 error generated.
```
Removing the relative path then allowed the build to complete
successfully.
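Concretely, the include in ggml-metal.metal was changed along these lines, matching the compiler's
fix-it hint above:
```cpp
// before (fails once the .metal file is compiled from the build directory):
// #include "../ggml-common.h"

// after:
#include "ggml-common.h"
```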