llama.cpp/ggml
nullname a2df09b6af
[WIP] feat: perf opt (#10)
* reduce log

* wip

* add function to create concat nodes

* opt

* insert concat node before mulmat
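
A minimal sketch of the concat-before-mulmat idea: fuse several inputs into one Concat node and feed that to MatMul, so the op sees a single input instead of N separate ones. All names here (`op_node`, `graph`, `concat_before_mulmat`) are illustrative, not this PR's actual API.

```cpp
#include <memory>
#include <string>
#include <vector>

struct op_node {
    std::string type;               // e.g. "Concat", "MatMul"
    std::vector<op_node *> inputs;  // upstream nodes feeding this op
};

struct graph {
    std::vector<std::unique_ptr<op_node>> nodes;

    op_node *add(std::string type, std::vector<op_node *> inputs) {
        nodes.push_back(std::make_unique<op_node>(op_node{std::move(type), std::move(inputs)}));
        return nodes.back().get();
    }
};

// fuse the sources into one Concat node, then hand that to MatMul
op_node *concat_before_mulmat(graph &g, std::vector<op_node *> srcs, op_node *weights) {
    op_node *cat = g.add("Concat", std::move(srcs));
    return g.add("MatMul", {cat, weights});
}

int main() {
    graph g;
    op_node *a  = g.add("Input", {});
    op_node *b  = g.add("Input", {});
    op_node *w  = g.add("Weights", {});
    op_node *mm = concat_before_mulmat(g, {a, b}, w);
    return mm->inputs.size() == 2 ? 0 : 1;  // MatMul sees {Concat, Weights}
}
```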

* use resize op

* wip

* add bind_buffer and remove ggml prefix in tensor types

* use gather node instead
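
A hedged sketch of the gather alternative: instead of materializing a concatenation, a Gather op selects rows by index at run time. Everything below is illustrative, not the PR's code.

```cpp
#include <cstddef>
#include <vector>

// gather rows of `table` (row_len floats each) picked by `indices`
std::vector<float> gather_rows(const std::vector<float> &table, size_t row_len,
                               const std::vector<size_t> &indices) {
    std::vector<float> out;
    out.reserve(indices.size() * row_len);
    for (size_t idx : indices) {
        const float *row = table.data() + idx * row_len;
        out.insert(out.end(), row, row + row_len);
    }
    return out;
}

int main() {
    std::vector<float> table  = {0, 0, 1, 1, 2, 2};       // 3 rows of width 2
    std::vector<float> picked = gather_rows(table, 2, {2, 0});
    return (picked[0] == 2 && picked[2] == 0) ? 0 : 1;
}
```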

* fix tensor type; now succeeds on gpu and cpu, still fails on npu

* add comment

* wip

* add comment

* wip

* in destructor, clear internal buffer before unbind
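
A sketch of the destructor-ordering fix, with hypothetical names: release any internal state that depends on the bound buffer while the binding is still valid, and only then unbind.

```cpp
#include <vector>

class bound_tensor {
    void *_buffer = nullptr;       // backend buffer this tensor is bound to
    std::vector<float> _internal;  // scratch state tied to the binding

public:
    void bind_buffer(void *buf) { _buffer = buf; }
    void unbind() { _buffer = nullptr; }

    ~bound_tensor() {
        _internal.clear();  // drop internal state while the binding is valid
        unbind();           // only then detach from the backend buffer
    }
};

int main() {
    bound_tensor t;
    float storage[4] = {};
    t.bind_buffer(storage);
    return 0;  // destructor runs clear-then-unbind in that order
}
```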

* disable gather for npu

* wip

* count swap memory as free memory
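
A sketch of counting swap as free memory, assuming a Linux-style `sysinfo()` query (the Android devices this backend targets expose it); free RAM alone under-reports what the allocator can actually obtain.

```cpp
#include <cstdint>
#include <cstdio>
#include <sys/sysinfo.h>

uint64_t get_free_memory_bytes() {
    struct sysinfo info = {};
    if (sysinfo(&info) != 0) {
        return 0;  // query failed; report nothing rather than guess
    }
    // free RAM + free swap, scaled by the unit size the kernel reports
    return (uint64_t)(info.freeram + info.freeswap) * info.mem_unit;
}

int main() {
    printf("free (RAM+swap): %llu bytes\n",
           (unsigned long long)get_free_memory_bytes());
    return 0;
}
```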

* wip

* fix supported_types

ggml_backend_device_i.supports_op will be invoked before ggml_backend_device_i.init_backend, so the supported-type check must not rely on any state set up during backend initialization
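
A minimal sketch of that constraint; the device kinds and capability table below are hypothetical, and only the supports_op-before-init_backend ordering comes from the note above. The answer must come from static per-device data, never from backend state.

```cpp
#include <array>

enum class device_kind { cpu, gpu, npu };

// static capability table, valid before any backend object exists
constexpr std::array<bool, 3> k_supports_f16 = {
    true,   // cpu
    true,   // gpu
    false,  // npu: hypothetical restriction, for illustration only
};

bool device_supports_f16(device_kind dev) {
    // no backend context required: safe to call before init_backend
    return k_supports_f16[static_cast<int>(dev)];
}

int main() {
    return device_supports_f16(device_kind::npu) ? 1 : 0;
}
```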

* rename create_tensors -> initialize_op_nodes

* move ggml_qnn_op_config to a separate file

* wip

* add create_convert_nodes

* add comment

* enable different in/out types for the npu and cpu backends

* fix npu convert op

* enlarge max buffer size

* add more error codes

* check tensor type before creating convert node
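
A hypothetical sketch of the create_convert_nodes idea: insert a Convert op only when the producer and consumer tensor types actually differ, and pass the producer through untouched otherwise. Names are illustrative, not the PR's API.

```cpp
#include <memory>
#include <string>
#include <vector>

enum class tensor_type { f32, f16 };

struct node {
    std::string op;
    tensor_type type;
    node *src = nullptr;
};

std::vector<std::unique_ptr<node>> g_nodes;  // simple ownership pool

node *make(std::string op, tensor_type t, node *src = nullptr) {
    g_nodes.push_back(std::make_unique<node>(node{std::move(op), t, src}));
    return g_nodes.back().get();
}

// check tensor types first: if they already match, add no extra op
node *maybe_convert(node *src, tensor_type wanted) {
    if (src->type == wanted) {
        return src;
    }
    return make("Convert", wanted, src);
}

int main() {
    node *f16_in = make("Input", tensor_type::f16);
    node *fed    = maybe_convert(f16_in, tensor_type::f32);  // adds Convert
    node *same   = maybe_convert(fed, tensor_type::f32);     // pass-through
    return (fed->op == "Convert" && same == fed) ? 0 : 1;
}
```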

* add log

* add log

* remove transpose0 and use built-in transpose flag
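
A sketch of folding the explicit transpose into the matmul: QNN's MatMul op exposes transpose_in0/transpose_in1 scalar parameters, so a separate Transpose node on an input can become a flag on the MatMul itself. The config struct below is hypothetical.

```cpp
#include <cstdio>

struct matmul_config {
    bool transpose_in0 = false;  // transpose first operand inside the op
    bool transpose_in1 = false;  // transpose second operand inside the op
};

int main() {
    // before: Transpose(src0) -> MatMul(src0_t, src1)   (two graph nodes)
    // after:  MatMul(src0, src1, transpose_in0 = true)  (one graph node)
    matmul_config cfg;
    cfg.transpose_in0 = true;
    printf("transpose folded into matmul: %d\n", cfg.transpose_in0);
    return 0;
}
```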

* rename transpose1 -> transpose_out

* disable convert for npu

* add more logs

2024-11-29 00:03:23 +08:00
cmake           llama : reorganize source code + improve CMake (#8006)  2024-06-26 18:33:02 +03:00
include         Merge branch 'master' into dev-refactoring              2024-11-13 17:10:20 +08:00
src             [WIP] feat: perf opt (#10)                              2024-11-29 00:03:23 +08:00
.gitignore      vulkan : cmake integration (#8119)                      2024-07-13 18:12:39 +02:00
CMakeLists.txt  Merge branch 'master' into dev-refactoring              2024-11-13 17:10:20 +08:00