llama.cpp/ggml
nullname · 2cd429ca75 · 2025-07-23 00:38:09 +08:00
feat: perf opt part5 (#52)
* rename

* Refactor vector operations in vec_op_impl and vec_dot_product_impl for improved clarity and performance

* wip

* Improve the performance and clarity of the vector copy functions in vec_ops.hpp

* wip

* wip

* wip

* Optimize the vector dot product implementations for better performance
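
As a rough illustration of the kind of change this covers: a portable multi-accumulator dot product, where independent accumulators hide FMA latency and give the compiler room to vectorize. The real targets are the HVX kernels in vec_ops.hpp; the function name below is made up.

```cpp
#include <cstddef>

// Hypothetical stand-in for the HVX dot-product kernels touched here.
inline float vec_dot_f32(const float * x, const float * y, size_t n) {
    float acc0 = 0.f, acc1 = 0.f, acc2 = 0.f, acc3 = 0.f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {   // 4-way unrolled main loop
        acc0 += x[i + 0] * y[i + 0];
        acc1 += x[i + 1] * y[i + 1];
        acc2 += x[i + 2] * y[i + 2];
        acc3 += x[i + 3] * y[i + 3];
    }
    for (; i < n; ++i) {           // scalar tail
        acc0 += x[i] * y[i];
    }
    return (acc0 + acc1) + (acc2 + acc3);
}
```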

* Improve vector operations and alignment checks in the flash attention implementation and type traits

# Conflicts:
#	ggml/src/ggml-qnn/npu/device/type_traits.cpp

* remove align

* wip

* Enhance vector dot product implementation for improved performance by adding parallel processing for multiple vector pairs

* Revert "Enhance vector dot product implementation for improved performance by adding parallel processing for multiple vector pairs"

This reverts commit 78cc24ed2285002ca29d6189fa61ba4ce24f8d16.

* Add type checks for tensor data types to the flash attention implementation and improve constexpr usage
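
A minimal sketch of the constexpr type-check pattern named here, assuming a templated dot kernel: unsupported element types fail at compile time via static_assert, and if constexpr picks the right kernel with no runtime branch. All type and function names below are stand-ins, not the repo's actual API.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

using fp16_stub_t = uint16_t;  // stand-in for the device fp16 type

inline float dot_f32(const float * a, const float * b, size_t n) {
    float s = 0.f;
    for (size_t i = 0; i < n; ++i) {
        s += a[i] * b[i];
    }
    return s;
}

inline float dot_f16(const fp16_stub_t *, const fp16_stub_t *, size_t) {
    return 0.f;  // placeholder: a real kernel converts and accumulates
}

template <typename T>
float flash_attn_dot(const T * a, const T * b, size_t n) {
    static_assert(std::is_same_v<T, float> || std::is_same_v<T, fp16_stub_t>,
                  "unsupported tensor data type");
    if constexpr (std::is_same_v<T, float>) {
        return dot_f32(a, b, n);
    } else {
        return dot_f16(a, b, n);
    }
}
```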

* wip

* opt mask calc

* Revert "opt mask calc"

This reverts commit bb1840876692a11511d5ab7828b8a707402e30b9.

* wip

* opt mul mat caching logic to add dst cache

* Revert "opt mul mat caching logic to add dst cache"

This reverts commit ab442fa9f763b3873c929936e4cb739cb1c83850.

* wip

* Refactor matrix multiplication implementation to include vector conversion and performance tracking
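
The "performance tracking" half of this change could look like the scoped timer below; this is a guess at the mechanism, and ggml-qnn's own profiling helpers may differ.

```cpp
#include <chrono>
#include <cstdio>

// Scoped timer sketch: logs elapsed time when the scope exits.
struct scoped_timer {
    const char *                          tag;
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();

    explicit scoped_timer(const char * t) : tag(t) {}
    ~scoped_timer() {
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::fprintf(stderr, "[perf] %s: %lld us\n", tag, (long long) us);
    }
};

void mul_mat_block() {
    scoped_timer t("mul_mat");  // destructor prints elapsed time
    // ... convert src1 rows to src0's element type, then multiply ...
}
```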

* wip

* wip

* wip

* create vec_ops.inl for more aggressive compiler inline
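
The .inl pattern referenced here, shown as a two-file sketch (declarations and definitions would live in separate files): every translation unit that includes the header sees full definitions, so the compiler can inline across call sites without relying on LTO. Function names are illustrative.

```cpp
// vec_ops.hpp (sketch) — declarations, with the .inl pulled in at the bottom.
#pragma once
#include <cstddef>

float vec_dot_f32(const float * x, const float * y, size_t n);

#include "vec_ops.inl"

// vec_ops.inl (sketch) — definitions, marked inline to satisfy the ODR:
inline float vec_dot_f32(const float * x, const float * y, size_t n) {
    float s = 0.f;
    for (size_t i = 0; i < n; ++i) {
        s += x[i] * y[i];
    }
    return s;
}
```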

* wip

* refactor vector dot product implementations for improved readability and performance

* refactor vector conversion functions to use HVX_Vector_Dual for improved clarity and consistency

* wip

* wip

* wip

* implement row size caching logic and enhance type traits for F32 support
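
A loose sketch of a per-type traits table with a derived row size, assuming a ggml-style blck_size/type_size layout; the enum and field names are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>

enum npu_dtype { NPU_F32, NPU_F16 };

struct dtype_traits {
    size_t type_size;  // bytes per element (per block for quantized types)
    size_t blck_size;  // elements per block; 1 for plain float types
};

constexpr dtype_traits kTraits[] = {
    /* NPU_F32 */ { sizeof(float),    1 },
    /* NPU_F16 */ { sizeof(uint16_t), 1 },
};

constexpr size_t row_size(npu_dtype t, size_t ne0) {
    return ne0 / kTraits[t].blck_size * kTraits[t].type_size;
}
```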

* refactor matrix multiplication functions to improve caching logic and simplify tensor alignment handling

* add vector zeroing functions for F32 and F16 types to optimize memory initialization

* Revert "add vector zeroing functions for F32 and F16 types to optimize memory initialization"

This reverts commit e374326dc74d049e6603e393ade418d9ef2b83f3.

* wip

* refactor alignment checks in dot product function to handle null pointers
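
The null-tolerant alignment check described here might look like this sketch, where a null optional pointer (e.g. an absent mask) counts as aligned rather than tripping the check; the helper name is hypothetical.

```cpp
#include <cstdint>

inline bool is_aligned(const void * p, uintptr_t align) {
    if (p == nullptr) {
        return true;  // null means "absent", not "misaligned"
    }
    return (reinterpret_cast<uintptr_t>(p) % align) == 0;
}
```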

* wip

* refactor load_block_generic and related functions for improved alignment handling
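
A sketch of the usual aligned/unaligned split in a generic block loader, assuming the 128-byte HVX vector width: read in place when the source is aligned, otherwise memcpy into a scratch buffer. Names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr size_t kVecBytes = 128;  // HVX vector width in bytes

template <typename Block>
const Block * load_block_generic(const void * src, Block * scratch) {
    if (reinterpret_cast<uintptr_t>(src) % kVecBytes == 0) {
        return static_cast<const Block *>(src);  // aligned: read in place
    }
    std::memcpy(scratch, src, sizeof(Block));    // unaligned: copy to scratch
    return scratch;
}
```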

* wip

* refactor flash attention implementation and introduce type-erased dot function for improved type handling
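
One plausible shape for the type-erased dot function: a fixed void-pointer signature selected once per op by runtime dtype, so the flash-attention driver stays free of templates. All names here are hypothetical.

```cpp
#include <cstddef>

using dot_fn_t = float (*)(const void * x, const void * y, size_t n);

static float dot_f32_erased(const void * x, const void * y, size_t n) {
    const float * a = static_cast<const float *>(x);
    const float * b = static_cast<const float *>(y);
    float s = 0.f;
    for (size_t i = 0; i < n; ++i) {
        s += a[i] * b[i];
    }
    return s;
}

// Selected once per op, then called per row without re-dispatching:
dot_fn_t pick_dot_fn(bool src_is_f32) {
    return src_is_f32 ? dot_f32_erased : nullptr;  // F16 kernel omitted here
}
```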

* refactor dot product implementations for improved loop handling and clarity

* refactor thread_pool constructor to pre-allocate VTCM cache for each thread

* Revert "refactor thread_pool constructor to pre-allocate VTCM cache for each thread"

This reverts commit 00cdd3fa88d909feef44ddaa42095274b7627685.

* wip

* opt interfaces for tensor cleanup

* refactor mul_mat_impl to use aligned size for src0 row calculation
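
The aligned-size calculation behind changes like this is typically a power-of-two round-up; a sketch, with the 128-byte HVX width as an assumed alignment and a made-up helper name:

```cpp
#include <cstddef>

constexpr size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);  // align must be a power of two
}

// e.g. pad each src0 row to the 128-byte HVX vector width:
// size_t src0_row_bytes = align_up(ne00 * sizeof(float), 128);
```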

* refactor: update dequantized_row_size logic and add size alignment checks for tensors

* wip

* wip

* refactor: replace raw pointer initialization with invalid handle constants for better clarity
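
The pattern this commit applies, sketched with assumed types and sentinel values: name the invalid-handle constant once instead of scattering raw 0/-1 initializers.

```cpp
#include <cstdint>

using remote_handle64_t = uint64_t;  // stand-in for the RPC handle type

constexpr remote_handle64_t kInvalidHandle = ~static_cast<remote_handle64_t>(0);

struct npu_session {
    remote_handle64_t handle = kInvalidHandle;  // self-documenting "not open"

    bool is_valid() const { return handle != kInvalidHandle; }
};
```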

* wip
cmake           ggml-cpu : rework weak alias on apple targets (#14146)  2025-06-16 13:54:15 +08:00
include         Merge branch 'master' into dev-refactoring               2025-07-18 23:43:20 +08:00
src             feat: perf opt part5 (#52)                               2025-07-23 00:38:09 +08:00
.gitignore      vulkan : cmake integration (#8119)                       2024-07-13 18:12:39 +02:00
CMakeLists.txt  Merge branch 'master' into dev-refactoring               2025-07-18 23:43:20 +08:00