This patch updates the example in docs/development/HOWTO-add-model.md to
reflect recent changes after `TextModel` and `MmprojModel` were introduced.
It replaces the outdated `Model` base class with `TextModel` or `MmprojModel`
and updates the registration example accordingly.
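The registration pattern can be sketched in Python: a model class subclasses `TextModel` (or `MmprojModel`) and is registered under a Hugging Face architecture name. The classes below are toy stand-ins written for illustration, not the converter's real classes. The decorator name `register`, the hypothetical helper `lookup`, and the architecture string `MyModelForCausalLM` are assumptions; HOWTO-add-model.md has the converter's actual registration syntax.

```python
from typing import Callable, Dict


class ModelBase:
    """Toy stand-in for the converter's base class, not the real one."""

    # One shared registry: architecture name -> converter class.
    _registry: Dict[str, type] = {}

    @classmethod
    def register(cls, *names: str) -> Callable[[type], type]:
        """Decorator that maps each architecture name to the decorated class."""
        def wrapper(model_cls: type) -> type:
            for name in names:
                cls._registry[name] = model_cls
            return model_cls
        return wrapper

    @classmethod
    def lookup(cls, arch: str) -> type:
        """Hypothetical helper: return the class registered for `arch`."""
        return cls._registry[arch]


class TextModel(ModelBase):
    """Toy stand-in for converters of text-only models."""


class MmprojModel(ModelBase):
    """Toy stand-in for converters of multimodal projectors."""


# New text models derive from TextModel instead of the old `Model` base class.
@ModelBase.register("MyModelForCausalLM")
class MyModel(TextModel):
    model_arch = "mymodel"  # placeholder value for illustration only


print(ModelBase.lookup("MyModelForCausalLM").__name__)
```

Because the decorator returns the class unchanged, registration only adds an entry to the shared table and does not alter the class's behavior.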
Signed-off-by: Wook Song <wook16.song@samsung.com>
Neither "g" nor "x" is a valid portPos specifier according to the official
[Graphviz documentation](https://graphviz.org/docs/attr-types/portPos/):
> If a compass point is used, it must have the form "n","ne","e","se","s","sw","w","nw","c","_".
I tested locally that Graphviz falls back to the default portPos specifier
when an invalid one is specified. As a consequence, the associated code can
be removed.
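The compass-point rule quoted above can be expressed as a small Python check, which shows why "g" and "x" are rejected. The helper name `is_valid_compass_point` is hypothetical and belongs to neither ggml nor Graphviz. It only encodes the list from the Graphviz documentation and ignores the `portname:compass_point` form.

```python
# Compass points accepted by Graphviz for portPos, as listed in
# https://graphviz.org/docs/attr-types/portPos/
VALID_COMPASS_POINTS = frozenset({"n", "ne", "e", "se", "s", "sw", "w", "nw", "c", "_"})


def is_valid_compass_point(spec: str) -> bool:
    """Hypothetical helper: True if `spec` is one of the documented compass points."""
    return spec in VALID_COMPASS_POINTS


for spec in ("g", "x", "n", "_"):
    print(spec, is_valid_compass_point(spec))
```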
* musa: apply mublas API changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: update musa version to 4.2.0
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: restore MUSA graph settings in CMakeLists.txt
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: disable mudnnMemcpyAsync by default
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: switch back to non-mudnn images
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* minor changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: restore rc in docker image tag
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* CMake config: Create target only once
Fix error on repeated find_package(ggml).
For simplicity, check only for the top-level ggml::ggml.
* CMake config: Add CUDA link libs
* CMake config: Add OpenCL link libs
* CMake config: Use canonical find_dependency
Use set and append to control link lib variables.
Apply more $<LINK_ONLY...>.
* CMake config: Wire OpenMP dependency
This commit removes the inclusion of `<cstdlib>`.
The motivation for this change is that this source file does not appear to
use any functions from this header, and the comment about `qsort` is
somewhat misleading.
MiniCPM models use the llm_build_granite constructor, which was changed
in the Granite Four PR to use hparams.rope_finetuned instead of a
use_rope parameter. MiniCPM models need RoPE enabled by default.
This fixes inference, which previously produced gibberish, so that it now
returns correct responses.
* set weight format to NZ for 310P
* remove NZ format conversion for quantized weights
* clean code
* fix
* make the conditions for converting weights to NZ format consistent
* clean code
* rename
* Refactor vector operations in vec_op_impl and vec_dot_product_impl for improved clarity and performance
* wip
* Enhance vector copy functions for improved performance and clarity in vec_ops.hpp
* wip
* wip
* wip
* Optimize vector dot product implementations for enhanced performance and efficiency
* Enhance flash attention implementation and type traits for improved vector operations and alignment checks
* remove align
* wip
* Enhance vector dot product implementation for improved performance by adding parallel processing for multiple vector pairs
* Revert "Enhance vector dot product implementation for improved performance by adding parallel processing for multiple vector pairs"
This reverts commit 78cc24ed2285002ca29d6189fa61ba4ce24f8d16.
* Enhance flash attention implementation with type checks for tensor data types and improved constexpr usage
* wip
* opt mask calc
* Revert "opt mask calc"
This reverts commit bb1840876692a11511d5ab7828b8a707402e30b9.
* wip
* opt mul mat caching logic to add dst cache
* Revert "opt mul mat caching logic to add dst cache"
This reverts commit ab442fa9f763b3873c929936e4cb739cb1c83850.
* wip
* Refactor matrix multiplication implementation to include vector conversion and performance tracking
* wip
* wip
* wip
* create vec_ops.inl for more aggressive compiler inline
* wip
* refactor vector dot product implementations for improved readability and performance
* refactor vector conversion functions to use HVX_Vector_Dual for improved clarity and consistency
* wip
* wip
* wip
* implement row size caching logic and enhance type traits for F32 support
* refactor matrix multiplication functions to improve caching logic and simplify tensor alignment handling
* add vector zeroing functions for F32 and F16 types to optimize memory initialization
* Revert "add vector zeroing functions for F32 and F16 types to optimize memory initialization"
This reverts commit e374326dc74d049e6603e393ade418d9ef2b83f3.
* wip
* refactor alignment checks in dot product function to handle null pointers
* wip
* refactor load_block_generic and related functions for improved alignment handling
* wip
* refactor flash attention implementation and introduce type-erased dot function for improved type handling
* refactor dot product implementations for improved loop handling and clarity
* refactor thread_pool constructor to pre-allocate VTCM cache for each thread
* Revert "refactor thread_pool constructor to pre-allocate VTCM cache for each thread"
This reverts commit 00cdd3fa88d909feef44ddaa42095274b7627685.
* wip
* opt interfaces for tensor cleanup
* refactor mul_mat_impl to use aligned size for src0 row calculation
* refactor: update dequantized_row_size logic and add size alignment checks for tensors
* wip
* wip
* refactor: replace raw pointer initialization with invalid handle constants for better clarity
* wip
* mtmd: add a way to select the device for the vision encoder
* simplify
* format
* Warn the user if manual device selection fails
* initialize backend to nullptr
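The device-selection behavior described in these bullets (use the requested device if it exists, otherwise warn and fall back, starting from an unset backend) can be sketched in Python. This is an illustration under assumptions, not the mtmd code: the function `select_vision_device` and the device names `"CPU"` and `"CUDA0"` are hypothetical.

```python
import warnings
from typing import Optional, Sequence


def select_vision_device(requested: Optional[str],
                         available: Sequence[str],
                         default: str) -> str:
    """Hypothetical helper: choose the device for the vision encoder.

    - No request: use the default device.
    - Requested device exists: use it.
    - Requested device missing: warn the user and fall back to the default.
    """
    if requested is None:
        return default
    if requested in available:
        return requested
    warnings.warn(f"device '{requested}' not found; using '{default}' instead")
    return default


# Start with no backend selected (the analogue of initializing to nullptr).
backend: Optional[str] = None
backend = select_vision_device("CUDA0", ["CPU", "CUDA0"], default="CPU")
print(backend)
```

Falling back instead of aborting keeps the vision encoder usable when a manually chosen device is unavailable, while the warning tells the user that their selection was not honored.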
* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.
* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md