* Use jinja chat template system prompt by default
* faster conditional order
* remove nested ternary
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* fix typos and improve menu text clarity
* rename variable trimedValue to trimmedValue
* add updated index.html.gz
* rebuild
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
* Upgrade init_tensor API to return a ggml_status
To prepare for an 'abort-free' ggml
(ggml should not abort on OOM but return an OOM status),
as agreed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status (see the sketch below).
* misc fixes
---------
Co-authored-by: slaren <slarengh@gmail.com>
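A minimal sketch of the idea behind the API change, assuming an enum that mirrors ggml's ggml_status values; the struct layout and parameters here are illustrative, not the real ggml backend interface:

```cpp
// Sketch only: an init_tensor-style callback that reports failure through a
// status code instead of aborting. The enum mirrors ggml's ggml_status values;
// the struct and parameters are illustrative, not the actual ggml API.
#include <cstddef>

enum ggml_status_sketch {
    GGML_STATUS_ALLOC_FAILED = -2,
    GGML_STATUS_FAILED       = -1,
    GGML_STATUS_SUCCESS      =  0,
};

struct tensor_sketch {
    void * data   = nullptr;
    size_t nbytes = 0;
};

// Before the change a callback like this returned void and aborted on OOM;
// returning a status lets the caller propagate the error instead.
static ggml_status_sketch init_tensor(tensor_sketch & t, void * base, size_t offset, size_t pool_size) {
    if (offset + t.nbytes > pool_size) {
        return GGML_STATUS_ALLOC_FAILED;  // out of memory: report, do not abort
    }
    t.data = static_cast<char *>(base) + offset;
    return GGML_STATUS_SUCCESS;
}
```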
* Added Phi-4-mini-instruct support
* Update regex per ngxson
* Change the vocab base to Xenova/gpt-4o
* fix conversion update script
* no need to check longrope
* minor style fix
* fix python style
---------
Co-authored-by: Nicholas Sparks <nisparks@microsoft.com>
* debug
* disable reshape
* make sure single-node ops have the same type
* fix warning in the logger
* Revert "disable reshape"
This reverts commit 5aeca4ba9bec6db3f047f9da803df20f9f6612b3.
* vulkan: implement specialized MMV kernels for IQ2 quantizations
* vulkan: add MMV kernels for IQ3 quants
* vulkan: Increase MMV batch size and unroll IQ LUT setup
* vulkan: fix init_iq_shmem for WG sizes larger than tables
* vulkan: common batch size for all I-quants
* Added SVE Support for Q2_K Quantized Models
* Use 4-space indentation in the switch cases
* removed comment lines
* Remove the loop; retain the curly braces for better code readability
* Remove the comment line added for the q3_k_q8_k kernel
---------
Co-authored-by: vithulep <p.m.vithule1517@gmail.com>
* fix warning
* wip
* add todo for graph key generation
* rename some files to meet upstream guidelines
* remove local .clang-format
* extend supported/unsupported counters to all ops
* append device name to log
* port to ggml logger
* fix warning after adapt to ggml logger
* append \n to all log messages
* use case op instead of convert
* Revert "use case op instead of convert"
This reverts commit e662fc2dfee41719aaf7bc9d75e03e8d0f7ded0f.
* fix op that needs same shape
* opt kQnnOpsTable
* refresh params name field when getting op config
* opt npu log print
* remove unused functions
* Fix dependencies between ggml and backends
ggml backends link only to ggml-base, and ggml links to all backends.
* Fix installation of ggml backends
Set up GNUInstallDirs before setting the installation directory of ggml backends
* Refactor gguf scripts to improve metadata handling
Added contents method to ReaderField class
Added endianess property to GGUFReader class
* update scripts
* fix import
* remove unused import
* attempt to work around flake and pyright errors
* second attempt
* give up, ignore type
* bump version
* apply newbyteorder fixes
Currently self.byte_order is never used.
Actually use it to byteswap read data, so that
big-endian files can be read on little-endian systems
and vice versa.
It is now possible to convert a little-endian model
into a big-endian model and back
on a little-endian system.
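The change itself lives in the Python gguf reader; purely as an illustration of the idea (swap multi-byte values whenever the file's byte order differs from the host's), here is a small C++ sketch with assumed helper names, not the gguf-py code:

```cpp
// Illustration only: byteswap 32-bit values read from a file whose
// endianness differs from the host's.
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class byte_order { little, big };

static byte_order host_byte_order() {
    const uint16_t probe = 1;
    uint8_t first;
    std::memcpy(&first, &probe, 1);
    return first == 1 ? byte_order::little : byte_order::big;
}

static uint32_t byteswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Swap each element in place only when the file and host byte orders differ,
// so the same reader works for both little- and big-endian files.
static void maybe_byteswap(uint32_t * data, size_t count, byte_order file_order) {
    if (file_order == host_byte_order()) {
        return;
    }
    for (size_t i = 0; i < count; ++i) {
        data[i] = byteswap32(data[i]);
    }
}
```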
It's useful to be able to have this from the library layer as it's a key
parameter of the model (e.g. to figure out how much KV cache memory is
needed).
* opt performance by reorder for Intel GPU
* detect the hw type, save the opt feature, and print it
* correct name
* optimize the graph only once during graph compute, record the opt status in tensor->extra, and make CI pass
* add env variable GGML_SYCL_DISABLE_OPT for debugging (see the sketch below)
* use syclex::architecture to replace the custom hw define, and update the guide for GGML_SYCL_DISABLE_OPT
* add performance data
* move getrows functions to separate files
* fix global variables
---------
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
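GGML_SYCL_DISABLE_OPT is the variable named in the commits above; the helper below is only a sketch of the usual getenv-based gate, with an assumed function name, not the actual ggml-sycl code:

```cpp
// Sketch: gate an optional optimization behind an environment variable,
// as described for GGML_SYCL_DISABLE_OPT. Helper name and caching are assumptions.
#include <cstdlib>
#include <cstring>

static bool sycl_opt_disabled() {
    static const bool disabled = [] {
        const char * v = std::getenv("GGML_SYCL_DISABLE_OPT");
        return v != nullptr && std::strcmp(v, "0") != 0;  // any non-"0" value disables the reorder opt
    }();
    return disabled;
}

// Usage sketch inside the graph compute path:
//   if (!sycl_opt_disabled()) { /* reorder once and record the opt status in tensor->extra */ }
```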
* move qnn_instance function implementation into cpp
* wip
* wip
* move dl-related functions into a separate file
* use cast op for gpu
* Revert "use cast op for gpu"
This reverts commit 05df7362a15c022d05940d682e84cf480a082c6a.
* Reapply "use cast op for gpu"
This reverts commit 2520e5922a216faceb6d7efcde23dafe6947a4b3.
* fix compiling error in win
* fix align_alloc in win
* fix compiling error
* add get sys free/total mem for win
* wip
* suppress warning in win
* add missing chrono header
* set the correct qnn lib name for windows
* add flag to control cpu backend
* wip
* wip
* Revert "Reapply "use cast op for gpu""
This reverts commit f56519c374a7d46faac706cf214de48ff5fc5139.
* fix compiling error for linux build
* fix cdsprpc dynamic library name
* wip
* skip rpc load fail
* fix page_align_alloc
* suppress some warning in gcc
* wip
* reuse the existing alignment helper function
* more log
* add log and fix warning
* wip
* fix asan errors and memory leaks
* fix the get_io_tensors_from_graph
* improve comment
* print GGML_QNN_DEFAULT_LIB_SEARCH_PATH
* revert some unused changes
* move library search path setter into qnn module
* fix android library loading
* skip qnn_device_get_platform_info for npu emulator
Use the consolidated open function call from the File class. Change
read_all to to_string(). Remove exclusive locking: the intent of that
lock is to avoid multiple processes writing to the same file, which is
not an issue for readers, although we may want to consider adding a
shared lock. Remove passing nullptr as a reference; references are
never supposed to be null. clang-format the code for consistent styling.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
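A minimal sketch of the pattern the message describes: a single open() entry point on a File wrapper plus a to_string() that reads the whole file, with no exclusive lock taken for reads. The class shape and names are assumptions, not the actual code:

```cpp
// Sketch only: RAII file wrapper with one open() call and a to_string() helper
// (replacing a read_all-style function). Error handling is deliberately minimal.
#include <cstdio>
#include <string>

class File {
  public:
    ~File() { if (fp) { std::fclose(fp); } }

    bool open(const std::string & path, const char * mode) {
        fp = std::fopen(path.c_str(), mode);
        return fp != nullptr;
    }

    // Read the whole file into a string; no exclusive lock is needed for readers.
    std::string to_string() {
        std::string out;
        char   buf[4096];
        size_t n;
        while ((n = std::fread(buf, 1, sizeof(buf), fp)) > 0) {
            out.append(buf, n);
        }
        return out;
    }

  private:
    FILE * fp = nullptr;
};
```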
* llava: export function `clip_build_img_from_pixels` to build an image from pixels already decoded by other libraries (instead of via stb_image.h) for better performance; see the sketch below
* Apply suggestions from code review
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
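The exported name `clip_build_img_from_pixels` comes from the commit text; the signature below is only a guess at what such an entry point could look like (raw interleaved RGB8 pixels in, clip image struct out), not the actual llava API:

```cpp
// Sketch with an assumed signature: build a clip image from pixels that were
// already decoded elsewhere (e.g. by a platform image library), bypassing stb_image.h.
#include <cstdint>
#include <cstring>
#include <vector>

struct clip_image_u8_sketch {
    int nx = 0;
    int ny = 0;
    std::vector<uint8_t> buf;   // nx * ny * 3 bytes, RGB interleaved
};

// Hypothetical shape of the exported builder; the real clip_build_img_from_pixels
// in llava may take different parameters.
static void clip_build_img_from_pixels_sketch(const uint8_t * rgb, int nx, int ny, clip_image_u8_sketch & img) {
    img.nx = nx;
    img.ny = ny;
    img.buf.resize(static_cast<size_t>(nx) * ny * 3);
    std::memcpy(img.buf.data(), rgb, img.buf.size());
}
```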