* Remove make dependency
* Added option to specify Ninja generator
* use ninja-build as default for several CI
* Revert "use ninja-build as default for several CI"
This reverts commit f552c4559b.
* changed use plain string rather than arrays
* Enabled ninja build by default for experimentation
* ci: add run.sh to test conditions to trigger GitHub CI and self-hosted runners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Enabled ninja build by default on self-hosted envs for experimentation
* ci: revert generator to ninja instead of ninja multi-config
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ci: install ninja-build for self-hosted workflows
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ci: revert ninja from self-hosted runners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ci: missed one self-hosted step
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ci: fix windows ci errors from an errenous revert
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Added explicit build types for Ninja
Also reverted some needless change
* ci: use ninja multi-config for vulkan-x64 build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* added time command to measure build time
* Keeping some configs to use Ninja which show improvement
* minor fix based on review
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
* ci: rm `time` from custom containers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aaron Teo <taronaeo@gmail.com>
* common : add standard Hugging Face cache support
- Use HF API to find all files
- Migrate all manifests to hugging face cache at startup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check with the quant tag
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Cleanup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Improve error handling and report API errors
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Restore common_cached_model_info and align mmproj filtering
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Prefer main when getting cached ref
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use cached files when HF API fails
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Use final_path..
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Check all inputs
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* hex-dma: make chained dma the default to handle newer models
This also includes some new instrumentation that we can remove later.
* hexagon: add uint32 dump helper
* hexagon: use single-page VTCM allocation to avoid issues with large gather ops in ssm-conv
ssm-conv uses HVX gather instruction and that instruction cannot handle cases where the base+offset
spans page boundaries.
* hexagon: update ssm-conv to make base-addr compute a bit easier to read
* hex-dma: use 1d mode for reshaping, it supports sizes up to 24-bits (>16MB)
* hex-bin: fix incorrect stride logic
* hexagon: make sure repack buffs are dumped for verbose > 2
* hex-bin: consistently use dma_queue_push even for dummy dst transactions
* hex-dma: start using 2d-wide mode on v75 and up
The removes the need to deal with the 16-bit limitaion for the strides.
* hex-bin: cleanup kernel selection logic
* hex-bin: cleanup binary op core and fix transposed tensor handling
* snapdragon: update run-bench to use larger ubatch and fa-on
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* metal:add conv_3d backend
Rebased with master and resolved conflicts.
* Resolved issues related to changes in variable names
* kernel void kernel_upscale_bilinear_f32 was missing in my branch, added back, should pass all tests now
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
ACL graph capture disallows host-to-device memcpy and device memory
malloc/free on the captured stream. Pre-load the RoPE cache before
capture so that:
- Host-to-device copies and allocations run on the non-captured stream
- Cache metadata is populated and memory pool is warmed up
- During capture, only on-device computations are recorded; host-side
and allocation branches are skipped
* fix(openvino): explicit memset in buffer_context allocation
* minor
---------
Co-authored-by: Dan Hoffman <dhoffman@cyket.net>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* added support for internvl's dynamic high-resolution (Qianfan-OCR needed)
* add min/max dynamic patch to gguf meta
* clean up
* simplified handling min/max dynamic patch
* reuse llava_uhd logic for slice images
* provide default values for older models
* flake8
* prevent writing 0 value to gguf
* remove duplicated resolution candidates with a better algorithm
* fix indentation
* format
* add protection from divide by zero
* change to 0 to be safe
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>