alex-spacemit
b77e6c18e1
ggml: riscv: add riscv spacemit backend ( #15288 )
...
* ggml: add spacemit backend
Change-Id: I249bdc043485d815a9c351867137bc1e27cc2e23
* add new line at end of file
Change-Id: I889ed1c85fb45e62350ecde0c06f70450cadfbe2
* add riscv zba extension limit
Change-Id: I321eb200f859751727afe5cae13074dfce2bb0ce
* fixed for review comments, file renamed and format
Change-Id: Ia20b6ec24a36638e62e0fe07cf100916a7cce3ce
* fixed for code format, after clang-format
Change-Id: I5dc33a0412da3d3f2d77075d8939185d3009eca2
* use _Float16 instead of __fp16
Change-Id: I039fb02bb95270e641bc4442204e658735859d43
* add ci for riscv64-spacemit-ime-native
Change-Id: I711c1033061df1a289ea77891b2997599dfe8279
* update debian-13-riscv64-spacemit-ime-native ci label
Change-Id: Ifb2b891e2fca57b5da604fce2ac255f27731179a
* remove license comment for spacemit ime
Change-Id: If0dc3ca30a958631ccca0a28b62e0b825f9fb0c3
* upgrade binutils for gcc ime
Change-Id: Ibf2fa74c1064408974cb5b45f044d40987e5fb45
* add spacemit ime cross jobs
Change-Id: I80d74909941d41cb9cd09e51d8baf01c985cbfc6
* remove native compile for riscv64-spacemit-ime
Change-Id: I01920afafdc73fa7424014fd648d243f8ec9e25e
* ci : add caching for spacemit ime cross toolchain
Change-Id: Ic54a192019a2fd982bbd58225ce3bbc38f4053de
* ci: bug fixed for cache path and env
Change-Id: I28c42e10b6fff053bb6580926ca2353448cb042a
* Update .github/workflows/build-linux-cross.yml for cache path
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* bugfixed for build-linux-cross.yml, syntax error
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: cailinxi <linxi.cai@spacemit.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-09-29 17:50:44 +03:00
R0CKSTAR
a86a580a66
musa: upgrade musa sdk to 4.3.0 ( #16240 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-09-26 02:56:38 +02:00
Aaron Teo
264f1b5187
zdnn: refactor codebase + add docs ( #16178 )
...
* zdnn: initial matmul refactor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rm static from funcs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update ggml-zdnn.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: change header files to hpp
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: switch to common.hpp
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: move mulmat forward around
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rm inline from utils
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: add zDNN docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-09-23 14:53:05 +08:00
Aaron Teo
40be51152d
ggml-zdnn: fix #15414 , activate FP16 and BF16 acceleration and incorrect zTensor free ( #15839 )
2025-09-13 02:39:52 +08:00
hipudding
c0389dba43
CANN: Disable acl_graph for prefill stage ( #15933 )
...
Since the prefill length is not fixed, graphs constructed for the
prefill stage cannot be reused. For this reason, ACL graph
execution is disabled by default during prefill.
2025-09-11 15:59:37 +08:00
Chenguang Li
28b5f190ef
CANN: implement LRU cache for ACL graphs ( #15814 )
...
* CANN: implement LRU cache for ACL graphs in CANN backend
- Introduce ggml_cann_graph_lru_cache to store multiple ggml_cann_graph objects.
- Graphs are loaded on demand and evicted using LRU policy when capacity is exceeded.
- Updated push, move_to_front, and clear methods to manage cached graphs efficiently.
- Ensures reuse of graphs, reducing graph reconstruction overhead in CANN backend.
* fix typo
* The LRU cache capacity can be configured via an env variable
Signed-off-by: noemotiovon <757486878@qq.com>
* refactory acl graph
* refactory && fix review comments
Signed-off-by: noemotiovon <757486878@qq.com>
---------
Signed-off-by: noemotiovon <757486878@qq.com>
2025-09-10 15:29:12 +08:00
Aaron Teo
186415d595
ggml-cpu: drop support for nnpa intrinsics ( #15821 )
2025-09-06 11:27:28 +08:00
hipudding
5421f63ab0
CANN: Fix precision issue on 310I DUO multi-devices ( #15784 )
2025-09-04 15:12:30 +08:00
Chenguang Li
2f853687b3
CANN: Support eager execution mode under ACL graph compilation ( #15712 )
...
* [CANN] Support eager execution mode under ACL graph compilation
Add support for running operators in eager mode while ACL graph
compilation is enabled. This allows bypassing graph execution
and directly submitting ops, which is useful for debugging and
reducing graph build overhead in certain scenarios.
Signed-off-by: noemotiovon <757486878@qq.com>
* fix typo
Signed-off-by: noemotiovon <757486878@qq.com>
* rename to acl_graph_mode
Signed-off-by: noemotiovon <757486878@qq.com>
---------
Signed-off-by: noemotiovon <757486878@qq.com>
2025-09-02 14:07:48 +08:00
Diego Devesa
dd892555b0
Update build.md to remove MSVC arm64 notes ( #15684 )
...
Removed information about MSVC compiler limitations for arm64 builds.
2025-08-30 23:51:28 +08:00
ExtReMLapin
792b44f2ed
server : add documentation for `parallel_tool_calls` param ( #15647 )
...
Co-authored-by: Pierre F <no@p.e>
2025-08-29 20:25:40 +03:00
tc-mb
c4e9239064
model : support MiniCPM-V 4.5 ( #15575 )
2025-08-26 10:05:55 +02:00
Aaron Teo
ad5c975c2d
ggml-cpu: Support Q5_0 and Q5_1 on s390x ( #15486 )
...
* ggml-cpu: initial q5_0 impl for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: updated q5_0 code for better performance
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: use optimised hsum for better performance
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: introduce q5_1 simd + refactor q5_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix incorrect return type vec_hsum
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: q5_0 incomplete refactor + table_b2b_0 activation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: refactor q5_1
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: q5_1 update loop unroll to 4
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: update q5_0 unroll to 4
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: update build-s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: update unused variables q5_0
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update the last update date
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-08-22 16:11:04 +08:00
Johannes Gäßler
7a6e91ad26
CUDA: replace GGML_CUDA_F16 with CUDA arch checks ( #15433 )
2025-08-20 16:58:49 +02:00
Aaron Teo
ff27f80a74
ggml: initial IBM zDNN backend ( #14975 )
...
* ggml-zdnn: inital backend impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: temp change z17 to arch15
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: fix build bugs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: tensor->extra logging check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add layout name mapping, ztensor information
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: separate logging into its own line
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add shape comparison
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: add ggml_tensor shape log
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
ggml-zdnn: fix incorrect shape logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add output buffer check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: run compute and store into tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add more loggers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update set_tensor logging to check only for matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: last working matmul version
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add comments to prevent accidentally deleting lines
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: support op out_prod
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update op out_prod to use tensor->extra
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rewrite the backend implementation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bugfix new impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix compiler warnings and bugfixes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: test ztensor finding in init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: implement at least 1 op to test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: assign tensor->extra to buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add check for view tensors to prevent init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rework init_tensor to create new buffers
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: switch to std vector instead of array
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: switch buffers back and set to arbitrary number
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: impl init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update supports_op matmul matrix
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix incorrect ztensor shape, reduce memory padding
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: impl matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix compiler error missing type
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing data transform call
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add bias init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: tighten memory usage, change string allocation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add bias ztensor and data free
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add bias data transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add more debug info for extra buffer transform
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add logger to check if mat mul ops go through set_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: activate bias transform in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: move weights transform into mulmat
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add more safeguards in matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix sequencing of transforms
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bugfix transform ztensor vs origtensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: figure out why sigtrap is happening
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: move everything back to local declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: move bias data to local also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bring back working matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: rewrite into mre
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing vector import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing vector import in header
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt to fix sigsegv
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix invalid ztensor buffer release
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add logging to debug free buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: remove free_buffer debug info
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add parmblkformat detections
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add nnpa installed detection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add zdnn_init call for static libs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at fixing invalid buffer
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: switch to using deque to fix pointer deref problem
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add weights logging to check
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt to use unique ptr
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add tensor to pre_tfm_desc logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add inputs logging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable op_none initialisation for testing
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix missing return from init_tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: load ztensors in cgraph exec
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: work on moving output ztensor as well
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable logging and breakpoints for full test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at manually changing the layout
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at using default nwhc format instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable global load ztensor for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix errorenous output load tensor
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: add guards to prevent loading ztensor if transformed
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: bring load ztensor back to init routine
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: code clean up
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix ztensor deallocation abort
stabilise ggml <-> zdnn api
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: clean up matmul selection
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: clean up project structure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: update documentation, prepare for upstream
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* chore: add codeowners
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: disable batched matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: attempt at fixing tensor views during matmul
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: deny all view tensors directly
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix pr comments
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update ops docs for zdnn
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: redo test-backend-ops for ops.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-zdnn: fix typo in build-s390x.md
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* codeowners: remove taronaeo for now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "codeowners: remove taronaeo for now"
This reverts commit 411ea4ed78 .
* ggml-zdnn: remove unused ggml_zdnn macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-08-15 21:11:22 +08:00
rainred
cf9e5648a7
mtmd : Fix MinicpmV model converter and clip to avoid using hardcode. ( #14750 )
...
* Fix MinicpmV model converter and clip to avoid using hardcode.
* Code update for pr/14750
* Remove unused field, update script path in docs.
* Add version 5 for fallback code.
---------
Co-authored-by: lzhang <zhanglei@modelbest.cn>
2025-08-11 16:12:12 +02:00
tc-mb
952a47f455
mtmd : support MiniCPM-V 4.0 ( #14983 )
...
* support minicpm-v 4
* add md
* support MiniCPM-o 4.0
* add default location
* temp rm MiniCPM-o 4.0
* fix code
* fix "minicpmv_projector" default path
2025-07-31 17:22:17 +02:00
hipudding
11490b3672
CANN: Improve loading efficiency after converting weights to NZ format. ( #14985 )
...
* CANN: Improve loading efficiency after converting weights to NZ format.
* CANN: fix typo
2025-07-31 19:47:20 +08:00
Xinpeng Dou
61550f8231
CANN: update ops docs ( #14935 )
...
* CANN:add ops docs
* CANN: update ops docs
2025-07-30 08:39:24 +08:00
lhez
8ad7b3e65b
opencl : add ops docs ( #14910 )
2025-07-28 18:50:17 +02:00
Xuan-Son Nguyen
00fa15fedc
mtmd : add support for Voxtral ( #14862 )
...
* mtmd : add support for Voxtral
* clean up
* fix python requirements
* add [BEGIN_AUDIO] token
* also support Devstral conversion
* add docs and tests
* fix regression for ultravox
* minor coding style improvement
* correct project activation fn
* Apply suggestions from code review
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
---------
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-07-28 15:01:48 +02:00
Georgi Gerganov
a5771c9eea
ops : update BLAS ( #14914 )
2025-07-28 10:01:03 +02:00
Georgi Gerganov
c35f9eaf09
ops : update Metal ( #14912 )
2025-07-28 08:22:56 +03:00
Ruben Ortlam
bf78f5439e
vulkan: add ops docs ( #14900 )
2025-07-27 15:33:08 +02:00
Akarshan Biswas
bbfc849274
SYCL: add ops doc ( #14901 )
2025-07-27 17:52:58 +05:30
Aman Gupta
446595b9b3
Docs: add instructions for adding backends ( #14889 )
2025-07-27 09:36:43 +08:00
Aaron Teo
c7f3169cd5
ggml-cpu : disable GGML_NNPA by default due to instability ( #14880 )
...
* docs: update s390x document for sentencepiece
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit e086c5e3a7 )
* docs: update huggingface links + reword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8410b085ea )
* ggml-cpu: disable ggml-nnpa compile flag by default
fixes #14877
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 412f4c7c88 )
* docs: update s390x build docs to reflect nnpa disable
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit c1eeae1d0c )
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-07-25 19:09:03 +02:00
wooksong
e7fecba934
docs : update HOWTO‑add‑model.md for ModelBase and new model classes ( #14874 )
...
This patch updates the example in docs/development/HOWTO-add-model.md to
reflect recent changes after `TextModel` and `MmprojModel` were introduced.
It replaces the outdated `Model` base class with `TextModel` or `MmprojModel`
and updates the registration example accordingly.
Signed-off-by: Wook Song <wook16.song@samsung.com>
2025-07-25 16:25:05 +02:00
R0CKSTAR
3f4fc97f1d
musa: upgrade musa sdk to rc4.2.0 ( #14498 )
...
* musa: apply mublas API changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: update musa version to 4.2.0
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: restore MUSA graph settings in CMakeLists.txt
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: disable mudnnMemcpyAsync by default
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: switch back to non-mudnn images
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* minor changes
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: restore rc in docker image tag
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-07-24 20:05:37 +01:00
Pouya
39cffdf188
docs: add libcurl-dev install hint for Linux distros ( #14801 )
...
* docs: add libcurl-dev install hint for Linux distros
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com>
* Update docs/build.md
---------
Signed-off-by: PouyaGhahramanian <PooyaGhahramanian@gmail.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-07-24 11:26:44 +02:00
rspOverflow
b526ad2668
Documentation: Further revisions to the Vulkan section in build.md ( #14785 )
...
* Documentation: Revised and further improved the Vulkan instructions for Linux users in build.md.
* Minor: Revise step 2 of the Vulkan instructions for Linux users in build.md
2025-07-20 18:55:32 +02:00
rspOverflow
f0d4d176df
Documentation: Update build.md's Vulkan section ( #14736 )
...
* Documentation: Rewrote and updated the "Without docker" portion of the Vulkan backend build documentation.
* Documentation: Reorganize build.md's Vulkan section.
2025-07-19 12:18:36 +02:00
Reese Levine
21c021745d
ggml: Add initial WebGPU backend ( #14521 )
...
* Minimal setup of webgpu backend with dawn. Just prints out the adapter and segfaults
* Initialize webgpu device
* Making progress on setting up the backend
* Finish more boilerplate/utility functions
* Organize file and work on alloc buffer
* Add webgpu_context to prepare for actually running some shaders
* Work on memset and add shader loading
* Work on memset polyfill
* Implement set_tensor as webgpu WriteBuffer, remove host_buffer stubs since webgpu doesn't support it
* Implement get_tensor and buffer_clear
* Finish rest of setup
* Start work on compute graph
* Basic mat mul working
* Work on emscripten build
* Basic WebGPU backend instructions
* Use EMSCRIPTEN flag
* Work on passing ci, implement 4d tensor multiplication
* Pass thread safety test
* Implement permuting for mul_mat and cpy
* minor cleanups
* Address feedback
* Remove division by type size in cpy op
* Fix formatting and add github action workflows for vulkan and metal (m-series) webgpu backends
* Fix name
* Fix macos dawn prefix path
2025-07-16 18:18:51 +03:00
Aman Gupta
11ee0fea2a
Docs: script to auto-generate ggml operations docs ( #14598 )
...
* Docs: script to auto-generate ggml operations docs
* Review: formatting changes + change github action
* Use built-in types instead of typing
* docs : add BLAS and Metal ops
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-07-10 23:29:01 +08:00
Xuan-Son Nguyen
08382869a2
model : add SmolLM3 ( #14581 )
...
* Init - first pass.
* Model -> ModelBase.
* fix errors in conversion.
* Update the graph.
* up.
* up.
* wip
* cgraph ok
* rm redundant code
---------
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>
2025-07-08 18:07:01 +02:00
Grzegorz Grasza
1b2aaf28ac
Add Vulkan images to docker.md ( #14472 )
...
Right now it's not easy to find those.
2025-07-01 15:44:11 +02:00
Aaron Teo
bf5bcd0b85
docs: update s390x documentation + add faq ( #14389 )
...
* docs: update s390x documentation + add faq
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: add s390x z17 build q&a
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-26 12:41:41 +02:00
Aaron Teo
60ef23d6c1
ggml-cpu: enable IBM NNPA Vector Intrinsics ( #14317 )
...
* ggml-cpu: add nnpa compile flag
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 4a9f60c201 )
* ggml-cpu: add fp16->fp32 nnpa first
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 8d4a7987f9 )
* ggml-cpu: add fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 0ff0d65162 )
* ggml-cpu: better variable names
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 2f58bbcbb8 )
* docs: update s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 01b929491b )
* ggml-cpu: add debugging prints to see if dlf16 is correct
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix print vs printf
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix float placeholder
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: ensure fp16 and fp32 load and stores are called
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fp16 load ensured to hit
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove sigint from fp16 store
for some reason, the function is not getting a hit when debugged with
gdb. we will need to investigate further
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa for ggml_cpu_fp16_to_fp32
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: nnpa activate ggml_cpu_fp16_to_fp32 for 8 elements
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: nnpa switch to vec_xst test
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to vec_xst for 4 element loops also
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rework noop
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove noop, general code cleanup
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarify variable naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa for ggml_cpu_fp32_to_fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add breakpoint for debugging
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: test fix for conversion failure
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: disable fp32->fp16 nnpa conversions for now
there are some conversion failures in nnpa that requires the eyes of an
ibm stsm. will create a separate pr to introduce the fp32->fp16 change.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to elif macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix typo
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: reattempt fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix compiler types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: change to typedef vector types
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add 4 element loops for fp32->fp16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarified vector naming
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back fp32->fp16 store nnpa
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: activate nnpa fp32->fp16 or fp16->fp32 compute
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add nnpa macro check in ggml-impl
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add missing __func__
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: diagnose why __NNPA__ macro is not being defined
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: import vecintrin.h to fix compiler errors
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: update macro tests
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 157f856c34 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to importing ggml-cpu-impl instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix macro declaration
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: test more macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add debug prints
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bruteforce macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move macro definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add ggml-impl.h to cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to private macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move s390x typedef to own header file
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 157f856c34 )
* ggml-cpu: move things around
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back compile macros
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: switch to quotes for import
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add compiler error macro
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add s390x detection in ggml-src
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: undo cmakelists work
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move s390x typedef to own header file"
This reverts commit 18d79e1a30 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove typedefs.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove typedef from cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add ggml-impl.h future notes
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: add todo comment for future reference
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: clarify naming of dlf16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove unnecessary target compile definitions
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move nnpa fp16->fp32 and fp32->fp16 to simd-mappings
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update broken huggingface link for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix duplicate func names during compile
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: fix duplicate func names during compile"
This reverts commit fbb733451f .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml: refactor fp32->fp16 and fp16->fp32 simd to ggml-cpu"
This reverts commit bd288e8fa5 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: refactor fp16<->fp32 simd to ggml-cpu
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix missing simd-mappings.h import in quants.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix missing simd-mappings.h within repack
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix amx mmq missing simd-mappings.h
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: attempt at fixing loongarch failing build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move nnpa together with other fp16<->fp32 simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: fix wrong refactor of ggml-base
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164176555
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: remove dependency on ggml-cpu from ggml-base
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: rename all fp16<->fp32 macros to prefix with ggml_cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164449406
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: remove mistaken fallback macro
fallback logic was already implemented but i was too sleepy to realise
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: move ggml_table_f32_f16 back to ggml-base due to ci failures"
This reverts commit 32a3533564 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml: move ggml_table_f32_f16 to ggml-cpu"
This reverts commit 9e40d984ad .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml: move ggml_table_f32_f16 to ggml-cpu
ref: https://github.com/ggml-org/llama.cpp/pull/14317#discussion_r2164775006
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
(cherry picked from commit 9e40d984ad )
* ggml: move ggml_table_f32_f16 to ggml-cpu.c
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: extern c ggml_table_f32_f16 + chore docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h
we rely on the variable declaration in ggml-cpu.c instead
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: dedup ggml_table_f32_f16 from simd-mappings.h"
This reverts commit f71b21d2f7 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* ggml-cpu: bring back ggml_table_f32_f16
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "ggml-cpu: bring back ggml_table_f32_f16"
This reverts commit 2dce119178 .
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* fix ggml time initialization
* fix f32_f16 table init
* remove extra line
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: slaren <slarengh@gmail.com>
2025-06-25 23:49:04 +02:00
Anton Mitkov
2bf9d539dd
sycl: GGML_SYCL_DISABLE_OPT on by default for all Intel Devices ( #13973 )
2025-06-25 18:09:55 +02:00
David Chiu
d860dd99a4
docs : fix the link to llama.h ( #14293 )
2025-06-20 19:43:35 +02:00
Aaron Teo
8d94713654
docs: add s390x build documentation ( #14264 )
...
* docs: add s390x-specific build docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: add s390x model conversion steps
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x build indent
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update hyperlinks for s390x docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: update llama.h docs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x add accelerator and perf optimizations
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x indent blocks
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: revert block indentation
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: add support information for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x reword
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: remove indentation for accelerator section s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: remove redundant words s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: reword for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: s390x reword simd
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* docs: fix trailing whitespace for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
2025-06-18 18:10:26 +01:00
Pepijn de Vos
00ba772610
docs : remove WIP since PR has been merged ( #13912 )
2025-06-15 08:06:37 +02:00
ddpasa
26ff3685bf
docs : Update multimodal.md ( #14122 )
...
* Update multimodal.md
* Update multimodal.md
2025-06-13 15:17:53 +02:00
Xinpeng Dou
e21d2d4ae2
CANN: Simplify the environment variable setting( #13104 )
...
* Simplify the environment variable setting to specify the memory pool type.
* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.
* update
* fix CI
* update
* delete whitespace
* fix according to review
* update CANN.md
* update CANN.md
2025-06-09 19:47:39 +08:00
Xuan-Son Nguyen
ea1431b0fa
docs : add "Quick start" section for new users ( #13862 )
...
* docs : add "Quick start" section for non-technical users
* rm flox
* Update README.md
2025-06-03 13:09:36 +02:00
Jiří Podivín
b3a89c3d9e
docs : Note about necessity of having libcurl installed for standard build. ( #13945 )
...
Signed-off-by: Jiri Podivin <jpodivin@gmail.com>
2025-05-31 18:58:35 +02:00
Xuan-Son Nguyen
bc583e3c63
mtmd : support Qwen 2.5 Omni (input audio+vision, no audio output) ( #13784 )
...
* mtmd : allow multiple modalities at the same time
* refactor mtmd tokenizer
* fix compile
* ok, missing SinusoidsPositionEmbedding
* first working version
* fix style
* more strict validate of n_embd
* refactor if..else to switch
* fix regression
* add test for 3B
* update docs
* fix tokenizing with add_special
* add more tests
* fix test case "huge"
* rm redundant code
* set_position_mrope_1d rm n_tokens
2025-05-27 14:06:10 +02:00
bandoti
72b090da2c
docs: remove link for llama-cli function calling ( #13810 )
2025-05-27 08:52:40 -03:00
Bizhao Shi
2d38b6e400
CANN: Add the basic supports of Flash Attention kernel ( #13627 )
...
* cann: add the basic FA support
* cann: update the readme
* cann: update the FlashAttention with PSEShift
* cann: update the input parameters in FA
* cann: update the alibi with max_bias
* cann: add the constrints of softcap
* cann: update the docs CANN.md
* cann: update the docs CANN.md
* cann: fix typo of CANN.md
* cann: add some comments and update the CANN.md
* cann: update the CANN.md
* cann: update the inner precise for fusedInferAttention
* cann: update the constraints of flash_attn_ext on ggml-cann.cpp
* cann: clean the whitespace
* cann: clean the whitespace
* cann: add a new endline
2025-05-26 10:20:18 +08:00
Xuan-Son Nguyen
40aaa8a403
mtmd : add support for Qwen2-Audio and SeaLLM-Audio ( #13760 )
...
* mtmd : add Qwen2-Audio support
* small clean up
* update discussion link
* clarify mtmd_get_output_embd
* clarification in multimodal.md
* fix ultravox bug
* ggml_cont
2025-05-25 14:06:32 +02:00
ddpasa
a08c1d2845
docs : add Moondream2 pre-quantized link ( #13745 )
...
* Multimodal: Added Moondream2 model and fixed ggml.org link
* Apply suggestions from code review
---------
Co-authored-by: name <none@none.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-25 14:04:49 +02:00
Olivier Chafik
f5cd27b71d
`server`: streaming of tool calls and thoughts when `--jinja` is on ( #12379 )
...
* add common_json w/ support for truncated json healing
* add common_chat_msg_diff
* partial common_chat_parse
* refactor parser w/ optionals
* server: wire chat diffs in stream mode
* fix trigger of thinking models (must happen after thoughts are closed)
* fix functionary v3.2 raw python!
* rename: common_chat_syntax (now contains format)
* rm common_regex.at_start
* don't return empty <think></think>
* accommodate yet another deepseek r1 distill fantasy syntax (`<|tool▁calls|>`)
* fix QwQ 32B tool call parsing after thoughts (hermes2)
* better logs for grammar triggers
* consume spaces after parse_json_tool_calls
* fix required tool calls w/ thinking models that have pre-opened thinking tags
* fix thinking model's initial trigger + test qwq's template
* run most test_tool_call tests in stream + non-stream modes
* make functionary v3.2 parsing more strict (differentiate first match from others)
* send final diff from server, to close off raw python arguments
* support partial content streaming in Generic mode
* tool-call: allow content prelude before hermes2 tool calls (for Qwen2.5)
* Update function-calling.md
* Update tool_bench.py
* chat-parser: remove input from exception (llm output may contain PII)
---------
Co-authored-by: ochafik <ochafik@google.com>
Co-authored-by: Olivier Chafik <ochafik@users.noreply.github.com>
2025-05-25 01:48:08 +01:00
Xuan-Son Nguyen
797990c4bc
mtmd : add ultravox audio input ( #13623 )
...
* convert ok, load ok
* warmup ok
* test
* still does not work?
* fix padding
* temporary give up
* fix merge conflict
* build_ultravox()
* rm test
* fix merge conflict
* add necessary mtmd APIs
* first working version (only 4s of audio)
* will this monster compile?
* fix compile
* please compile
* fPIC
* fix windows
* various fixes
* clean up audio_helpers
* fix conversion
* add some debug stuff
* long audio input ok
* adapt the api
* add --audio arg
* final touch UX
* add miniaudio to readme
* fix typo
* refactor kv metadata
* mtmd_default_marker()
2025-05-22 20:42:48 +02:00
R0CKSTAR
33983057d0
musa: Upgrade MUSA SDK version to rc4.0.1 and use mudnn::Unary::IDENTITY op to accelerate D2D memory copy ( #13647 )
...
* musa: fix build warning (unused parameter)
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: upgrade MUSA SDK version to rc4.0.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: use mudnn::Unary::IDENTITY op to accelerate D2D memory copy
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Update ggml/src/ggml-cuda/cpy.cu
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* musa: remove MUDNN_CHECK_GEN and use CUDA_CHECK_GEN instead in MUDNN_CHECK
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2025-05-21 09:58:49 +08:00
Xinpeng Dou
f0adb80bf7
CANN: Update CANN model support ( #13162 )
...
* Update CANN model support status
* Update of model support
* update
* update
* update
* fix format of CANN.md
* fix format of CANN.md
* fix format of CANN.md
2025-05-20 11:43:43 +08:00
Alberto Cabrera Pérez
725f23f1f3
sycl : backend documentation review ( #13544 )
...
* sycl: reviewing and updating docs
* Updates Runtime error codes
* Improves OOM troubleshooting entry
* Added a llama 3 sample
* Updated supported models
* Updated releases table
2025-05-19 14:38:20 +01:00
Xuan-Son Nguyen
92ecdcc06a
mtmd : add vision support for llama 4 ( #13282 )
...
* wip llama 4 conversion
* rm redundant __init__
* fix conversion
* fix conversion
* test impl
* try this
* reshape patch_embeddings_0
* fix view
* rm ffn_post_norm
* cgraph ok
* f32 for pos embd
* add image marker tokens
* Llama4UnfoldConvolution
* correct pixel shuffle
* fix merge conflicts
* correct
* add debug_graph
* logits matched, but it still preceives the image incorrectly
* fix style
* add image_grid_pinpoints
* handle llama 4 preprocessing
* rm load_image_size
* rm unused line
* fix
* small fix 2
* add test & docs
* fix llava-1.6 test
* test: add notion of huge models
* add comment
* add warn about degraded quality
2025-05-19 13:04:14 +02:00
Łukasz Ślusarczyk
9c404ed54c
sycl: use oneDNN for matrices multiplication ( #12972 )
2025-05-15 16:53:41 +02:00
ddpasa
21ca987fba
docs: Update link to ggml-org in multimodal.md ( #13513 )
...
* Update multimodal.md
Minor change to include the huggingface link
* Update docs/multimodal.md
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
2025-05-14 09:59:12 +02:00
Thomas Germer
62d4250e52
docs : Fix typo in InternVL3 model name ( #13440 )
2025-05-10 22:26:46 +02:00
Xuan-Son Nguyen
3b24d26c22
server : update docs ( #13432 )
2025-05-10 18:44:49 +02:00
Xuan-Son Nguyen
053367d149
mtmd : support InternVL 2.5 and 3 ( #13422 )
...
* convert : internvl support
* InternVL3-1B working
* fix regression
* rm mobilevlm from test
* fix conversion
* add test for internvl
* add to list of pre-quant
* restore boi/eoi check
* add clarify comment for norm eps
2025-05-10 16:26:42 +02:00
Xuan-Son Nguyen
33eff40240
server : vision support via libmtmd ( #12898 )
...
* server : (experimental) vision support via libmtmd
* mtmd : add more api around mtmd_image_tokens
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-define ID (fixed)
* abstract out the batch management
* small fix
* refactor logic adding tokens to batch
* implement hashing image
* use FNV hash, now hash bitmap instead of file data
* allow decoding image embedding to be split into batches
* rm whitespace
* disable some features when mtmd is on
* fix --no-mmproj-offload
* mtmd_context_params no timings
* refactor server_inp to server_tokens
* fix the failing test case
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* improve server_input struct
* clip : fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
* fix detokenize
* add const to various places
* add warning about breaking changes
* add c api
* helper: use mtmd_image_tokens_get_n_pos
* fix ctx_shift
* fix name shadowing
* more strict condition
* support remote image_url
* remote image_url log
* add CI test
* do not log base64
* add "has_multimodal" to /props
* remove dangling image
* speculative: use slot.cache_tokens.insert
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* rm can_be_detokenized
* on prmpt processing done, assert cache_tokens.size
* handle_completions_impl returns void
* adapt the new web ui
* update docs and hot topics
* rm assert
* small fix (2)
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-05-09 19:29:37 +02:00
Xuan-Son Nguyen
9b61acf060
mtmd : rename llava directory to mtmd ( #13311 )
...
* mv llava to mtmd
* change ref everywhere
2025-05-05 16:02:55 +02:00
Diego Devesa
1d36b3670b
llama : move end-user examples to tools directory ( #13249 )
...
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-05-02 20:27:13 +02:00
Xuan-Son Nguyen
ecda2ec4b3
mtmd : Support Pixtral 12B ( #13065 )
...
* add pixtral text model (vision is wip)
* cgraph ok, just missing 2D RoPE
* fix bad rebase
* first working version
* fix problem with img_break token
* support dynamic image size
* update docs
* update test script
2025-04-23 20:21:59 +02:00
Xuan-Son Nguyen
243453533e
llava : update documentations ( #13055 )
...
* llava : update documentations
* fix typo
2025-04-22 10:37:00 +02:00
David Huang
84778e9770
CUDA/HIP: Share the same unified memory allocation logic. ( #12934 )
...
Replace compile-time `GGML_HIP_UMA` with environment variable `GGML_CUDA_ENABLE_UNIFIED_MEMORY`. This unifies the usage on NVIDIA and AMD GPUs, and allows a single binary to be shared between integrated and dedicated GPUs.
2025-04-15 11:20:38 +02:00
Romain Biessy
8ed71242f4
sycl: update documentation to use -no-cnv ( #12845 )
2025-04-09 11:22:04 +02:00
Nicolò Scipione
94148ba330
sycl: allow ggml-sycl configuration and compilation using Visual Studio project/solution ( #12625 )
2025-04-04 16:00:46 +02:00
lhez
7d7b1bafa7
opencl: update doc for OpenCL ( #12702 )
...
* opencl: add OpenCL to build.md
* opencl: remove fixed issue/TODO
* opencl: add link to OPENCL.md
* opencl: update doc - refine tools requirement for Windows 11 arm64
2025-04-03 22:18:17 -07:00
Atharva Dubey
2004644b7a
ci : add env variable in ggml-ci and document the same in SYCL.md ( #12736 )
2025-04-03 15:12:39 +03:00
Romain Biessy
8293970542
SYCL: Rename oneMKL to oneMath ( #12192 )
...
* Rename oneMKL Interface to oneMath
* Use oneMath for Intel vendor
* Rename occurences to mkl
* clang-format
* Silence verbose warnings
* Set oneMath HIP_TARGETS
* Fix silence warnings
* Remove step to build oneMath from build instructions
* Use fixed oneMath version
* Remove INTEL_CPU
* Fold CMake oneDNN conditions
* Use Intel oneMKL for Intel devices
* Improve CMake message
* Link against MKL::MKL_SYCL::BLAS only
* Move oneMath documentation to Nvidia and AMD sections
2025-04-01 16:24:29 +08:00
Slobodan Josic
bd40678df7
HIP: Add support for RDNA4 targets ( #12372 )
2025-03-26 23:46:30 +01:00
R0CKSTAR
fd7855f8f5
doc: [MUSA] minor changes ( #12583 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-26 09:09:48 +02:00
Dan Johansson
36ee06dd2d
docs : add build instructions for KleidiAI ( #12563 )
...
Signed-off-by: Dan Johansson <dan.johansson@arm.com>
2025-03-25 11:35:20 +02:00
Tei Home
3361e2deba
docs: update: improve the Fedoa CUDA guide ( #12536 )
...
* docs: update fedora-cuda guide
- Rename and place into Backend Folder.
- Update Host-Supplied Packages.
- Expand Recommended Users Section.
* docs: improve the flow of CUDA-FEDORA.md
2025-03-24 11:02:26 +00:00
Lars Sonchocky-Helldorf
18b663d8e4
install : add macports ( #12518 )
...
MacPorts section added
2025-03-23 10:21:48 +02:00
Svetlozar Georgiev
9ffcc9e374
sycl: cleanup oneDNN related code ( #12097 )
2025-03-21 10:15:56 +08:00
Łukasz Ślusarczyk
35cae5ba05
SYCL: using graphs is configurable by environment variable and compile option ( #12371 )
...
* alberto changes
* enable sycl graphs by env variable
* fixed compilation warnings in ggml-sycl.cpp
* renamed graph variables
* fix markdown in docs/backend/SYCL.md
Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
* fix markdown in docs/backend/SYCL.md again
* compiling graphs by default, renamed graph_enable to graph_disable
---------
Co-authored-by: Romain Biessy <romain.biessy@codeplay.com>
2025-03-18 11:16:31 +01:00
R0CKSTAR
251364549f
musa: support new arch mp_31 and update doc ( #12296 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-03-10 18:18:25 +01:00
David Huang
3ffbbd5ce1
HIP: rocWMMA documentation and enabling in workflow builds ( #12179 )
...
* Enable rocWMMA for Windows CI build
* Enable for Ubuntu
* GGML_HIP_ROCWMMA_FATTN documentation work
2025-03-06 14:14:11 +01:00
Olivier Chafik
42994048a3
update function-calling.md w/ template override for functionary-small-v3.2 ( #12214 )
2025-03-06 09:03:31 +00:00
Olivier Chafik
d7cfe1ffe0
docs: add docs/function-calling.md to lighten server/README.md's plight ( #12069 )
2025-02-25 18:52:56 +00:00
Neo Zhang Jianyu
08d5986290
[SYCL] Optimize mul_mat for Q4_0 on Intel GPU ( #12035 )
...
* opt performance by reorder for Intel GPU
* detect hw type and save opt feature, and print opt feature
* correct name
* support optimize graph once when compute graph, record the opt status in tensor->extra, make CI passed
* add env variable GGML_SYCL_DISABLE_OPT for debug
* use syclex::architecture replace the custom hw define, update the guide for GGML_SYCL_DISABLE_OPT
* add performance data
* mv getrows functions to separeted files
* fix global variables
---------
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2025-02-24 22:33:23 +08:00
Bodhi
0b3863ff95
MUSA: support ARM64 and enable dp4a .etc ( #11843 )
...
* MUSA: support ARM64 and enable __dp4a .etc
* fix cross entropy loss op for musa
* update
* add cc info log for musa
* add comment for the MUSA .cc calculation block
---------
Co-authored-by: Bodhi Hu <huaishun.hu@mthreads.com>
2025-02-21 09:46:23 +02:00
Georgi Gerganov
68ff663a04
repo : update links to new url ( #11886 )
...
* repo : update links to new url
ggml-ci
* cont : more urls
ggml-ci
2025-02-15 16:40:57 +02:00
Michał Moskal
89daa2564f
llguidance build fixes for Windows ( #11664 )
...
* setup windows linking for llguidance; thanks @phil-scott-78
* add build instructions for windows and update script link
* change VS Community link from DE to EN
* whitespace fix
2025-02-14 12:46:08 -08:00
Georgi Gerganov
dbc2ec59b5
docker : drop to CUDA 12.4 ( #11869 )
...
* docker : drop to CUDA 12.4
* docker : update readme [no ci]
2025-02-14 14:48:40 +02:00
R0CKSTAR
bd6e55bfd3
musa: bump MUSA SDK version to rc3.1.1 ( #11822 )
...
* musa: Update MUSA SDK version to rc3.1.1
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* musa: Remove workaround in PR #10042
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2025-02-13 13:28:18 +01:00
lhez
4078c77f98
docs: add OpenCL ( #11697 )
2025-02-11 15:04:13 -07:00
Tei Home
9ab42dc722
docs: update fedora cuda guide for 12.8 release ( #11393 )
...
* docs: update fedora cuda guide for 12.8 release
* docs: build cuda update
2025-02-06 12:16:15 +00:00
Michał Moskal
ff227703d6
sampling : support for llguidance grammars ( #10224 )
...
* initial porting of previous LLG patch
* update for new APIs
* build: integrate llguidance as an external project
* use '%llguidance' as marker to enable llg lark syntax
* add some docs
* clarify docs
* code style fixes
* remove llguidance.h from .gitignore
* fix tests when llg is enabled
* pass vocab not model to llama_sampler_init_llg()
* copy test-grammar-integration.cpp to test-llguidance.cpp
* clang fmt
* fix ref-count bug
* build and run test
* gbnf -> lark syntax
* conditionally include llguidance test based on LLAMA_LLGUIDANCE flag
* rename llguidance test file to test-grammar-llguidance.cpp
* add gh action for llg test
* align tests with LLG grammar syntax and JSON Schema spec
* llama_tokenizer() in fact requires valid utf8
* update llg
* format file
* add $LLGUIDANCE_LOG_LEVEL support
* fix whitespace
* fix warning
* include <cmath> for INFINITY
* add final newline
* fail llama_sampler_init_llg() at runtime
* Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes
* simplify #includes
* improve doc string for LLAMA_LLGUIDANCE
* typo in merge
* bump llguidance to 0.6.12
2025-02-02 09:55:32 +02:00
Jafar Uruç
a07c2c8a52
docs : Update readme to build targets for local docker build ( #11368 )
2025-01-24 14:30:13 +01:00
Tei Home
1204f97270
doc: add cuda guide for fedora ( #11135 )
...
Since NVIDIA does not release CUDA for in-maintenance versions of Fedora, the process of setting up the CUDA toolkit on Fedora has become quite involved. This guide should help mere mortals install CUDA for development in a Fedora 39 toolbox environment, without affecting the host system.
2025-01-09 11:32:06 +00:00
Pierrick Hymbert
f8feb4b01a
model: Add support for PhiMoE arch ( #11003 )
...
* model: support phimoe
* python linter
* doc: minor
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
* doc: minor
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
* doc: add phimoe as supported model
ggml-ci
---------
Co-authored-by: ThiloteE <73715071+ThiloteE@users.noreply.github.com>
2025-01-09 11:21:41 +01:00
Srihari-mcw
c37fb4cf62
Changes to CMakePresets.json to add ninja clang target on windows ( #10668 )
...
* Update cmakepreset.json to use clang with ninja by default
* Update cmakepreset.json to add clang and ninja based configs
* Updates to build.md file
* Make updates to rename preset targets
* Update with .cmake file
* Remove additional whitespaces
* Add .cmake file for x64-windows-llvm
* Update docs/build.md
* Update docs/build.md
---------
Co-authored-by: Max Krasnyansky <max.krasnyansky@gmail.com>
2024-12-09 09:40:19 -08:00
Djip007
19d8762ab6
ggml : refactor online repacking ( #10446 )
...
* rename ggml-cpu-aarch64.c to .cpp
* reformat extra cpu backend.
- clean Q4_0_N_M and IQ4_0_N_M
- remove from "file" tensor type
- allow only with dynamic repack
- extract cpu extra bufts and convert to C++
- hbm
- "aarch64"
- more generic use of extra buffer
- generalise extra_supports_op
- new API for "cpu-accel":
- amx
- aarch64
* clang-format
* Clean Q4_0_N_M ref
Enable restrict on C++
* add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack
* added/corrected control on tensor size for Q4 repacking.
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* add debug logs on repacks.
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-12-07 14:37:50 +02:00
Benson Wong
da6aac91f1
Add docs for creating a static build ( #10268 ) ( #10630 )
...
* Add notes for a static build
* Update docs/build.md
---------
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2024-12-04 01:40:36 +01:00
Nikolaos Pothitos
82bca2257b
readme : add option, update default value, fix formatting ( #10271 )
...
* readme : document --no-display-prompt
* readme : update default prompt context size
* readme : remove unnecessary indentation
Indenting a line with four spaces makes Markdown treat that section as
plain text.
* readme : indent commands under bullets
* readme : indent commands in lettered list
2024-12-03 12:50:08 +02:00
Georgi Gerganov
8648c52101
make : deprecate ( #10514 )
...
* make : deprecate
ggml-ci
* ci : disable Makefile builds
ggml-ci
* docs : remove make references [no ci]
* ci : disable swift build
ggml-ci
* docs : remove obsolete make references, scripts, examples
ggml-ci
* basic fix for compare-commits.sh
* update build.md
* more build.md updates
* more build.md updates
* more build.md updates
* Update Makefile
Co-authored-by: Diego Devesa <slarengh@gmail.com>
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-12-02 21:22:53 +02:00
Random Fly
7281cf13ad
docs: fix outdated usage of llama-simple ( #10565 )
2024-11-28 16:03:11 +01:00
Ruixin Huang
c6bc73951e
CANN: Update cann.md to display correctly in CLion ( #10538 )
2024-11-28 15:27:11 +08:00
leo-pony
605fa66c50
CANN: Fix SOC_TYPE compile bug ( #10519 )
...
* CANN: Fix the bug build fail on Ascend310P under two cases:
1) Manual specify SOC_TYPE
2) Under some unusual compile environment
* Update the cann backend News content: Support F16 and F32 data type model for Ascend 310P NPU.
* fix CANN compile fail bug: the assert in ascend kernel function doesn't supportted on some CANN version
2024-11-28 15:25:24 +08:00
Tristan Druyen
be0e350c8b
Fix HIP flag inconsistency & build docs ( #10524 )
...
* Fix inconsistency of HIP flags in cmake & make
* Fix docs regarding GGML_HIP
2024-11-26 19:27:28 +01:00
Neo Zhang Jianyu
ad21c9e1f1
update rel to 4040 ( #10395 )
...
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-11-20 13:54:25 +08:00
Romain Biessy
2a1507c162
sycl : Add option to set the SYCL architecture for all targets ( #10266 )
...
* Add option to set the SYCL architecture for all targets
* Convert GGML_SYCL_HIP_TARGET to the more generic GGML_SYCL_ARCH option
* Document that setting GGML_SYCL_ARCH can improve the performance
2024-11-19 08:02:23 +00:00
Johannes Gäßler
c3ea58aca4
CUDA: remove DMMV, consolidate F16 mult mat vec ( #10318 )
2024-11-17 09:09:55 +01:00
FirstTimeEZ
0fff7fd798
docs : vulkan build instructions to use git bash mingw64 ( #10303 )
2024-11-17 00:29:18 +01:00
Chenguang Li
231f9360d9
cann: dockerfile and doc adjustment ( #10302 )
...
Co-authored-by: noemotiovon <noemotiovon@gmail.com>
2024-11-15 15:09:35 +08:00
Romain Biessy
5a54af4d4f
sycl: Use syclcompat::dp4a ( #10267 )
...
* sycl: Use syclcompat::dp4a
* Using the syclcompat version allow the compiler to optimize the
operation with native function
* Update news section
* Update CI Windows oneAPI version to 2025.0
* Reword doc
* Call syclcompat::dp4a inside dpct::dp4a
This reverts commit 90cb61d692 .
2024-11-15 11:09:12 +08:00
Diego Devesa
ae8de6d50a
ggml : build backends as libraries ( #10256 )
...
* ggml : build backends as libraries
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>
2024-11-14 18:04:35 +01:00
Zhiyuan Li
3bcd40b3c5
Optimize RWKV6 Operator Naming and Implement Multi-core CPU/ SYCL Acceleration ( #10133 )
...
* rwkv6: rename to wkv6
* rwkv6: support avx2 avx512 armv8 armv9
* rwkv6: update cuda file name
* rwkv6: rename params
* wkv on sycl
* sycl: add some ops
* sycl: Enhance OP support judgment
* wkv6: drop armv9 and tranfer to GGML style
ggml-ci
* sync : ggml
* update the function to use appropriate types
* fix define error
* Update ggml/src/ggml-cpu.c
* add appropriate asserts
* move element-wise functions outside
* put the declaration outside the loop
* rewrite to be more inline with the common pattern for distributing threads
* use recommended way GGML_TENSOR_LOCALS
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
Co-authored-by: Plamen Minev <pacominev@gmail.com>
Co-authored-by: Yuri Khrustalev <ykhrustalev@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <airdldl@163.com>
2024-11-07 15:19:10 +08:00
R0CKSTAR
943d20b411
musa : update doc ( #9856 )
...
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-12 08:09:53 +03:00
R0CKSTAR
cf8e0a3bb9
musa: add docker image support ( #9685 )
...
* mtgpu: add docker image support
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* mtgpu: enable docker workflow
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-10-10 20:10:37 +02:00
Andrew Minh Nguyen
f1af42fa8c
Update building for Android ( #9672 )
...
* docs : clarify building Android on Termux
* docs : update building Android on Termux
* docs : add cross-compiling for Android
* cmake : link dl explicitly for Android
2024-10-07 09:37:31 -07:00
Alberto Cabrera Pérez
f536f4c439
[SYCL] Initial cmake support of SYCL for AMD GPUs ( #9658 )
...
sycl: initial cmake support of SYCL for AMD GPUs
2024-10-02 13:57:18 +01:00
Neo Zhang Jianyu
faf67b3de4
[SYCL]set context default value to avoid memory issue, update guide ( #9476 )
...
* set context default to avoid memory issue, update guide
* Update docs/backend/SYCL.md
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
---------
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-09-18 08:30:31 +08:00
Dan Johansson
b2e89a3274
Arm AArch64: Documentation updates ( #9321 )
...
* Arm AArch64: Documentation updates
* Update docs/build.md to include information on how to enable the Arm optimized gemm/gemv kernels
* Update examples/quantize/README.md with information on the Q4_0_4_4, Q4_0_4_8 and Q4_0_8_8 formats
* Add newline to the end of docs/build.md
2024-09-09 10:02:45 +03:00
杨朱 · Kiki
c8671ae282
Fix broken links in docker.md ( #9306 )
2024-09-04 13:45:28 +02:00
蕭澧邦
cddae4884c
Correct typo run_llama2.sh > run-llama2.sh ( #9149 )
2024-08-30 22:10:01 +10:00
slaren
66b039a501
docker : update CUDA images ( #9213 )
2024-08-28 13:20:36 +02:00
luoyu-intel
1731d4238f
[SYCL] Add oneDNN primitive support ( #9091 )
...
* add onednn
* add sycl_f16
* add dnnl stream
* add engine map
* use dnnl for intel only
* use fp16fp16fp16
* update doc
2024-08-22 12:50:10 +08:00
wangshuai09
cfac111e2b
cann: add doc for cann backend ( #8867 )
...
Co-authored-by: xuedinge233 <damow890@gmail.com>
Co-authored-by: hipudding <huafengchun@gmail.com>
2024-08-19 16:46:38 +08:00
Neo Zhang
a21c6fd450
update guide ( #8909 )
...
Co-authored-by: Neo Zhang <>
2024-08-11 14:07:43 +05:30
matteo
afbb4c1322
ggml-cuda: Adding support for unified memory ( #8035 )
...
* Adding support for unified memory
* adding again the documentation about unified memory
* refactoring: Moved the unified memory code in the correct location.
* Fixed compilation error when using hipblas
* cleaning up the documentation
* Updating the documentation
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* adding one more case where the PR should not be enabled
---------
Co-authored-by: matteo serva <matteo.serva@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-08-01 23:28:28 +02:00
R0CKSTAR
e54c35e4fb
feat: Support Moore Threads GPU ( #8383 )
...
* Update doc for MUSA
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add GGML_MUSA in Makefile
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Add GGML_MUSA in CMake
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* CUDA => MUSA
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* MUSA adds support for __vsubss4
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
* Fix CI build failure
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
---------
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
2024-07-28 01:41:25 +02:00
Andreas (Andi) Kunar
bf5a81df37
ggml : fix build on Windows with Snapdragon X ( #8531 )
...
* Improvements for Windows with Snapdragon X
* Revert "Improvements for Windows with Snapdragon X"
This reverts commit bf21397ae5 .
* Improvements for Windows with Snapdragon X
* WOA build clarifications
* WIndows on ARM build clarifications
* cmake build for Windows clarifications
* Update docs/build.md
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
---------
Co-authored-by: AndreasKunar <andreaskmsn.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-07-25 19:01:00 +03:00
Chen Xi
ed67bcb24f
[SYCL] fix multi-gpu issue on sycl ( #8554 )
...
---------
Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
2024-07-25 19:45:18 +08:00
NikolaiLyssogor
fc690b018e
docs: fix links in development docs [no ci] ( #8481 )
...
Fixes a few links to within the repo that were broken in the reorganization of the
documentation in #8325 .
2024-07-15 14:46:39 +03:00
bandoti
17eb6aa8a9
vulkan : cmake integration ( #8119 )
...
* Add Vulkan to CMake pkg
* Add Sycl to CMake pkg
* Add OpenMP to CMake pkg
* Split generated shader file into separate translation unit
* Add CMake target for Vulkan shaders
* Update README.md
* Add make target for Vulkan shaders
* Use pkg-config to locate vulkan library
* Add vulkan SDK dep to ubuntu-22-cmake-vulkan workflow
* Clean up tabs
* Move sudo to apt-key invocation
* Forward GGML_EXTRA_LIBS to CMake config pkg
* Update vulkan obj file paths
* Add shaderc to nix pkg
* Add python3 to Vulkan nix build
* Link against ggml in cmake pkg
* Remove Python dependency from Vulkan build
* code review changes
* Remove trailing newline
* Add cflags from pkg-config to fix w64devkit build
* Update README.md
* Remove trailing whitespace
* Update README.md
* Remove trailing whitespace
* Fix doc heading
* Make glslc required Vulkan component
* remove clblast from nix pkg
2024-07-13 18:12:39 +02:00
Dibakar Gope
0f1a39f343
ggml : add AArch64 optimized GEMV and GEMM Q4 kernels ( #5780 )
...
* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions
* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files
* Arm AArch64: minor code refactoring for rebase
* Arm AArch64: minor code refactoring for resolving a build issue with cmake
* Arm AArch64: minor code refactoring to split the Q4_0_AARC64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code change for resolving a build issue with server-windows
* retrigger checks
* Arm AArch64: minor code changes for rebase
* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits
* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig
* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8
* Arm AArch64: minor code refactoring
* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat
* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat
* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* Arm AArch64: minor code refactoring
* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure
* Arm AArch64: remove a redundant comment
* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off
* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels
* Arm AArch64: update docs/build.md README to include compile time flags for buiilding the Q4_0_4_4 quant type
2024-07-10 15:14:51 +03:00
Xuan Son Nguyen
60d83a0149
update main readme ( #8333 )
2024-07-06 19:01:23 +02:00
Xuan Son Nguyen
be20e7f49d
Reorganize documentation pages ( #8325 )
...
* re-organize docs
* add link among docs
* add link to build docs
* fix style
* de-duplicate sections
2024-07-05 18:08:32 +02:00
Georgi Gerganov
e235b267a2
py : switch to snake_case ( #8305 )
...
* py : switch to snake_case
ggml-ci
* cont
ggml-ci
* cont
ggml-ci
* cont : fix link
* gguf-py : use snake_case in scripts entrypoint export
* py : rename requirements for convert_legacy_llama.py
Needed for scripts/check-requirements.sh
---------
Co-authored-by: Francis Couture-Harpin <git@compilade.net>
2024-07-05 07:53:33 +03:00
Georgi Gerganov
f3f65429c4
llama : reorganize source code + improve CMake ( #8006 )
...
* scripts : update sync [no ci]
* files : relocate [no ci]
* ci : disable kompute build [no ci]
* cmake : fixes [no ci]
* server : fix mingw build
ggml-ci
* cmake : minor [no ci]
* cmake : link math library [no ci]
* cmake : build normal ggml library (not object library) [no ci]
* cmake : fix kompute build
ggml-ci
* make,cmake : fix LLAMA_CUDA + replace GGML_CDEF_PRIVATE
ggml-ci
* move public backend headers to the public include directory (#8122 )
* move public backend headers to the public include directory
* nix test
* spm : fix metal header
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* scripts : fix sync paths [no ci]
* scripts : sync ggml-blas.h [no ci]
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-26 18:33:02 +03:00
Olivier Chafik
1c641e6aac
`build`: rename main → llama-cli, server → llama-server, llava-cli → llama-llava-cli, etc... ( #7809 )
...
* `main`/`server`: rename to `llama` / `llama-server` for consistency w/ homebrew
* server: update refs -> llama-server
gitignore llama-server
* server: simplify nix package
* main: update refs -> llama
fix examples/main ref
* main/server: fix targets
* update more names
* Update build.yml
* rm accidentally checked in bins
* update straggling refs
* Update .gitignore
* Update server-llm.sh
* main: target name -> llama-cli
* Prefix all example bins w/ llama-
* fix main refs
* rename {main->llama}-cmake-pkg binary
* prefix more cmake targets w/ llama-
* add/fix gbnf-validator subfolder to cmake
* sort cmake example subdirs
* rm bin files
* fix llama-lookup-* Makefile rules
* gitignore /llama-*
* rename Dockerfiles
* rename llama|main -> llama-cli; consistent RPM bin prefixes
* fix some missing -cli suffixes
* rename dockerfile w/ llama-cli
* rename(make): llama-baby-llama
* update dockerfile refs
* more llama-cli(.exe)
* fix test-eval-callback
* rename: llama-cli-cmake-pkg(.exe)
* address gbnf-validator unused fread warning (switched to C++ / ifstream)
* add two missing llama- prefixes
* Updating docs for eval-callback binary to use new `llama-` prefix.
* Updating a few lingering doc references for rename of main to llama-cli
* Updating `run-with-preset.py` to use new binary names.
Updating docs around `perplexity` binary rename.
* Updating documentation references for lookup-merge and export-lora
* Updating two small `main` references missed earlier in the finetune docs.
* Update apps.nix
* update grammar/README.md w/ new llama-* names
* update llama-rpc-server bin name + doc
* Revert "update llama-rpc-server bin name + doc"
This reverts commit e474ef1df4 .
* add hot topic notice to README.md
* Update README.md
* Update README.md
* rename gguf-split & quantize bins refs in **/tests.sh
---------
Co-authored-by: HanClinto <hanclinto@gmail.com>
2024-06-13 00:41:52 +01:00
Galunid
9c4c9cc83f
Move convert.py to examples/convert-legacy-llama.py ( #7430 )
...
* Move convert.py to examples/convert-no-torch.py
* Fix CI, scripts, readme files
* convert-no-torch -> convert-legacy-llama
* Move vocab thing to vocab.py
* Fix convert-no-torch -> convert-legacy-llama
* Fix lost convert.py in ci/run.sh
* Fix imports
* Fix gguf not imported correctly
* Fix flake8 complaints
* Fix check-requirements.sh
* Get rid of ADDED_TOKENS_FILE, FAST_TOKENIZER_FILE
* Review fixes
2024-05-30 21:40:00 +10:00
Brian
51e9d02599
Added a single test function script and fix debug-test.sh to be more robust ( #7279 )
...
* run-single-test.sh: added a single test function script and fix debug-test.sh to be more robust
* debug-test.sh: combined execute and gdb test mode via -g flag
* debug-test.sh: refactor
* debug-test: refactor for clarity
* debug-test.sh: comment style changes
* debug-test.sh: fix gdb
2024-05-17 22:40:14 +10:00
Josh Ramer
fed0108491
Scripting & documenting debugging one test without anything else in the loop. ( #7096 )
...
* A little documentation that shares my quick tips for working in the repository.
* Update startup-testing-debugging.md
* script that shows a menu of tests to pick from & run the debugger on
* debug-test.sh: Refactor CLI help message
* debug-test.sh: documentation update
* debug-test.sh: CLI Help output corrections
* debug-test.sh: minor doc fix
---------
authored-by: Josh Ramer <ubuntu@ip-172-31-32-53.ec2.internal>
Assisted-by: brian khuu <mofosyne@gmail.com>
2024-05-12 03:26:35 +10:00
omahs
04976db7a8
docs: fix typos ( #7124 )
...
* fix typo
* fix typos
* fix typo
* fix typos
* fix typo
* fix typos
2024-05-07 18:20:33 +03:00
Pierrick Hymbert
b804b1ef77
eval-callback: Example how to use eval callback for debugging ( #6576 )
...
* gguf-debug: Example how to use ggml callback for debugging
* gguf-debug: no mutex, verify type, fix stride.
* llama: cv eval: move cb eval field in common gpt_params
* ggml_debug: use common gpt_params to pass cb eval.
Fix get tensor SIGV random.
* ggml_debug: ci: add tests
* ggml_debug: EOL in CMakeLists.txt
* ggml_debug: Remove unused param n_batch, no batching here
* ggml_debug: fix trailing spaces
* ggml_debug: fix trailing spaces
* common: fix cb_eval and user data not initialized
* ci: build revert label
* ggml_debug: add main test label
* doc: add a model: add a link to ggml-debug
* ggml-debug: add to make toolchain
* ggml-debug: tests add the main label
* ggml-debug: ci add test curl label
* common: allow the warmup to be disabled in llama_init_from_gpt_params
* ci: add curl test
* ggml-debug: better tensor type support
* gitignore : ggml-debug
* ggml-debug: printing also the sum of each tensor
* ggml-debug: remove block size
* eval-callback: renamed from ggml-debug
* eval-callback: fix make toolchain
---------
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-11 14:51:07 +02:00
Pierrick Hymbert
67fac4b95f
docs : how to add a model ( #6565 )
...
* docs: how to add a model
* docs: model: typo and docs
* docs: model: add prevision on RoPE
* docs: model: rephrasing README.md
* docs: model: rephrasing README.md
* docs: model: README.md fix trailing spaces
* docs : some fixes
* Update README.md
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-04-10 09:58:48 +03:00
slaren
280345968d
cuda : rename build flag to LLAMA_CUDA ( #6299 )
2024-03-26 01:16:01 +01:00
Georgi Gerganov
ff8238f71d
docs : add llama-star arch idea
2023-11-23 11:35:04 +02:00
Richard Kiss
532dd74e38
Fix some documentation typos/grammar mistakes ( #4032 )
...
* typos
* Update examples/parallel/README.md
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
---------
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-11-11 23:04:58 -07:00
uint256_t
9e24cc6e2e
docs : fix typo GOMP_CPU_AFFINITY ( #3597 )
2023-10-12 16:36:16 +03:00
Kevin Ji
45855b3f1c
docs : mark code as Bash ( #3375 )
2023-09-28 09:11:32 -04:00
Georgi Gerganov
6381d4e110
gguf : new file format with flexible meta data (beta) ( #2398 )
...
* gguf : first API pass
* gguf : read header + meta data
* gguf : read tensor info
* gguf : initial model loading - not tested
* gguf : add gguf_get_tensor_name()
* gguf : do not support passing existing ggml_context to gguf_init
* gguf : simplify gguf_get_val
* gguf : gguf.c is now part of ggml.c
* gguf : read / write sample models
* gguf : add comments
* refactor : reduce code duplication and better API (#2415 )
* gguf : expose the gguf_type enum through the API for now
* gguf : add array support
* gguf.py : some code style changes
* convert.py : start a new simplified implementation by removing old stuff
* convert.py : remove GGML vocab + other obsolete stuff
* GGUF : write tensor (#2426 )
* WIP: Write tensor
* GGUF : Support writing tensors in Python
* refactor : rm unused import and upd todos
* fix : fix errors upd writing example
* rm example.gguf
* gitignore *.gguf
* undo formatting
* gguf : add gguf_find_key (#2438 )
* gguf.cpp : find key example
* ggml.h : add gguf_find_key
* ggml.c : add gguf_find_key
* gguf : fix writing tensors
* gguf : do not hardcode tensor names to read
* gguf : write sample tensors to read
* gguf : add tokenization constants
* quick and dirty conversion example
* gguf : fix writing gguf arrays
* gguf : write tensors one by one and code reuse
* gguf : fix writing gguf arrays
* gguf : write tensors one by one
* gguf : write tensors one by one
* gguf : write tokenizer data
* gguf : upd gguf conversion script
* Update convert-llama-h5-to-gguf.py
* gguf : handle already encoded string
* ggml.h : get array str and f32
* ggml.c : get arr str and f32
* gguf.py : support any type
* Update convert-llama-h5-to-gguf.py
* gguf : fix set is not subscriptable
* gguf : update convert-llama-h5-to-gguf.py
* constants.py : add layer norm eps
* gguf.py : add layer norm eps and merges
* ggml.h : increase GGML_MAX_NAME to 64
* ggml.c : add gguf_get_arr_n
* Update convert-llama-h5-to-gguf.py
* add gptneox gguf example
* Makefile : add gptneox gguf example
* Update convert-llama-h5-to-gguf.py
* add gptneox gguf example
* Update convert-llama-h5-to-gguf.py
* Update convert-gptneox-h5-to-gguf.py
* Update convert-gptneox-h5-to-gguf.py
* Update convert-llama-h5-to-gguf.py
* gguf : support custom alignment value
* gguf : fix typo in function call
* gguf : mmap tensor data example
* fix : update convert-llama-h5-to-gguf.py
* Update convert-llama-h5-to-gguf.py
* convert-gptneox-h5-to-gguf.py : Special tokens
* gptneox-main.cpp : special tokens
* Update gptneox-main.cpp
* constants.py : special tokens
* gguf.py : accumulate kv and tensor info data + special tokens
* convert-gptneox-h5-to-gguf.py : accumulate kv and ti + special tokens
* gguf : gguf counterpart of llama-util.h
* gguf-util.h : update note
* convert-llama-h5-to-gguf.py : accumulate kv / ti + special tokens
* convert-llama-h5-to-gguf.py : special tokens
* Delete gptneox-common.cpp
* Delete gptneox-common.h
* convert-gptneox-h5-to-gguf.py : gpt2bpe tokenizer
* gptneox-main.cpp : gpt2 bpe tokenizer
* gpt2 bpe tokenizer (handles merges and unicode)
* Makefile : remove gptneox-common
* gguf.py : bytesarray for gpt2bpe tokenizer
* cmpnct_gpt2bpe.hpp : comments
* gguf.py : use custom alignment if present
* gguf : minor stuff
* Update gptneox-main.cpp
* map tensor names
* convert-gptneox-h5-to-gguf.py : map tensor names
* convert-llama-h5-to-gguf.py : map tensor names
* gptneox-main.cpp : map tensor names
* gguf : start implementing libllama in GGUF (WIP)
* gguf : start implementing libllama in GGUF (WIP)
* rm binary commited by mistake
* upd .gitignore
* gguf : calculate n_mult
* gguf : inference with 7B model working (WIP)
* gguf : rm deprecated function
* gguf : start implementing gguf_file_saver (WIP)
* gguf : start implementing gguf_file_saver (WIP)
* gguf : start implementing gguf_file_saver (WIP)
* gguf : add gguf_get_kv_type
* gguf : add gguf_get_kv_type
* gguf : write metadata in gguf_file_saver (WIP)
* gguf : write metadata in gguf_file_saver (WIP)
* gguf : write metadata in gguf_file_saver
* gguf : rm references to old file formats
* gguf : shorter name for member variable
* gguf : rm redundant method
* gguf : get rid of n_mult, read n_ff from file
* Update gguf_tensor_map.py
* Update gptneox-main.cpp
* gguf : rm references to old file magics
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : start implementing quantization (WIP)
* gguf : quantization is working
* gguf : roper closing of file
* gguf.py : no need to convert tensors twice
* convert-gptneox-h5-to-gguf.py : no need to convert tensors twice
* convert-llama-h5-to-gguf.py : no need to convert tensors twice
* convert-gptneox-h5-to-gguf.py : simplify nbytes
* convert-llama-h5-to-gguf.py : simplify nbytes
* gptneox-main.cpp : n_layer --> n_block
* constants.py : n_layer --> n_block
* gguf.py : n_layer --> n_block
* convert-gptneox-h5-to-gguf.py : n_layer --> n_block
* convert-llama-h5-to-gguf.py : n_layer --> n_block
* gptneox-main.cpp : n_layer --> n_block
* Update gguf_tensor_map.py
* convert-gptneox-h5-to-gguf.py : load model in parts to save memory
* convert-llama-h5-to-gguf.py : load model in parts to save memory
* convert : write more metadata for LLaMA
* convert : rm quantization version
* convert-gptneox-h5-to-gguf.py : add file_type key
* gptneox-main.cpp : add file_type key
* fix conflicts
* gguf : add todos and comments
* convert-gptneox-h5-to-gguf.py : tensor name map changes
* Create gguf_namemap.py : tensor name map changes
* Delete gguf_tensor_map.py
* gptneox-main.cpp : tensor name map changes
* convert-llama-h5-to-gguf.py : fixes
* gguf.py : dont add empty strings
* simple : minor style changes
* gguf : use UNIX line ending
* Create convert-llama-7b-pth-to-gguf.py
* llama : sync gguf-llama.cpp with latest llama.cpp (#2608 )
* llama : sync gguf-llama.cpp with latest llama.cpp
* minor : indentation + assert
* llama : refactor gguf_buffer and gguf_ctx_buffer
* llama : minor
* gitignore : add gptneox-main
* llama : tokenizer fixes (#2549 )
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* convert : update convert-new.py with tokenizer fixes (#2614 )
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* llama : sync gguf-llama with llama (#2613 )
* llama : sync gguf-llama with llama
* tests : fix build + warnings (test-tokenizer-1 still fails)
* tests : fix wstring_convert
* convert : fix layer names
* llama : sync gguf-llama.cpp
* convert : update HF converter to new tokenizer voodoo magics
* llama : update tokenizer style
* convert-llama-h5-to-gguf.py : add token types
* constants.py : add token types
* gguf.py : add token types
* convert-llama-7b-pth-to-gguf.py : add token types
* gguf-llama.cpp : fix n_head_kv
* convert-llama-h5-to-gguf.py : add 70b gqa support
* gguf.py : add tensor data layout
* convert-llama-h5-to-gguf.py : add tensor data layout
* convert-llama-7b-pth-to-gguf.py : add tensor data layout
* gptneox-main.cpp : add tensor data layout
* convert-llama-h5-to-gguf.py : clarify the reverse permute
* llama : refactor model loading code (#2620 )
* llama : style formatting + remove helper methods
* llama : fix quantization using gguf tool
* llama : simplify gguf_file_saver
* llama : fix method names
* llama : simplify write_header()
* llama : no need to pass full file loader to the file saver
just gguf_ctx
* llama : gguf_file_saver write I32
* llama : refactor tensor names (#2622 )
* gguf: update tensor names searched in quantization
* gguf : define tensor names as constants
* gguf : initial write API (not tested yet)
* gguf : write to file API (not tested)
* gguf : initial write API ready + example
* gguf : fix header write
* gguf : fixes + simplify example + add ggml_nbytes_pad()
* gguf : minor
* llama : replace gguf_file_saver with new gguf write API
* gguf : streaming support when writing files
* gguf : remove oboslete write methods
* gguf : remove obosolete gguf_get_arr_xxx API
* llama : simplify gguf_file_loader
* llama : move hparams and vocab from gguf_file_loader to llama_model_loader
* llama : merge gguf-util.h in llama.cpp
* llama : reorder definitions in .cpp to match .h
* llama : minor simplifications
* llama : refactor llama_model_loader (WIP)
wip : remove ggml_ctx from llama_model_loader
wip : merge gguf_file_loader in llama_model_loader
* llama : fix shape prints
* llama : fix Windows build + fix norm_rms_eps key
* llama : throw error on missing KV paris in model meta data
* llama : improve printing + log meta data
* llama : switch print order of meta data
---------
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
* gguf : deduplicate (#2629 )
* gguf : better type names
* dedup : CPU + Metal is working
* ggml : fix warnings about unused results
* llama.cpp : fix line feed and compiler warning
* llama : fix strncpy warning + note token_to_str does not write null
* llama : restore the original load/save session implementation
Will migrate this to GGUF in the future
* convert-llama-h5-to-gguf.py : support alt ctx param name
* ggml : assert when using ggml_mul with non-F32 src1
* examples : dedup simple
---------
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
* gguf.py : merge all files in gguf.py
* convert-new.py : pick #2427 for HF 70B support
* examples/gguf : no need to keep q option for quantization any more
* llama.cpp : print actual model size
* llama.cpp : use ggml_elements()
* convert-new.py : output gguf (#2635 )
* convert-new.py : output gguf (WIP)
* convert-new.py : add gguf key-value pairs
* llama : add hparams.ctx_train + no longer print ftype
* convert-new.py : minor fixes
* convert-new.py : vocab-only option should work now
* llama : fix tokenizer to use llama_char_to_byte
* tests : add new ggml-vocab-llama.gguf
* convert-new.py : tensor name mapping
* convert-new.py : add map for skipping tensor serialization
* convert-new.py : convert script now works
* gguf.py : pick some of the refactoring from #2644
* convert-new.py : minor fixes
* convert.py : update to support GGUF output
* Revert "ci : disable CI temporary to not waste energy"
This reverts commit 7e82d25f40 .
* convert.py : n_head_kv optional and .gguf file extension
* convert.py : better always have n_head_kv and default it to n_head
* llama : sync with recent PRs on master
* editorconfig : ignore models folder
ggml-ci
* ci : update ".bin" to ".gguf" extension
ggml-ci
* llama : fix llama_model_loader memory leak
* gptneox : move as a WIP example
* llama : fix lambda capture
ggml-ci
* ggml : fix bug in gguf_set_kv
ggml-ci
* common.h : .bin --> .gguf
* quantize-stats.cpp : .bin --> .gguf
* convert.py : fix HF tensor permuting / unpacking
ggml-ci
* llama.cpp : typo
* llama : throw error if gguf fails to init from file
ggml-ci
* llama : fix tensor name grepping during quantization
ggml-ci
* gguf.py : write tensors in a single pass (#2644 )
* gguf : single pass for writing tensors + refactoring writer
* gguf : single pass for writing tensors + refactoring writer
* gguf : single pass for writing tensors + refactoring writer
* gguf : style fixes in simple conversion script
* gguf : refactor gptneox conversion script
* gguf : rename h5 to hf (for HuggingFace)
* gguf : refactor pth to gguf conversion script
* gguf : rm file_type key and method
* gguf.py : fix vertical alignment
* gguf.py : indentation
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* convert-gptneox-hf-to-gguf.py : fixes
* gguf.py : gptneox mapping
* convert-llama-hf-to-gguf.py : fixes
* convert-llama-7b-pth-to-gguf.py : fixes
* ggml.h : reverse GGUF_MAGIC
* gguf.py : reverse GGUF_MAGIC
* test-tokenizer-0.cpp : fix warning
* llama.cpp : print kv general.name
* llama.cpp : get special token kv and linefeed token id
* llama : print number of tensors per type + print arch + style
* tests : update vocab file with new magic
* editorconfig : fix whitespaces
* llama : re-order functions
* llama : remove C++ API + reorganize common source in /common dir
* llama : minor API updates
* llama : avoid hardcoded special tokens
* llama : fix MPI build
ggml-ci
* llama : introduce enum llama_vocab_type + remove hardcoded string constants
* convert-falcon-hf-to-gguf.py : falcon HF --> gguf conversion, not tested
* falcon-main.cpp : falcon inference example
* convert-falcon-hf-to-gguf.py : remove extra kv
* convert-gptneox-hf-to-gguf.py : remove extra kv
* convert-llama-7b-pth-to-gguf.py : remove extra kv
* convert-llama-hf-to-gguf.py : remove extra kv
* gguf.py : fix for falcon 40b
* falcon-main.cpp : fix for falcon 40b
* convert-falcon-hf-to-gguf.py : update ref
* convert-falcon-hf-to-gguf.py : add tensor data layout
* cmpnct_gpt2bpe.hpp : fixes
* falcon-main.cpp : fixes
* gptneox-main.cpp : fixes
* cmpnct_gpt2bpe.hpp : remove non-general stuff
* Update examples/server/README.md
Co-authored-by: slaren <slarengh@gmail.com>
* cmpnct_gpt2bpe.hpp : cleanup
* convert-llama-hf-to-gguf.py : special tokens
* convert-llama-7b-pth-to-gguf.py : special tokens
* convert-permute-debug.py : permute debug print
* convert-permute-debug-master.py : permute debug for master
* convert-permute-debug.py : change permute type of attn_q
* convert.py : 70b model working (change attn_q permute)
* Delete convert-permute-debug-master.py
* Delete convert-permute-debug.py
* convert-llama-hf-to-gguf.py : fix attn_q permute
* gguf.py : fix rope scale kv
* convert-llama-hf-to-gguf.py : rope scale and added tokens
* convert-llama-7b-pth-to-gguf.py : rope scale and added tokens
* llama.cpp : use rope scale kv
* convert-llama-7b-pth-to-gguf.py : rope scale fix
* convert-llama-hf-to-gguf.py : rope scale fix
* py : fix whitespace
* gguf : add Python script to convert GGMLv3 LLaMA models to GGUF (#2682 )
* First pass at converting GGMLv3 LLaMA models to GGUF
* Cleanups, better output during conversion
* Fix vocab space conversion logic
* More vocab conversion fixes
* Add description to converted GGUF files
* Improve help text, expand warning
* Allow specifying name and description for output GGUF
* Allow overriding vocab and hyperparams from original model metadata
* Use correct params override var name
* Fix wrong type size for Q8_K
Better handling of original style metadata
* Set default value for gguf add_tensor raw_shape KW arg
* llama : improve token type support (#2668 )
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* Improved tokenizer test
But does it work on MacOS?
* Improve token type support
- Added @klosax code to convert.py
- Improved token type support in vocabulary
* Exclude platform dependent tests
* More sentencepiece compatibility by eliminating magic numbers
* Restored accidentally removed comment
* llama : add API for token type
ggml-ci
* tests : use new tokenizer type API (#2692 )
* Merge tokenizer fixes into the gguf branch.
* Add test vocabularies
* Adapt convert-new.py (and fix a clang-cl compiler error on windows)
* Improved tokenizer test
But does it work on MacOS?
* Improve token type support
- Added @klosax code to convert.py
- Improved token type support in vocabulary
* Exclude platform dependent tests
* More sentencepiece compatibility by eliminating magic numbers
* Restored accidentally removed comment
* Improve commentary
* Use token type API in test-tokenizer-1.cpp
* py : cosmetics
* readme : add notice about new file format
ggml-ci
---------
Co-authored-by: M. Yusuf Sarıgöz <yusufsarigoz@gmail.com>
Co-authored-by: klosax <131523366+klosax@users.noreply.github.com>
Co-authored-by: goerch <jhr.walter@t-online.de>
Co-authored-by: slaren <slarengh@gmail.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-08-21 23:07:43 +03:00
Yuval Peled
f4c55d3bd7
docs : add performance troubleshoot + example benchmark documentation ( #1674 )
...
* test anchor link
* test table
* add benchmarks
* Add performance troubleshoot & benchmark
* add benchmarks
* remove unneeded line
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-06-05 23:32:36 +03:00