* ggml : fix gguf string leak when reading kv pairs fails
* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type
* ggml : avoid crashing on failed memory allocations when loading a gguf file
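As a reference for the pattern shared by these three fixes, here is a minimal sketch in the spirit of the gguf loader; the names (`read_kv`, `kv_str`, `kv_type`) are illustrative stand-ins, not the actual ggml API:
```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

enum kv_type : uint32_t { KV_UINT32, KV_STRING, KV_TYPE_COUNT };

struct kv_str { uint64_t n; char * data; };

static bool read_kv_string(std::FILE * f, kv_str & s) {
    if (std::fread(&s.n, sizeof(s.n), 1, f) != 1) return false;
    s.data = (char *) std::calloc(s.n + 1, 1);
    if (!s.data) return false;                  // report alloc failure, don't crash
    if (std::fread(s.data, 1, s.n, f) != s.n) { // short read: corrupt file
        std::free(s.data);                      // the leak fix: release the
        s.data = nullptr;                       // partially read string
        return false;
    }
    return true;
}

static bool read_kv(std::FILE * f, uint32_t raw_type) {
    if (raw_type >= KV_TYPE_COUNT) {
        std::fprintf(stderr, "invalid KV type %u\n", raw_type);
        return false;                           // fail the load instead of GGML_ABORT
    }
    if (raw_type == KV_STRING) {
        kv_str s;
        return read_kv_string(f, s);            // caller owns s.data on success
    }
    uint32_t v;
    return std::fread(&v, sizeof(v), 1, f) == 1;
}
```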
* ggml : add POOL2D op for GPU acceleration to the Vulkan backend
- The MobileVLM model now supports GPU-accelerated inference via the Vulkan backend.
- A GGML_OP_POOL_2D (pooling) shader has been added; a CPU reference of its semantics is sketched after this commit.
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* [fix] Correct the order of the parameters.
Fix casting to int.
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
---------
Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
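For context, a CPU reference of the 2D max-pooling semantics the new GGML_OP_POOL_2D shader computes; this is an illustrative sketch (the parameter names k0/k1, s0/s1, p0/p1 mirror ggml's ggml_pool_2d convention), not the Vulkan shader itself:
```cpp
#include <algorithm>
#include <limits>
#include <vector>

// dst must be pre-sized to ow*oh; out-of-bounds window cells are skipped,
// which matches max pooling over a padded input
static void pool2d_max(const std::vector<float> & src, std::vector<float> & dst,
                       int iw, int ih, int ow, int oh,
                       int k0, int k1, int s0, int s1, int p0, int p1) {
    for (int oy = 0; oy < oh; ++oy) {
        for (int ox = 0; ox < ow; ++ox) {
            float best = -std::numeric_limits<float>::infinity();
            for (int ky = 0; ky < k1; ++ky) {
                for (int kx = 0; kx < k0; ++kx) {
                    const int ix = ox*s0 + kx - p0; // window origin minus padding
                    const int iy = oy*s1 + ky - p1;
                    if (ix < 0 || ix >= iw || iy < 0 || iy >= ih) continue;
                    best = std::max(best, src[iy*iw + ix]);
                }
            }
            dst[oy*ow + ox] = best;
        }
    }
}
```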
* Add granite template to llama.cpp
* Add granite template to test-chat-template.cpp
* Update src/llama.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Update tests/test-chat-template.cpp
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Added proper template and expected output
* Small change to \n
* Add code space &
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* Fix spacing
* Apply suggestions from code review
* Update src/llama.cpp
---------
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
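For readers following along, a self-contained sketch of what such a template branch renders; the Granite role markers below are my understanding of the format and should be checked against src/llama.cpp and tests/test-chat-template.cpp rather than taken as authoritative:
```cpp
#include <sstream>
#include <string>
#include <vector>

struct chat_msg { std::string role, content; };

// renders "<|start_of_role|>ROLE<|end_of_role|>CONTENT<|end_of_text|>\n" per
// message; add_ass appends an open assistant turn for the model to complete
static std::string apply_granite_template(const std::vector<chat_msg> & chat, bool add_ass) {
    std::ostringstream ss;
    for (const auto & msg : chat) {
        ss << "<|start_of_role|>" << msg.role << "<|end_of_role|>"
           << msg.content << "<|end_of_text|>\n";
    }
    if (add_ass) {
        ss << "<|start_of_role|>assistant<|end_of_role|>\n";
    }
    return ss.str();
}
```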
* ggml_qnn_op_config now manages the construction of ggml_qnn_tensor
* wip
* add interface ggml_qnn_op_config
* add ggml_qnn_list_op_config
* add create_tensor and move tensor bind to execute
* wip
* rename: ggml_qnn_list_op_config -> ggml_qnn_matmul_op_config
* add tensortype to allow native tensor
* remove ggml_tensor param from ggml_qnn_tensor::create_tensor
* postpone the tensor id allocation to add_node
* add ggml_qnn_op_config_base
* trivial change to reduce the number of function parameters
* split bind_tensors into bind_input_tensors and bind_output_tensors
* implement ggml_qnn_single_op_config::create_tensors
next: set the parameter of the transpose
* tensor: add bind buffer
* add parameter tensor type
* implement add_tensor_param
* set qnn_instance only in the constructor
* set transpose tensor param
* move create_op_constructor into op-config module
* create QNN_OP_MAT_MUL from ggml_qnn_matmul_op_config
* try fix crash
* fix compiling error at older ndk (r23c)
* fix crash
* fix parameter tensor name
* update tensor dimension assignment and add TODO
* fix mat_mul graph creation
* fix MUL_MAT_256x16x10x1_256x1x10x1_16x1x10x1
* append type to graph cache key
* wip
* fix supported op
* update comment
* disable ops other than add and mat_mul
* add convert op to adapt to multiple input/output formats
* disable f16 for cpu backend according to official doc
https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/cpu_backend.html#supported-operations
* add supported data type flags in each backend
* remove unused functions
* append output type to graph key
* fix gpu backend by disabling ops with mismatched data types
* fix cpu backend supported ops
* fix duplicated tensor name
* append op name
* suppress warning
* remove unused code
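Taken together, the commits above suggest an interface roughly like the following; this is a hedged reconstruction from the commit messages (the method and class names are the ones mentioned there, the QNN graph type is a stand-in):
```cpp
#include <memory>
#include <string>
#include <vector>

struct ggml_tensor;   // from ggml
struct qnn_graph;     // stand-in for the real QNN graph handle
class ggml_qnn_tensor;

class ggml_qnn_op_config {
public:
    virtual ~ggml_qnn_op_config() = default;
    // create QNN tensors up front; tensor id allocation is postponed until
    // the node is added to the graph ("postpone the tensor id allocation
    // to add_node")
    virtual bool create_tensors(qnn_graph & graph) = 0;
    virtual bool add_node(qnn_graph & graph) = 0;
    // bind_tensors was split so inputs and outputs can be (re)bound
    // separately at execute time
    virtual bool bind_input_tensors (const std::vector<ggml_tensor *> & inputs)  = 0;
    virtual bool bind_output_tensors(const std::vector<ggml_tensor *> & outputs) = 0;
};

// shared bookkeeping lives in a base class; concrete configs such as
// ggml_qnn_matmul_op_config add op-specific pieces, e.g. the transpose
// tensor parameter registered via add_tensor_param
class ggml_qnn_op_config_base : public ggml_qnn_op_config {
protected:
    std::string op_name;
    std::vector<std::shared_ptr<ggml_qnn_tensor>> tensor_inputs;
    std::vector<std::shared_ptr<ggml_qnn_tensor>> tensor_outputs;
};
```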
* metal : support permuted matrix multiplications
ggml-ci
* cont : use nb01 directly for row steps
ggml-ci
* cont : add comments [no ci]
* metal : minor refactor
* metal : minor
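The key idea in "use nb01 directly for row steps": a permuted ggml view changes the byte strides (nb), not the data layout, so a kernel that steps between rows by nb01 handles permuted sources for free. A minimal sketch of the addressing, with names following ggml's ne/nb convention:
```cpp
#include <cstddef>
#include <cstdint>

struct tensor_view {
    const void * data;
    size_t nb01; // byte stride between consecutive rows; for a permuted
                 // view this is generally NOT ne00 * sizeof(float)
};

// locate row i01 by byte offset instead of assuming contiguous rows
static inline const float * row_ptr(const tensor_view & t, int64_t i01) {
    return (const float *)((const uint8_t *) t.data + i01 * t.nb01);
}
```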
* llama: Refactor string_split to use template specialization; fixes parsing strings with spaces
* llama: Add static_assert in the string_split template to ensure the correct template specialization is used for std::string
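A sketch of the refactor these two commits describe, assuming a string_split<T> helper like the one in common: the generic path converts each token with operator>>, which stops at whitespace, so std::string gets its own specialization that splits on the separator only, and a static_assert keeps std::string from ever falling into the generic path:
```cpp
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

template <typename T>
std::vector<T> string_split(const std::string & str, char delim) {
    static_assert(!std::is_same<T, std::string>::value,
                  "use the std::string specialization so values may contain spaces");
    std::vector<T> values;
    std::istringstream ss(str);
    std::string token;
    while (std::getline(ss, token, delim)) {
        T value;
        std::istringstream(token) >> value; // operator>> would truncate a
        values.push_back(value);            // std::string at the first space
    }
    return values;
}

template <>
std::vector<std::string> string_split<std::string>(const std::string & str, char delim) {
    std::vector<std::string> parts;
    std::string token;
    std::istringstream ss(str);
    while (std::getline(ss, token, delim)) {
        parts.push_back(token); // spaces inside each token are kept intact
    }
    return parts;
}
```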
This commit removes the setting of the `used` field of the contexts in
the global state (g_state) in `ggml_init`.
The motivation for this change is that I believe that this additional
initialization might not be required after the changes in Commit
45fc4fed0b9fb5b1af4a8525cbebb95e11208732 ("sync : latest changes from
whisper.cpp"), which changed the initialization of the contexts field
from `{ 0 }` to `{ { 0 } }`:
```console
g_state = (struct ggml_state) {
- /*.contexts =*/ { 0 },
+ /*.contexts =*/ { { 0 } },
};
```
My understanding is that the `{0}` initialization might not have
zero-initialized all the nested fields in every array element because of
compiler differences, and might have been the reason for having the
explicit setting of the `used` fields to false.
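For reference, a compilable miniature of the two initializer forms (the type names are stand-ins for the real g_state types): per the aggregate-initialization rules both forms zero-initialize all remaining elements, the fully braced form just says so explicitly instead of relying on brace elision.
```cpp
#include <cassert>

struct ggml_context_container_like { bool used; };

struct ggml_state_like {
    ggml_context_container_like contexts[64];
};

int main() {
    ggml_state_like a = { { 0 } };     // brace-elided form: 0 lands on
                                       // contexts[0].used, the rest are
                                       // implicitly zero-initialized
    ggml_state_like b = { { { 0 } } }; // fully braced form from the commit
    assert(!a.contexts[63].used);
    assert(!b.contexts[63].used);
    return 0;
}
```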
* added classic vim support
* fixed ring update, removed blank line
* minor
* minor
* minor doc update
* removed unneeded var
* minor
* minor
* fixed job_start creating new scratch buffers
* fixed ghost text indenting when expandtab is on
* removed unused code
* minor
* unified fim_on_exit
* minor
* vim ghost text rendering now uses pos_x and pos_y parameters
* renamed *_hlgroup to hlgroup_*
* renamed *_ghost_text to ghost_text_*, moved nvim/vim detection to llama#init()
* minor
---------
Co-authored-by: Michael Coppola <info@michaeljcoppola.com>
This commit renames the member field batch in llm_build_context to
ubatch, and also renames the batch parameter of llama_build_graph and
llama_set_inputs to ubatch.
The motivation for this change is to make the code more readable
(considering there are the structs llama_batch and llama_sbatch), and
consistent with other parts of the code base where parameters/fields of
type llama_ubatch are named ubatch.
* [CANN] Adapt to dynamically loadable backends mechanism
* Fix a bug: the inference result is garbled in debug running mode for LM models whose type is Q4_0
* Address the review comments on this pull request
This commit fixes two typos in the help text for the `--embd-normalize`
and `--embd-separator` arguments. It also updates common.h, which
contains the same typo in two comments.
This commit updates the argument value hint for the `--attention`
argument to `non-causal`.
The motivation for this change is that the only values for this argument
are `causal` and `non-causal`.