llama.cpp

Commit Graph

Author	SHA1	Message	Date
Georgi Gerganov	05fec5bd29	ggml : add build-time message to remind about ggml_set_rows (#14661 ) ggml-ci	2025-07-13 10:36:33 +03:00
Xuan-Son Nguyen	98bab638fb	ggml : add ggml_scale_bias (#14417 ) * ggml : add ggml_scale_bias * ggml_vec_mad1_f32 * add more simd * add CUDA * sycl * vulkan * cann (placeholder) * opencl * will this fix cpu? * fix cuda * suggestions from coderabbit * fix cann compile error * vDSP_vsmsa * rm __ARM_FEATURE_SVE * use memcpy for op params * make code looks more consistent * use scalar for __ARM_FEATURE_SVE * add x param to ggml_vec_mad1_f32	2025-07-09 18:16:12 +02:00
Georgi Gerganov	a70c8a0c4b	kv-cache : use ggml_set_rows (#14285 ) * kv-cache : use ggml_set_rows ggml-ci * graph : separate k and v indices ggml-ci * cont : remove redundant ifs ggml-ci * kv-cache : improve find_slot impl * kv-cache : bounds-check when accessing slot_info indices * kv-cache : add comments ggml-ci * ggml : add TODOs for adding GGML_OP_SET_ROWS support in the backends ggml-ci	2025-07-03 10:53:35 +03:00
Georgi Gerganov	ec68e84c32	ggml : support bcast ggml_soft_max_ext, ggml_flash_attn_ext (#14435 ) ggml-ci	2025-07-02 15:48:33 +03:00
Xinpeng Dou	e21d2d4ae2	CANN: Simplify the environment variable setting(#13104 ) * Simplify the environment variable setting to specify the memory pool type. * Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options. * update * fix CI * update * delete whitespace * fix according to review * update CANN.md * update CANN.md	2025-06-09 19:47:39 +08:00
Bizhao Shi	2d38b6e400	CANN: Add the basic supports of Flash Attention kernel (#13627 ) * cann: add the basic FA support * cann: update the readme * cann: update the FlashAttention with PSEShift * cann: update the input parameters in FA * cann: update the alibi with max_bias * cann: add the constrints of softcap * cann: update the docs CANN.md * cann: update the docs CANN.md * cann: fix typo of CANN.md * cann: add some comments and update the CANN.md * cann: update the CANN.md * cann: update the inner precise for fusedInferAttention * cann: update the constraints of flash_attn_ext on ggml-cann.cpp * cann: clean the whitespace * cann: clean the whitespace * cann: add a new endline	2025-05-26 10:20:18 +08:00
Chenguang Li	faaaff5f94	CANN: Support MUL_MAT_ID for q8_0 and q4_0 (#13705 ) * [CANN]Support MUL_MAT_ID Q8 && Q4 Signed-off-by: noemotiovon <757486878@qq.com> * codestyle adjustment Signed-off-by: noemotiovon <757486878@qq.com> --------- Signed-off-by: noemotiovon <757486878@qq.com>	2025-05-23 16:47:53 +08:00
Chenguang Li	33d7aed4a8	CANN: Support MOE Model MUL_MAT_ID (#13042 ) Signed-off-by: noemotiovon <757486878@qq.com>	2025-05-19 14:21:17 +08:00
hipudding	7a395f67a7	CANN: Add support for async operator submission (#12864 ) Submit operators using asynchronous threads to improve performance. Use the environment variable GGML_CANN_ASYNC_MODE to control whether asynchronous submission is enabled. It is disabled by default. Testing shows a 10%–20% performance improvement in scenarios with small parameter sizes, especially in quantized models.	2025-04-17 20:34:16 +08:00
Chenguang Li	b43d89e311	CANN: Add 310P operator support check (#12962 )	2025-04-16 16:21:05 +08:00
hipudding	54a7272043	CANN: Add x86 build ci (#12950 ) * CANN: Add x86 build ci * CANN: fix code format	2025-04-15 12:08:55 +01:00
Chenguang Li	0019279bb5	CANN: Opt ROPE optimization (#12865 ) * [CANN]Opt ROPE optimization * [CANN]Codestyle adjustment * [CANN]Fix the ROPE precision issue * [CANN]codestyle fix * [CANN]add rope unsupport case Signed-off-by: noemotiovon <noemotiovon@gmail.com>	2025-04-15 10:09:35 +08:00
Xinpeng Dou	b0c75ac9f9	CANN: Optimize CANN buffer pool memory management (#12875 ) Multiple optional memory pools are provided for CANN, including VMM, priority queue-based, and traditional memory pools. 1.When the memory pool is available and GGML_CANN_DISABLE_VMM_POOL is not defined, the VMM pool is selected by default. 2.Otherwise, if GGML_CANN_ENABLE_BUF_PRIO_POOL is defined, the priority queue-based memory pool is used. 3.If neither condition is met, the default memory pool is used.	2025-04-15 10:04:24 +08:00
Diego Devesa	fe92821ea9	ggml : add bilinear upscale support (ggml/1185)	2025-04-11 00:17:47 +03:00
Chenguang Li	fe5b78c896	CANN: Support more ops (#12841 ) * [CANN]Support Opt LOG && MEAN && PAD_REFLECT_1D * [CANN]Support COUNT_EQUAL && STEP && SGN * [CANN]codestyle adjustment * [CANN]codestyle adjustment --------- Signed-off-by: noemotiovon <noemotiovon@gmail.com>	2025-04-10 08:51:52 +08:00
Chenguang Li	6e1c4cebdb	CANN: Support Opt CONV_TRANSPOSE_1D and ELU (#12786 ) * [CANN] Support ELU and CONV_TRANSPOSE_1D * [CANN]Modification review comments * [CANN]Modification review comments * [CANN]name adjustment * [CANN]remove lambda used in template * [CANN]Use std::func instead of template * [CANN]Modify the code according to the review comments --------- Signed-off-by: noemotiovon <noemotiovon@gmail.com>	2025-04-09 14:04:14 +08:00
zhouwg	52b3d71f12	CANN: fix typo in ggml-cann (#12733 )	2025-04-07 19:34:14 +08:00
hipudding	d0d5b2232b	CANN: Refactor to reduce duplicate code (#12731 ) * CANN: Refactor to reduce duplicate code * CANN: fix review comment	2025-04-07 17:10:36 +08:00
Chenguang Li	65cfe136a0	CANN: Support operator SIN COS ARGMAX (#12709 ) * [CANN]support sin cos argmax Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]codestyle adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]Remove redundant code Signed-off-by: noemotiovon <noemotiovon@gmail.com> --------- Signed-off-by: noemotiovon <noemotiovon@gmail.com> Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2025-04-03 15:18:08 +08:00
hipudding	2a0dc97e56	CANN: Fix failed test cases (#12708 ) * CANN: Fix memory waste in aclnn_tensor * CANN: fix backend ops fail * CANN: fix acl_tensor memory alloc. * CANN: format * CANN: remove trailing whitespace	2025-04-03 08:49:51 +08:00
Chenguang Li	9bacd6b374	[CANN] get_rows and dup optimization (#12671 ) * [CANN]get_rows and dup optimization. Co-authored-by: hipudding <huafengchun@gmail.com> Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]GET_ROWS and CPY/DUP optimization Co-authored-by: hipudding <huafengchun@gmail.com> Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> * [CANN]code style adjustment Signed-off-by: noemotiovon <noemotiovon@gmail.com> --------- Signed-off-by: noemotiovon <noemotiovon@gmail.com> Co-authored-by: noemotiovon <noemotiovon@gmail.com> Co-authored-by: hipudding <huafengchun@gmail.com>	2025-04-02 15:22:13 +08:00
Chenguang Li	92a391327e	[CANN]MUL_MAT optimization (#12382 )	2025-03-15 09:31:08 +08:00
William Tambellini	70680c48e5	ggml : upgrade init_tensor API to return a ggml_status (#11854 ) * Upgrade init_tensor API to return a ggml_status To prepare for an 'abort-free' ggml (ggml not to abort on OOMs but return a OOM status), as agreeed with Diego in the ggml repo, upgrade the init_tensor() and view_init() APIs to return a ggml_status. * misc fixes --------- Co-authored-by: slaren <slarengh@gmail.com>	2025-02-28 14:41:47 +01:00
HimariO	ba1cb19cdd	llama : add Qwen2VL support + multimodal RoPE (#10361 ) * Barebone Qwen2VL LLM convertor * Add Qwen2VL cli entrypoint * [WIP] add qwen2vl arch * Verify m-rope output * Add vl-rope/2d-rope support for qwen2vl ViT * update qwen2vl cli tool * update 5D tensor op workaround * [WIP] qwen2vl vision model * make batch and clip utils compatible with qwen2vl * [WIP] create inference workflow, gguf convert script but fix * correcting vision-rope behavior, add the missing last layer back to ViT * add arg parser to qwen2vl_surgery * replace variable size array with vector * cuda-gdb cmake preset * add fp32 mrope, vision rope kernel * add fp16 support for qwen2vl and m-rope * add `GGML_ROPE_TYPE_MROPE`, `GGML_ROPE_TYPE_VISION` * fix rope op mode switching, out dated func args * update `llama_hparams` * update to keep up stream changes * resolve linter, test errors * add makefile entry, update speical image padding token * add mrope unit test, fix few compiler warnings * rename `mrope` related function, params * minor updates on debug util, bug fixs * add `m-rope` testcase to `test-backend-ops` * Apply suggestions from code review Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * fix traililng whitespce * store `llama_hparams.rope_sections` with fixed size array * update position id tensor size check in GGML_OP_ROPE * minor updates * update `ggml_backend__supports_op` of unsupported backends remote old `rope_section` compare operator --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-14 14:43:46 +02:00
Djip007	19d8762ab6	ggml : refactor online repacking (#10446 ) * rename ggml-cpu-aarch64.c to .cpp * reformat extra cpu backend. - clean Q4_0_N_M and IQ4_0_N_M - remove from "file" tensor type - allow only with dynamic repack - extract cpu extra bufts and convert to C++ - hbm - "aarch64" - more generic use of extra buffer - generalise extra_supports_op - new API for "cpu-accel": - amx - aarch64 * clang-format * Clean Q4_0_N_M ref Enable restrict on C++ * add op GGML_OP_MUL_MAT_ID for Q4_0_N_M with runtime repack * added/corrected control on tensor size for Q4 repacking. * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * Update ggml/src/ggml-cpu/ggml-cpu-aarch64.cpp Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> * add debug logs on repacks. --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-12-07 14:37:50 +02:00
Chenguang Li	938f608742	CANN: RoPE operator optimization (#10563 ) * [cann] RoPE operator optimization * [CANN]Code Formatting --------- Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-11-29 14:46:55 +08:00
Chenguang Li	b7420131bf	CANN: ROPE operator optimization (#10540 ) * [cann] ROPE operator optimization Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-11-28 14:24:46 +08:00
Shanshan Shen	9a4b79bcfa	CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454 ) * improve inferencing performance for ascend npu. Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com> * some modification after review * some modifications after review * restore some modifications * restore some modifications --------- Co-authored-by: shanshan shen <shanshanshen333@gmail.com> Co-authored-by: Frank Mai <thxCode@thxcode0824@gmail.com>	2024-11-26 18:08:37 +08:00
Chenguang Li	7066b4cce2	CANN: RoPE and CANCAT operator optimization (#10488 ) Co-authored-by: noemotiovon <noemotiovon@gmail.com>	2024-11-26 17:31:05 +08:00
Diego Devesa	5931c1f233	ggml : add support for dynamic loading of backends (#10469 ) * ggml : add support for dynamic loading of backends --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>	2024-11-25 15:13:39 +01:00
Diego Devesa	ae8de6d50a	ggml : build backends as libraries (#10256 ) * ggml : build backends as libraries --------- Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> Co-authored-by: R0CKSTAR <xiaodong.ye@mthreads.com>	2024-11-14 18:04:35 +01:00

31 Commits