llama.cpp/ggml/src/ggml-cpu
Piotr Wilkin (ilintar) 34fcc5a4ac
model : Apertus model implementation (#15852)
* First attempt

* No permute during convert (fixes qk tensors), proper norm application.

* RoPE = NeoX

* Coherence!

* Migrate xielu params from tensors to hyperparameters

* Simple CUDA kernel

* Revert stupid LLM refactorings

* Chat template support

* configchecker / flake8 errors

* Reorder unary.cu

* I do conclude that LLMs are, in fact, stupid.

* Fix after merge

* Final newline

* Make xIELU an UNARY_OP

* Final newline

* Correctly account for parameter shift

* Argh.

* Update ggml/src/ggml-cpu/unary-ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Refactor: remove unused methods, inline and factorize softplus, add const modifiers

* Revert CUDA changes, implement xIELU as a separate OP

* Pesky newline

* Add float2half / half2float for F16 inputs/outputs (see the CUDA sketch below)

* CUDA variants, attempt 2

* Actually, attempt 3

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Missing convert header

* Proper formula and reference for xIELU in the comments (piecewise form sketched below)

* Modify unary-ops.cpp to add functor-based logic alongside the template system to retain optimizations (functor pattern sketched below)

* Apply suggestions from code review

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* Add tensor mappings for Apertus to global list instead

* Fix lazy on scalars

* Update ggml/src/ggml-cuda/unary.cu

Co-authored-by: Johannes Gäßler <johannesg@5d6.de>

* Add comment about the constraints on positive/negative alpha

* Change `softplus` to `ggml_softplus`

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
2025-10-02 20:43:22 +03:00
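
The xIELU activation referenced throughout the commit log is piecewise: quadratic-plus-linear on the positive side, an ELU-integral-style curve on the negative side. The authoritative formula and its reference live in the comments this commit adds to unary-ops.cpp / unary.cu; the sketch below is the commonly cited form, with the negative branch and the role of eps treated as assumptions.

```latex
% Hedged sketch of the piecewise xIELU activation; verify against the in-tree comment.
\[
\mathrm{xIELU}(x) =
\begin{cases}
\alpha_p x^2 + \beta x, & x > 0 \\[4pt]
\alpha_n \left(e^{x} - 1 - x\right) + \beta x, & x \le 0
\end{cases}
\]
% Both branches meet at 0 with value 0 and slope beta. alpha_p is kept positive
% (softplus-reparameterized, per the ggml_softplus change above) and alpha_n is
% constrained analogously -- the "constraints on positive/negative alpha" comment
% in the commit spells out the exact conditions; eps guards the exponential for
% very negative inputs. Exact parameterization here is an assumption.
```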
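The "Simple CUDA kernel", "CUDA variants", and "float2half / half2float" bullets describe an elementwise kernel that computes in fp32 and converts at the tensor edges so F16 and F32 share one code path. Below is a minimal sketch of that pattern, not the in-tree ggml-cuda/unary.cu code; the parameter names (alpha_n, alpha_p, beta, eps) follow the hyperparameters mentioned in the log, and the placement of the eps clamp is an assumption.

```cuda
// Minimal sketch (not the in-tree ggml-cuda/unary.cu kernel): xIELU computed in
// fp32, with __half2float/__float2half at the edges so F16 and F32 share one path.
#include <cuda_fp16.h>
#include <cstdint>

__device__ __forceinline__ float to_f32(float x) { return x; }
__device__ __forceinline__ float to_f32(half  x) { return __half2float(x); }

__device__ __forceinline__ void store_f32(float v, float & dst) { dst = v; }
__device__ __forceinline__ void store_f32(float v, half  & dst) { dst = __float2half(v); }

template <typename T>
__global__ void xielu_kernel(const T * src, T * dst, const int64_t n,
                             const float alpha_n, const float alpha_p,
                             const float beta,    const float eps) {
    const int64_t i = (int64_t) blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) {
        return;
    }

    const float x = to_f32(src[i]);
    float y;
    if (x > 0.0f) {
        y = alpha_p * x * x + beta * x;             // positive branch
    } else {
        const float u = fminf(x, eps);              // guard the exponential (assumed placement)
        y = alpha_n * (expm1f(u) - u) + beta * x;   // negative branch (assumed form)
    }
    store_f32(y, dst[i]);
}

// Host-side launch sketch for a tensor of n elements (F16 or F32):
//   const int block = 256;
//   xielu_kernel<<<(n + block - 1) / block, block>>>(src, dst, n, alpha_n, alpha_p, beta, eps);
```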
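The unary-ops.cpp bullet describes keeping the existing templated apply loop for stateless ops while letting a parameterized op like xIELU carry its hyperparameters through a functor. A minimal sketch of that pattern follows; the names (apply_unary, xielu_op) are illustrative, not the actual ggml-cpu internals.

```cpp
// Illustrative only: functor-based unary op next to a function-pointer template path.
#include <cmath>
#include <cstdint>

// Existing-style path: a stateless float(float) op baked in as a template argument.
template <float (*op)(float)>
static void apply_unary(const float * src, float * dst, int64_t n) {
    for (int64_t i = 0; i < n; ++i) {
        dst[i] = op(src[i]);
    }
}

// Functor path: the op carries the xIELU hyperparameters as state, and the loop
// shape stays identical so the compiler can still inline and vectorize it.
struct xielu_op {
    float alpha_n, alpha_p, beta, eps;
    float operator()(float x) const {
        if (x > 0.0f) {
            return alpha_p * x * x + beta * x;
        }
        const float u = std::fmin(x, eps);            // assumed eps placement
        return alpha_n * (std::expm1(u) - u) + beta * x;
    }
};

template <typename F>
static void apply_unary(const float * src, float * dst, int64_t n, F op) {
    for (int64_t i = 0; i < n; ++i) {
        dst[i] = op(src[i]);
    }
}
```

The design point is that a functor gives per-op parameters a place to live without widening every op's signature or giving up the inlining that the original template system was built to preserve.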
amx ggml-amx : fix ggml_amx_init() on generic Linux (#16049) 2025-09-18 23:07:26 +02:00
arch devops: add s390x & ppc64le CI (#15925) 2025-09-27 02:03:33 +08:00
cmake ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
kleidiai kleidiai : fix work size and threads sync for fp16 (#16246) 2025-09-30 10:07:20 +03:00
llamafile llamafile: PowerPC Sgemm Optimization (#15558) 2025-08-26 23:35:25 +08:00
spacemit ggml: riscv: add riscv spacemit backend (#15288) 2025-09-29 17:50:44 +03:00
CMakeLists.txt kleidiai : fix work size and threads sync for fp16 (#16246) 2025-09-30 10:07:20 +03:00
arch-fallback.h ggml-cpu: implement MXFP4 SIMD for s390x (#16193) 2025-09-26 13:27:25 +03:00
binary-ops.cpp cpu: de-duplicate some of the operators and refactor (ggml/1144) 2025-03-30 08:33:31 +03:00
binary-ops.h cpu: de-duplicate some of the operators and refactor (ggml/1144) 2025-03-30 08:33:31 +03:00
common.h ggml : refactor forward_dup for cpu backend (#16062) 2025-09-19 06:31:56 +02:00
ggml-cpu-impl.h ggml-cpu: clean up s390x SIMD (#15855) 2025-09-08 02:18:28 +08:00
ggml-cpu.c model : Apertus model implementation (#15852) 2025-10-02 20:43:22 +03:00
ggml-cpu.cpp ggml: riscv: add riscv spacemit backend (#15288) 2025-09-29 17:50:44 +03:00
hbm.cpp ggml-cpu : split arch-specific implementations (#13892) 2025-06-09 16:47:13 +02:00
hbm.h ggml-cpu : split arch-specific implementations (#13892) 2025-06-09 16:47:13 +02:00
ops.cpp model : Apertus model implementation (#15852) 2025-10-02 20:43:22 +03:00
ops.h ggml: add ops for WAN video model (cuda && cpu) (#15669) 2025-09-04 10:38:49 +02:00
quants.c llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
quants.h llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
repack.cpp ggml : repack block_iq4_nlx8 (#14904) 2025-08-13 11:09:39 +03:00
repack.h ggml : repack block_iq4_nlx8 (#14904) 2025-08-13 11:09:39 +03:00
simd-mappings.h ggml : fix loongarch lsx compilation error (#15864) 2025-09-25 12:22:55 +03:00
traits.cpp ggml : fix fallback to CPU for unsupported ops (#15118) 2025-08-06 14:37:35 +02:00
traits.h ggml : fix fallback to CPU for unsupported ops (#15118) 2025-08-06 14:37:35 +02:00
unary-ops.cpp model : Apertus model implementation (#15852) 2025-10-02 20:43:22 +03:00
unary-ops.h model : Apertus model implementation (#15852) 2025-10-02 20:43:22 +03:00
vec.cpp ggml-cpu : optimize RVV kernels (#15720) 2025-09-03 16:16:21 +08:00
vec.h ggml : fix GGML_F32_VEC_FMA argument order in ggml_vec_mad1_f32 (#16307) 2025-09-28 23:15:03 +02:00