llama.cpp

Piotr Wilkin (ilintar) 34fcc5a4ac model : Apertus model implementation (#15852)	2025-10-02 20:43:22 +03:00

* First attempt
* No permute during convert (fixes qk tensors), proper norm application.
* RoPE = NeoX
* Coherence!
* Migrate xielu params from tensors to hyperparameters
* Simple CUDA kernel
* Revert stupid LLM refactorings
* Chat template support
* configchecker / flake8 errors
* Reorder unary.cu
* I do conclude that LLMs are, in fact, stupid.
* Fix after merge
* Final newline
* Make xIELU an UNARY_OP
* Final newline
* Correctly account for parameter shift
* Argh.
* Update ggml/src/ggml-cpu/unary-ops.cpp
  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* Refactor: remove unused methods, inline and factorize softplus, add const modifiers
* Revert CUDA changes, implement xIELU as a separate OP
* Pesky newline
* Add float2half / half2float for F16 inputs/outputs
* CUDA variants, attempt 2
* Actually, attempt 3
* Update ggml/src/ggml-cuda/unary.cu
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Missing convert header
* Proper formula and reference for xIELU in the comments.
* Modify unary-ops.cpp to add the functor-based logic besides the template system to retain optimizations
* Apply suggestions from code review
  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Add tensor mappings for Apertus to global list instead
* Fix lazy on scalars
* Update ggml/src/ggml-cuda/unary.cu
  Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
* Add comment about the constraints on positive/negative alpha
* Change `softplus` to `ggml_softplus`

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
batched-bench	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
cvector-generator	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
export-lora	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
gguf-split	ci : use smaller model (#16168)	2025-09-22 09:11:39 +03:00
imatrix	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
llama-bench	llama-bench: add --devices and --list-devices support (#16039)	2025-09-20 00:15:21 +02:00
main	llama-cli: prevent spurious assistant token (#16202)	2025-09-29 10:03:12 +03:00
mtmd	mtmd : fix uninitialized variable in bicubic_resize (#16275)	2025-09-26 15:00:44 +02:00
perplexity	perplexity : show more kl-divergence data (#16321)	2025-09-29 09:30:45 +03:00
quantize	ci : use smaller model (#16168)	2025-09-22 09:11:39 +03:00
rpc	rpc : fix regression when --device is used (#15981)	2025-09-14 12:28:18 +03:00
run	common: introduce http.h for httplib-based client (#16373)	2025-10-01 20:22:18 +03:00
server	Conversation action dialogs as singletons from Chat Sidebar + apply conditional rendering for Actions Dropdown for Chat Conversation Items (#16369)	2025-10-01 18:18:10 +02:00
tokenize	cmake : Do not install tools on iOS targets (#15903)	2025-09-16 09:54:44 +07:00
tts	model : Apertus model implementation (#15852)	2025-10-02 20:43:22 +03:00
CMakeLists.txt	mtmd : rename llava directory to mtmd (#13311)	2025-05-05 16:02:55 +02:00