Commit Graph

310 Commits

Author SHA1 Message Date
Georgi Gerganov 1d801d27b9
graph : update attn/kv_self names 2025-02-14 17:22:55 +02:00
Georgi Gerganov 828064564c
context : move common inputs to base class
ggml-ci
2025-02-14 16:48:21 +02:00
Georgi Gerganov d5e8e1a2ba
context : remove batch_manager
ggml-ci
2025-02-14 16:10:55 +02:00
Georgi Gerganov 131743ff4f
context : abstract constructor and init
ggml-ci
2025-02-13 17:17:51 +02:00
Georgi Gerganov ed3cb55abe
context : abstract input
ggml-ci
2025-02-13 15:53:15 +02:00
Georgi Gerganov 107d1e2c32
context : move output functionality to base class
ggml-ci
2025-02-13 15:42:14 +02:00
Georgi Gerganov e08f38df69
context : minor cleanup
ggml-ci
2025-02-13 12:50:53 +02:00
Georgi Gerganov f7c7757bab
context : abstract state read/write
ggml-ci
2025-02-13 12:37:28 +02:00
Georgi Gerganov 3a504d9a0b
llama : introduce llama_io interfaces
ggml-ci
2025-02-13 12:25:54 +02:00
Georgi Gerganov fbe6a07256
context : rename to llama_context_kv_self 2025-02-12 17:16:44 +02:00
Georgi Gerganov 6ee86e5e0f
graph : restore ubatch in build_cb
ggml-ci
2025-02-12 16:29:15 +02:00
Georgi Gerganov f63aeecce6
llama : models now build their graphs using llama_graph_i
ggml-ci
2025-02-12 15:08:40 +02:00
Georgi Gerganov 0ab50f1bbb
context : prepare llama_model graph build
ggml-ci
2025-02-12 14:09:55 +02:00
Georgi Gerganov e633dc171a
context : introduce llama_graph_i
ggml-ci
2025-02-12 13:49:44 +02:00
Georgi Gerganov 5eae8e5183
context : move build_rope_factors to base class
ggml-ci
2025-02-12 13:32:02 +02:00
Georgi Gerganov d146a14f77
context : minor naming fix 2025-02-12 12:41:36 +02:00
Georgi Gerganov 8da7f612b7
context : improve llama_context encapsulation
ggml-ci
2025-02-12 12:15:04 +02:00
Georgi Gerganov b52b79b048
context : move encode/decode to llama-context.cpp 2025-02-12 11:23:38 +02:00
Georgi Gerganov 02ef4be975
context : initial abstraction
ggml-ci
2025-02-11 22:27:21 +02:00
Georgi Gerganov 2cd8a903c8
context : make output functions members
ggml-ci
2025-02-10 17:01:27 +02:00
Georgi Gerganov d1d8d53008
bman : remove ubatch member
ggml-ci
2025-02-10 16:50:14 +02:00
Georgi Gerganov ef358ee78f
context : add decode/encode
ggml-ci
2025-02-10 16:14:13 +02:00
Georgi Gerganov f9971ef2e1
llama : dedup reserve code 2025-02-10 14:59:51 +02:00
Georgi Gerganov 972f91c7d7
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-10 14:45:54 +02:00
Georgi Gerganov bdcf8b6a56
cont : fix mmap flag print (#11699) 2025-02-08 16:49:38 +02:00
Georgi Gerganov ed926d8833
llama : fix defrag logic (#11707)
* llama : fix defrag logic

ggml-ci

* cont : better logic

ggml-ci

* cont : clamp fragmentation to 0.0

ggml-ci
2025-02-07 16:05:34 +02:00
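A hedged sketch of the clamping fix described in the last bullet above (hypothetical helper name; the actual logic lives in the KV-cache defrag code): the fragmentation measure must not go negative, otherwise comparing it against a defrag threshold can misfire.

```cpp
#include <algorithm>

// Hypothetical helper, not the actual llama.cpp code: estimate KV-cache
// fragmentation as the share of the occupied span that is unused,
// clamped so it never drops below 0.0.
float kv_fragmentation(int n_used, int n_occupied) {
    if (n_occupied <= 0) {
        return 0.0f;
    }
    const float frag = 1.0f - float(n_used) / float(n_occupied);
    return std::max(0.0f, frag); // clamp fragmentation to 0.0
}

// Illustrative call site (names assumed, not from the patch):
// if (kv_fragmentation(kv.used, kv.n) >= cparams.defrag_thold) { defrag(); }
```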
Christian Fillion 2d219b389e
vocab : ignore invalid UTF-8 input in the BPE tokenizer (#11729)
Silently insert U+FFFD(s) (the Unicode replacement character) instead, until the
next valid codepoint can be found.

This fixes `llama_tokenize` throwing an exception across the C API boundary
or libllama's module boundary (the caller's runtime might be incompatible!).

Returning a proper error code might be desirable; however, the signature
of `llama_tokenize` doesn't allow it, as all return values already have
an existing meaning.
2025-02-07 15:55:47 +02:00
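The behavior described above can be illustrated with a small sketch (hypothetical function, not the actual patch): malformed byte sequences are replaced with U+FFFD and decoding resynchronizes at the next byte, so no exception ever crosses the C API boundary.

```cpp
#include <string>
#include <string_view>

// Sketch only: validates UTF-8 lead/continuation bytes and substitutes
// U+FFFD for invalid sequences. Overlong encodings and surrogates are
// not checked here, unlike a full validator.
std::string sanitize_utf8(std::string_view in) {
    static const std::string REPLACEMENT = "\xEF\xBF\xBD"; // U+FFFD
    std::string out;
    size_t i = 0;
    while (i < in.size()) {
        const unsigned char c = in[i];
        const size_t len = c < 0x80         ? 1
                         : (c >> 5) == 0x6  ? 2  // 110xxxxx
                         : (c >> 4) == 0xE  ? 3  // 1110xxxx
                         : (c >> 3) == 0x1E ? 4  // 11110xxx
                         : 0;
        bool ok = len > 0 && i + len <= in.size();
        for (size_t j = 1; ok && j < len; ++j) {
            ok = (static_cast<unsigned char>(in[i + j]) & 0xC0) == 0x80;
        }
        if (ok) {
            out.append(in.substr(i, len));
            i += len;
        } else {
            out += REPLACEMENT; // silently insert U+FFFD...
            i += 1;             // ...and resync at the next byte
        }
    }
    return out;
}
```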
magicse 333820d749
llama : fix progress dots (#11730)
* Update llama.cpp

Display progress dots in the terminal.
Without this, progress dots were not shown while loading a model from file.

* Update llama.cpp

removed trailing spaces
2025-02-07 15:48:47 +02:00
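For context, a minimal sketch of the restored behavior (hypothetical callback; the real code path is the model loader's progress callback): dots only show up during loading if they are flushed as progress advances, not buffered until the end.

```cpp
#include <cstdio>

// Hypothetical progress callback, not the actual patch: print one dot per
// step of progress and flush immediately so the dots appear while the
// model file is still being read.
void on_load_progress(float progress) { // progress in [0.0, 1.0]
    static int printed = 0;
    const int wanted = int(progress * 60.0f); // 60-dot bar
    while (printed < wanted) {
        fputc('.', stderr);
        printed++;
    }
    fflush(stderr); // without flushing, nothing shows until the load ends
}
```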
Christian Fillion 7ee953a64a
llama : add llama_sampler_init for safe usage of llama_sampler_free (#11727)
The C API in llama.h claims users can implement `llama_sampler_i` to
create custom `llama_sampler`. The sampler chain takes ownership and
calls `llama_sampler_free` on them. However, `llama_sampler_free` is
hard-coded to use `delete`. This is undefined behavior if the object
wasn't also allocated via `new` from libllama's C++ runtime. Callers
in C and C-compatible languages do not use C++'s `new` operator. C++
callers may not be sharing the same heap as libllama.
2025-02-07 11:33:27 +02:00
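The ownership contract can be sketched as follows (declarations simplified; `llama_sampler_init` is the function this commit adds, per the message above): by routing every allocation through libllama, the `delete` inside `llama_sampler_free` is always paired with a `new` from the same C++ runtime.

```cpp
// Simplified declarations, assumptions based on the commit message;
// see llama.h for the real interface.
struct llama_sampler_i;            // table of custom-sampler callbacks
struct llama_sampler {
    const llama_sampler_i * iface;
    void *                  ctx;   // sampler-specific state
};

// The one place a llama_sampler is allocated -- inside libllama.
llama_sampler * llama_sampler_init(const llama_sampler_i * iface, void * ctx) {
    return new llama_sampler { iface, ctx };
}

// Safe for C callers and for C++ callers on a different heap: the object
// is guaranteed to come from libllama's own `new` above.
void llama_sampler_free(llama_sampler * smpl) {
    delete smpl;
}
```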
tv1wnd 855cd0734a
llama : fix old glm4 models (#11670) 2025-02-06 22:48:51 +01:00
Georgi Gerganov b15fede7a9
kv-cache : fix defrag condition
ggml-ci
2025-02-06 14:35:19 +02:00
Georgi Gerganov 9dd7a0390f
llama : add log about loading model tensors (#11699) 2025-02-06 13:41:37 +02:00
Georgi Gerganov 0f1c1cab2c
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-06 10:04:33 +02:00
Georgi Gerganov e0d913fccb
llama : clear whitespaces 2025-02-06 10:02:50 +02:00
Johannes Gäßler fd08255d0d
CUDA: non-contiguous (RMS) norm support (#11659)
* CUDA: non-contiguous (RMS) norm support

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-04 22:21:42 +01:00
Molly Sophia 1eca8916b5
llama : fix rwkv inference (#11618)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-02-03 14:17:50 +02:00
Olivier Chafik 90f9b88afb
nit: more informative crash when grammar sampler fails (#11593) 2025-02-02 19:58:34 +00:00
Georgi Gerganov 74b0807245
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-02 11:07:05 +02:00
Georgi Gerganov 3e23be7911
context : store graph build function callback
ggml-ci
2025-02-02 10:49:32 +02:00
piDack 0cec062a63
llama : add support for GLM-Edge and GLM-Edge-V series models (#10573)
* add glm edge chat model

* use config partial_rotary_factor as rope ratio

* support for glm edge model

* vision model support

* remove debug info

* fix format

* llava.cpp trailing whitespace

* remove unused AutoTokenizer

* Update src/llama.cpp so it does not contain <|end|> or </s>

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>

* add edge template

* fix chat template

* fix conflict

* fix conflict

* fix ci err

* fix format err

* fix template err

* 9b hf chat support

* format

* format clip.cpp

* fix format

* Apply suggestions from code review

* Apply suggestions from code review

* Update examples/llava/clip.cpp

* fix format

* minor : style

---------

Co-authored-by: liyuhang <yuhang.li@zhipuai.cn>
Co-authored-by: piDack <pcdack@hotmail.co>
Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: liyuhang <yuhang.li@aminer.cn>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2025-02-02 09:48:46 +02:00
Georgi Gerganov 5d3491e789
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-01-31 15:11:11 +02:00
Olivier Chafik 8b576b6c55
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)
---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
2025-01-30 19:13:58 +00:00
Georgi Gerganov a40ba49fa6
Merge branch 'master' into gg/llama-kv-cache 2025-01-30 16:39:58 +02:00
mgroeber9110 ffd0821c57
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496) 2025-01-30 12:10:59 +02:00
Georgi Gerganov c30e34cdba
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-01-29 15:01:26 +02:00
Georgi Gerganov 918885697e
llama : resolve rwkv conflict
ggml-ci
2025-01-29 14:45:04 +02:00
Molly Sophia 325afb370a
llama: fix missing k_cache store for rwkv6qwen2 (#11445)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-01-29 12:07:21 +08:00
lexasub a5203b4465
llama : minor fixes to speed up llama model loading (#11448)
* impl::load : change the bpe_ranks map to an unordered map, reducing impl::load time by ~30%

* llama_model_loader::init_mapping : replace `new llama_mmap` with `std::make_unique<llama_mmap>` for cleaner code, roughly halving the time of running init_mappings

* Update src/llama-vocab.cpp

---------

Co-authored-by: lexasub <empty@empty.ru>
Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-27 14:42:09 +01:00
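A sketch of the first change (types simplified, names assumed from the message above; the real map lives in the vocab code): `std::map` does O(log n) tree lookups, while an unordered map gives amortized O(1) for the BPE merge-rank queries that dominate `impl::load`.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// std::unordered_map needs a hash for the pair key; std::map did not.
struct bpe_pair_hash {
    size_t operator()(const std::pair<std::string, std::string> & p) const {
        const size_t h1 = std::hash<std::string>{}(p.first);
        const size_t h2 = std::hash<std::string>{}(p.second);
        return h1 ^ (h2 << 1); // simple combiner, sufficient for a sketch
    }
};

// before: std::map<std::pair<std::string, std::string>, int> bpe_ranks;
std::unordered_map<std::pair<std::string, std::string>, int, bpe_pair_hash> bpe_ranks;

// The second change is a one-liner:
// before: mapping.reset(new llama_mmap(/* ... */));
// after:  mapping = std::make_unique<llama_mmap>(/* ... */);
```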
Georgi Gerganov e665b57fa2
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-01-27 14:09:22 +02:00
Johannes Gäßler df984e0147
llama: refactor llama_decode_impl (#11381) 2025-01-27 12:07:12 +01:00