Commit Graph

360 Commits

Author SHA1 Message Date
Georgi Gerganov 0f7daa9d1b
graph : move non-context related logic to llm_build_context
ggml-ci
2025-02-28 20:36:25 +02:00
Georgi Gerganov 9cab53c7dd
cont : migrate the rest of the inputs out of llama_context
ggml-ci
2025-02-28 18:01:25 +02:00
Georgi Gerganov 7f02ee562e
context : decouple inputs, llama_graph_i become const (WIP)
ggml-ci
2025-02-28 16:30:41 +02:00
Georgi Gerganov 38db8a5861
llama : introduce concept of llama_memory
ggml-ci
2025-02-28 10:51:17 +02:00
Georgi Gerganov 828effd9d7
kv-cache : basic abstraction
ggml-ci
2025-02-27 16:00:29 +02:00
Georgi Gerganov 82675a0180
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-27 15:10:18 +02:00
Georgi Gerganov 952feedfca
context : disable encoder embd tensor for now
ggml-ci
2025-02-27 15:07:10 +02:00
Georgi Gerganov 4efe989886
context : pass embeddings tensor from encoder to decoder
ggml-ci
2025-02-25 16:11:17 +02:00
Georgi Gerganov e2b3294f2c
context : fix enc-dec state save/load
ggml-ci
2025-02-25 12:14:34 +02:00
Georgi Gerganov e5bc5f8e02
context : enc-dec is now working
ggml-ci
2025-02-25 12:10:34 +02:00
Vitali Lovich 3e9a2860e9
llama : expose llama_model_n_head_kv in the API (#11997)
It's useful to be able to have this from the library layer as it's a key
parameter of the model (e.g. to figure out how much KV cache memory is
needed).
2025-02-25 11:29:33 +02:00
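The rationale in the commit body above (using the model's KV head count to figure out how much KV cache memory is needed) can be sketched as a back-of-the-envelope calculation. The sketch below is illustrative, not llama.cpp code: it assumes a plain non-quantized f16 KV cache (one K and one V tensor per layer), and the model parameters shown are hypothetical example values, not read from a real model.

```python
def kv_cache_bytes(n_layer: int, n_ctx: int, n_head_kv: int,
                   head_dim: int, bytes_per_elt: int = 2) -> int:
    # Per layer, the K cache and the V cache each hold n_ctx vectors of
    # n_head_kv * head_dim elements; the leading factor 2 covers K and V.
    return 2 * n_layer * n_ctx * n_head_kv * head_dim * bytes_per_elt

# Hypothetical 7B-class model with grouped-query attention:
# 32 layers, 4096-token context, 8 KV heads, head dim 128, f16 cache.
mem = kv_cache_bytes(n_layer=32, n_ctx=4096, n_head_kv=8, head_dim=128)
print(mem / (1024 ** 2), "MiB")  # 512.0 MiB
```

With grouped-query attention the cache scales with n_head_kv rather than the full attention head count, which is exactly why exposing this value through the library API is useful for callers budgeting memory up front.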
Georgi Gerganov be58e30017
enc-dec : compose wip
ggml-ci
2025-02-24 18:12:24 +02:00
Georgi Gerganov 9cd78f11a1
context : explicit llama_context_i abstract interface
ggml-ci
2025-02-24 13:38:11 +02:00
Georgi Gerganov 4a1054b552
context : reuse built_attn_mha
ggml-ci
2025-02-24 11:29:52 +02:00
Georgi Gerganov a5a85a3bc0
context : fix recurrent reserve
ggml-ci
2025-02-24 08:59:12 +02:00
Georgi Gerganov 0699a44c83
context : remove redundant virtual, protected -> private
ggml-ci
2025-02-23 20:02:11 +02:00
Georgi Gerganov 6378112cb5
graph : remove the build_kv_... API from llama_graph_i
ggml-ci
2025-02-23 19:39:22 +02:00
Georgi Gerganov 372fa3a894
cont : enc should work now, next is dec
ggml-ci
2025-02-23 12:20:23 +02:00
Georgi Gerganov f5e80208c5
wip enc-dec
2025-02-21 19:17:47 +02:00
Georgi Gerganov c4c0a4d13c
Merge branch 'master' into gg/llama-kv-cache
2025-02-21 19:14:07 +02:00
Georgi Gerganov 51f311e057
llama : skip loading unused tensors (#12004)
* llama : assign unknown/unused tensors to host buffer type

ggml-ci

* llama : skip unused tensors

ggml-ci
2025-02-21 18:33:18 +02:00
Georgi Gerganov 3753b30d65
context : fix n_outputs init
ggml-ci
2025-02-21 15:53:26 +02:00
Georgi Gerganov f588a70da3
context : wrap input tensors in struct
ggml-ci
2025-02-21 15:09:28 +02:00
Georgi Gerganov ebf1bdf97b
context : add logs
ggml-ci
2025-02-21 14:35:23 +02:00
Georgi Gerganov 548c230dff
graph : remove worst_case from the API
ggml-ci
2025-02-21 13:29:25 +02:00
Georgi Gerganov 2645a7d9a9
context : add save/load for recurrent context
ggml-ci
2025-02-21 10:28:42 +02:00
Georgi Gerganov 08011c2ca1
context : add llama_kv_cache_recurrent prototype
ggml-ci
2025-02-20 20:55:13 +02:00
Georgi Gerganov ad870c49f4
context : fix causal input for cache-less case
ggml-ci
2025-02-20 20:01:02 +02:00
Georgi Gerganov b1554be1d7
context : add cache-less llama_context
ggml-ci
2025-02-20 18:30:04 +02:00
Georgi Gerganov 072280ea6b
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-20 14:26:43 +02:00
Georgi Gerganov f95b04a21c
model : fix order kvq -> qkv
ggml-ci
2025-02-19 18:52:20 +02:00
Georgi Gerganov 2eacb4c1bf
graph : simplify attention api
ggml-ci
2025-02-19 18:43:49 +02:00
Georgi Gerganov e17e4b72d1
context : add llama_context_recurrent
ggml-ci
2025-02-19 16:07:27 +02:00
Georgi Gerganov 5f11a5502a
kv-cache : remove llama_kv_cache_i
2025-02-19 14:36:27 +02:00
Daniel Bevenius 9626d9351a
llama : fix indentation in llama-grammar [no ci] (#11943)
This commit adjusts the indentation for the functions `parse_sequence`
and `parse_rule` in src/llama-grammar.cpp.

The motivation is consistency and improved readability.
2025-02-19 06:16:23 +01:00
Georgi Gerganov f5cedbcaaa
kv-cache : prepare for abstraction
ggml-ci
2025-02-18 21:28:58 +02:00
Georgi Gerganov 2bffc2d514
model : pass llama_graph_i as ptr
ggml-ci
2025-02-18 14:57:26 +02:00
Georgi Gerganov 9e50456e19
context : minor simplify
ggml-ci
2025-02-18 14:53:02 +02:00
Georgi Gerganov befe14f06f
llama : reorder encode/decode in sources
2025-02-18 14:47:53 +02:00
Georgi Gerganov bc6f187e9c
cont : use returned tensors from the graph build
ggml-ci
2025-02-18 14:24:17 +02:00
Georgi Gerganov 172f61690c
cont : return important tensors
ggml-ci
2025-02-18 13:48:43 +02:00
Georgi Gerganov c23590319a
graph : add llama_graph_result
ggml-ci
2025-02-18 13:48:21 +02:00
Georgi Gerganov f0d3ff2388
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-18 10:14:37 +02:00
Georgi Gerganov 68ff663a04
repo : update links to new url (#11886)
* repo : update links to new url

ggml-ci

* cont : more urls

ggml-ci
2025-02-15 16:40:57 +02:00
Georgi Gerganov 1d801d27b9
graph : update attn/kv_self names
2025-02-14 17:22:55 +02:00
Georgi Gerganov 828064564c
context : move common inputs to base class
ggml-ci
2025-02-14 16:48:21 +02:00
Georgi Gerganov d5e8e1a2ba
context : remove batch_manager
ggml-ci
2025-02-14 16:10:55 +02:00
Georgi Gerganov 131743ff4f
context : abstract constructor and init
ggml-ci
2025-02-13 17:17:51 +02:00
Georgi Gerganov ed3cb55abe
context : abstract input
ggml-ci
2025-02-13 15:53:15 +02:00
Georgi Gerganov 107d1e2c32
context : move output functionality to base class
ggml-ci
2025-02-13 15:42:14 +02:00