Commit Graph

360 Commits

Author SHA1 Message Date
Georgi Gerganov 0f7daa9d1b
graph : move non-context related logic to llm_build_context
ggml-ci
2025-02-28 20:36:25 +02:00
Georgi Gerganov 9cab53c7dd
cont : migrate the rest of the inputs out of llama_context
ggml-ci
2025-02-28 18:01:25 +02:00
Georgi Gerganov 7f02ee562e
context : decouple inputs, llama_graph_i become const (WIP)
ggml-ci
2025-02-28 16:30:41 +02:00
Georgi Gerganov 38db8a5861
llama : introduce concept of llama_memory
ggml-ci
2025-02-28 10:51:17 +02:00
Georgi Gerganov 828effd9d7
kv-cache : basic abstraction
ggml-ci
2025-02-27 16:00:29 +02:00
Georgi Gerganov 82675a0180
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-27 15:10:18 +02:00
Georgi Gerganov 952feedfca
context : disable encoder embd tensor for now
ggml-ci
2025-02-27 15:07:10 +02:00
Georgi Gerganov 4efe989886
context : pass embeddings tensor from encoder to decoder
ggml-ci
2025-02-25 16:11:17 +02:00
Georgi Gerganov e2b3294f2c
context : fix enc-dec state save/load
ggml-ci
2025-02-25 12:14:34 +02:00
Georgi Gerganov e5bc5f8e02
context : enc-dec is now working
ggml-ci
2025-02-25 12:10:34 +02:00
Vitali Lovich 3e9a2860e9
llama : expose llama_model_n_head_kv in the API (#11997)
It's useful to be able to have this from the library layer as it's a key
parameter of the model (e.g. to figure out how much KV cache memory is
needed).
2025-02-25 11:29:33 +02:00
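The rationale in the commit body above (using the model's KV head count to figure out how much KV cache memory is needed) can be sketched as a back-of-the-envelope calculation. The sketch below is illustrative, not llama.cpp code: it assumes a plain non-quantized f16 KV cache (one K and one V tensor per layer), and the model parameters shown are hypothetical example values, not read from a real model.

```python
def kv_cache_bytes(n_layer: int, n_ctx: int, n_head_kv: int,
                   head_dim: int, bytes_per_elt: int = 2) -> int:
    # Per layer, the K cache and the V cache each hold n_ctx vectors of
    # n_head_kv * head_dim elements; the leading factor 2 covers K and V.
    return 2 * n_layer * n_ctx * n_head_kv * head_dim * bytes_per_elt

# Hypothetical 7B-class model with grouped-query attention:
# 32 layers, 4096-token context, 8 KV heads, head dim 128, f16 cache.
mem = kv_cache_bytes(n_layer=32, n_ctx=4096, n_head_kv=8, head_dim=128)
print(mem / (1024 ** 2), "MiB")  # 512.0 MiB
```

With grouped-query attention the cache scales with n_head_kv rather than the full attention head count, which is exactly why exposing this value through the library API is useful for callers budgeting memory up front.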
Georgi Gerganov be58e30017
enc-dec : compose wip
ggml-ci
2025-02-24 18:12:24 +02:00
Georgi Gerganov 9cd78f11a1
context : explicit llama_context_i abstract interface
ggml-ci
2025-02-24 13:38:11 +02:00
Georgi Gerganov 4a1054b552
context : reuse built_attn_mha
ggml-ci
2025-02-24 11:29:52 +02:00
Georgi Gerganov a5a85a3bc0
context : fix recurrent reserve
ggml-ci
2025-02-24 08:59:12 +02:00
Georgi Gerganov 0699a44c83
context : remove redundant virtual, protected -> private
ggml-ci
2025-02-23 20:02:11 +02:00
Georgi Gerganov 6378112cb5
graph : remove the build_kv_... API from llama_graph_i
ggml-ci
2025-02-23 19:39:22 +02:00
Georgi Gerganov 372fa3a894
cont : enc should work now, next is dec
ggml-ci
2025-02-23 12:20:23 +02:00
Georgi Gerganov f5e80208c5
wip enc-dec
2025-02-21 19:17:47 +02:00
Georgi Gerganov c4c0a4d13c
Merge branch 'master' into gg/llama-kv-cache
2025-02-21 19:14:07 +02:00
Georgi Gerganov 51f311e057
llama : skip loading unused tensors (#12004)
* llama : assign unknown/unused tensors to host buffer type

ggml-ci

* llama : skip unused tensors

ggml-ci
2025-02-21 18:33:18 +02:00
Georgi Gerganov 3753b30d65
context : fix n_outputs init
ggml-ci
2025-02-21 15:53:26 +02:00
Georgi Gerganov f588a70da3
context : wrap input tensors in struct
ggml-ci
2025-02-21 15:09:28 +02:00
Georgi Gerganov ebf1bdf97b
context : add logs
ggml-ci
2025-02-21 14:35:23 +02:00
Georgi Gerganov 548c230dff
graph : remove worst_case from the API
ggml-ci
2025-02-21 13:29:25 +02:00
Georgi Gerganov 2645a7d9a9
context : add save/load for recurrent context
ggml-ci
2025-02-21 10:28:42 +02:00
Georgi Gerganov 08011c2ca1
context : add llama_kv_cache_recurrent prototype
ggml-ci
2025-02-20 20:55:13 +02:00
Georgi Gerganov ad870c49f4
context : fix causal input for cache-less case
ggml-ci
2025-02-20 20:01:02 +02:00
Georgi Gerganov b1554be1d7
context : add cache-less llama_context
ggml-ci
2025-02-20 18:30:04 +02:00
Georgi Gerganov 072280ea6b
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-20 14:26:43 +02:00
Georgi Gerganov f95b04a21c
model : fix order kvq -> qkv
ggml-ci
2025-02-19 18:52:20 +02:00
Georgi Gerganov 2eacb4c1bf
graph : simplify attention api
ggml-ci
2025-02-19 18:43:49 +02:00
Georgi Gerganov e17e4b72d1
context : add llama_context_recurrent
ggml-ci
2025-02-19 16:07:27 +02:00
Georgi Gerganov 5f11a5502a
kv-cache : remove llama_kv_cache_i
2025-02-19 14:36:27 +02:00
Daniel Bevenius 9626d9351a
llama : fix indentation in llama-grammar [no ci] (#11943)
This commit adjusts the indentation for the functions `parse_sequence`
and `parse_rule` in src/llama-grammar.cpp.

The motivation is consistency and improved readability.
2025-02-19 06:16:23 +01:00
Georgi Gerganov f5cedbcaaa
kv-cache : prepare for abstraction
ggml-ci
2025-02-18 21:28:58 +02:00
Georgi Gerganov 2bffc2d514
model : pass llama_graph_i as ptr
ggml-ci
2025-02-18 14:57:26 +02:00
Georgi Gerganov 9e50456e19
context : minor simplify
ggml-ci
2025-02-18 14:53:02 +02:00
Georgi Gerganov befe14f06f
llama : reorder encode/decode in sources
2025-02-18 14:47:53 +02:00
Georgi Gerganov bc6f187e9c
cont : use returned tensors from the graph build
ggml-ci
2025-02-18 14:24:17 +02:00
Georgi Gerganov 172f61690c
cont : return important tensors
ggml-ci
2025-02-18 13:48:43 +02:00
Georgi Gerganov c23590319a
graph : add llama_graph_result
ggml-ci
2025-02-18 13:48:21 +02:00
Georgi Gerganov f0d3ff2388
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-18 10:14:37 +02:00
Georgi Gerganov 68ff663a04
repo : update links to new url (#11886)
* repo : update links to new url

ggml-ci

* cont : more urls

ggml-ci
2025-02-15 16:40:57 +02:00
Georgi Gerganov 1d801d27b9
graph : update attn/kv_self names
2025-02-14 17:22:55 +02:00
Georgi Gerganov 828064564c
context : move common inputs to base class
ggml-ci
2025-02-14 16:48:21 +02:00
Georgi Gerganov d5e8e1a2ba
context : remove batch_manager
ggml-ci
2025-02-14 16:10:55 +02:00
Georgi Gerganov 131743ff4f
context : abstract constructor and init
ggml-ci
2025-02-13 17:17:51 +02:00
Georgi Gerganov ed3cb55abe
context : abstract input
ggml-ci
2025-02-13 15:53:15 +02:00
Georgi Gerganov 107d1e2c32
context : move output functionality to base class
ggml-ci
2025-02-13 15:42:14 +02:00