Commit Graph

47 Commits

Author SHA1 Message Date
Georgi Gerganov f5cedbcaaa
kv-cache : prepare for abstraction
ggml-ci
2025-02-18 21:28:58 +02:00
Georgi Gerganov 2bffc2d514
model : pass llama_graph_i as ptr
ggml-ci
2025-02-18 14:57:26 +02:00
Georgi Gerganov 9e50456e19
context : minor simplify
ggml-ci
2025-02-18 14:53:02 +02:00
Georgi Gerganov befe14f06f
llama : reorder encode/decode in sources
2025-02-18 14:47:53 +02:00
Georgi Gerganov bc6f187e9c
cont : use returned tensors from the graph build
ggml-ci
2025-02-18 14:24:17 +02:00
Georgi Gerganov 172f61690c
cont : return important tensors
ggml-ci
2025-02-18 13:48:43 +02:00
Georgi Gerganov c23590319a
graph : add llama_graph_result
ggml-ci
2025-02-18 13:48:21 +02:00
Georgi Gerganov 1d801d27b9
graph : update attn/kv_self names
2025-02-14 17:22:55 +02:00
Georgi Gerganov 828064564c
context : move common inputs to base class
ggml-ci
2025-02-14 16:48:21 +02:00
Georgi Gerganov d5e8e1a2ba
context : remove batch_manager
ggml-ci
2025-02-14 16:10:55 +02:00
Georgi Gerganov 131743ff4f
context : abstract constructor and init
ggml-ci
2025-02-13 17:17:51 +02:00
Georgi Gerganov ed3cb55abe
context : abstract input
ggml-ci
2025-02-13 15:53:15 +02:00
Georgi Gerganov 107d1e2c32
context : move output functionality to base class
ggml-ci
2025-02-13 15:42:14 +02:00
Georgi Gerganov e08f38df69
context : minor cleanup
ggml-ci
2025-02-13 12:50:53 +02:00
Georgi Gerganov f7c7757bab
context : abstract state read/write
ggml-ci
2025-02-13 12:37:28 +02:00
Georgi Gerganov 3a504d9a0b
llama : introduce llama_io interfaces
ggml-ci
2025-02-13 12:25:54 +02:00
Georgi Gerganov fbe6a07256
context : rename to llama_context_kv_self
2025-02-12 17:16:44 +02:00
Georgi Gerganov 6ee86e5e0f
graph : restore ubatch in build_cb
ggml-ci
2025-02-12 16:29:15 +02:00
Georgi Gerganov f63aeecce6
llama : models now build their graphs using llama_graph_i
ggml-ci
2025-02-12 15:08:40 +02:00
Georgi Gerganov 5eae8e5183
context : move build_rope_factors to base class
ggml-ci
2025-02-12 13:32:02 +02:00
Georgi Gerganov d146a14f77
context : minor naming fix
2025-02-12 12:41:36 +02:00
Georgi Gerganov 8da7f612b7
context : improve llama_context encapsulation
ggml-ci
2025-02-12 12:15:04 +02:00
Georgi Gerganov b52b79b048
context : move encode/decode to llama-context.cpp
2025-02-12 11:23:38 +02:00
Georgi Gerganov 02ef4be975
context : initial abstraction
ggml-ci
2025-02-11 22:27:21 +02:00
Georgi Gerganov 2cd8a903c8
context : make output functions members
ggml-ci
2025-02-10 17:01:27 +02:00
Georgi Gerganov d1d8d53008
bman : remove ubatch member
ggml-ci
2025-02-10 16:50:14 +02:00
Georgi Gerganov ef358ee78f
context : add decode/encode
ggml-ci
2025-02-10 16:14:13 +02:00
Georgi Gerganov 972f91c7d7
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-02-10 14:45:54 +02:00
Georgi Gerganov b15fede7a9
kv-cache : fix defrag condition
ggml-ci
2025-02-06 14:35:19 +02:00
Molly Sophia 1eca8916b5
llama : fix rwkv inference (#11618)
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2025-02-03 14:17:50 +02:00
Georgi Gerganov 3e23be7911
context : store graph build function callback
ggml-ci
2025-02-02 10:49:32 +02:00
Georgi Gerganov e665b57fa2
Merge branch 'master' into gg/llama-kv-cache
ggml-ci
2025-01-27 14:09:22 +02:00
Georgi Gerganov a0c500b4dc
context : prepare for abstraction
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov 99422dfa3f
context : introduce llama_batch_manager
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov 133ad6a723
context : initial need_reserve logic
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov c75ba6851e
context : move adapter code in the implementation [no ci]
2025-01-26 20:16:22 +02:00
Georgi Gerganov f0713498fd
context : add get_ctx_padding()
ggml-ci
2025-01-26 20:16:22 +02:00
Georgi Gerganov b4ec1d4429
cont : move kv_self update to llama_context
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov f2524c0e41
llama : remove references to llama_kv_cache (wip)
Intermediate step necessary to abstract the `llama_context` and
`llama_kv_cache`.

ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov a19f671fe0
context : minor
ggml-ci
2025-01-26 20:16:21 +02:00
Georgi Gerganov 17b363afd3
llama : update llama_kv_self API
ggml-ci
2025-01-26 20:16:20 +02:00
Georgi Gerganov fd05ab87aa
kv_cache : move state read/write to llama_kv_cache
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov 4cd1b6fa4c
context : prepare kv_cache_read/write to be moved to kv_cache
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov 4d7bd03e65
kv_cache : functions -> members
ggml-ci
2025-01-26 20:14:36 +02:00
Georgi Gerganov f78b396ee7
llama : add struct llama_kv_cache (wip) [no ci]
2025-01-26 20:12:06 +02:00
Georgi Gerganov afa8a9ec9b
llama : add `llama_vocab`, functions -> methods, naming (#11110)
* llama : functions -> methods (#11110)

* llama : add struct llama_vocab to the API (#11156)

ggml-ci

* hparams : move vocab params to llama_vocab (#11159)

ggml-ci

* vocab : more pimpl (#11165)

ggml-ci

* vocab : minor tokenization optimizations (#11160)

ggml-ci

Co-authored-by: Diego Devesa <slarengh@gmail.com>

* lora : update API names (#11167)

ggml-ci

* llama : update API names to use correct prefix (#11174)

* llama : update API names to use correct prefix

ggml-ci

* cont

ggml-ci

* cont

ggml-ci

* minor [no ci]

* vocab : llama_vocab_add_[be]os -> llama_vocab_get_add_[be]os (#11174)

ggml-ci

* vocab : llama_vocab_n_vocab -> llama_vocab_n_tokens (#11174)

ggml-ci

---------

Co-authored-by: Diego Devesa <slarengh@gmail.com>
2025-01-12 11:32:42 +02:00
Georgi Gerganov f66f582927
llama : refactor `src/llama.cpp` (#10902)
* llama : scatter llama.cpp into multiple modules (wip)

* llama : control-vector -> adapter

* llama : arch

* llama : mmap

ggml-ci

* ci : remove BUILD_SHARED_LIBS=OFF

ggml-ci

* llama : arch (cont)

ggml-ci

* llama : chat

ggml-ci

* llama : model

ggml-ci

* llama : hparams

ggml-ci

* llama : adapter

ggml-ci

* examples : fix

ggml-ci

* rebase

ggml-ci

* minor

* llama : kv cache

ggml-ci

* llama : impl

ggml-ci

* llama : batch

ggml-ci

* cont

ggml-ci

* llama : context

ggml-ci

* minor

* llama : context (cont)

ggml-ci

* llama : model loader

ggml-ci

* common : update lora

ggml-ci

* llama : quant

ggml-ci

* llama : quant (cont)

ggml-ci

* minor [no ci]
2025-01-03 10:18:53 +02:00