llama.cpp/src
Daniel Bevenius ca0ef2dddb
llama : clarify comment about pp and tg graphs [no ci] (#14895)
* llama : clarify comment about pp and tg graphs [no ci]

This commit clarifies the comment in `llama-context.cpp` regarding the
prompt processing (pp) and token generation (tg) graphs.

The motivation for this is that I've struggled to remember these
abbreviations and had to look them up more than once, so I thought it
would be helpful to add a comment that makes it clear what they stand for.

* squash! llama : clarify comment about pp and tg graphs [no ci]

Change "pp" to "prompt processing".
2025-07-27 12:10:51 +02:00
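For context, "pp" and "tg" name the two graph shapes a context has to handle: evaluating a whole prompt in large batches versus generating one token at a time per sequence. The sketch below is only an illustration of that distinction; `reserve_graph`, `n_ubatch`, and `n_seqs` are assumed placeholder names, not the actual `llama-context.cpp` code.

```cpp
#include <cstdio>

// Illustration only: reserve_graph and the constants below are hypothetical,
// not the llama.cpp API. They just show what the pp/tg batch shapes mean.
//
// pp: prompt processing - the prompt is evaluated in large batches, so the
//                         worst case is a full micro-batch of tokens.
// tg: token generation  - tokens are produced one at a time per sequence, so
//                         the worst case is a single token per active sequence.
static void reserve_graph(const char * name, int n_tokens, int n_seqs) {
    std::printf("reserving %s graph: n_tokens=%d, n_seqs=%d\n", name, n_tokens, n_seqs);
}

int main() {
    const int n_ubatch = 512; // hypothetical micro-batch size
    const int n_seqs   = 4;   // hypothetical number of parallel sequences

    reserve_graph("pp", n_ubatch, n_seqs); // prompt processing: many tokens per step
    reserve_graph("tg", n_seqs,   n_seqs); // token generation: one token per sequence
    return 0;
}
```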
CMakeLists.txt memory : Hybrid recurrent cache (#13979) 2025-06-19 08:08:14 +03:00
llama-adapter.cpp llama : do not crash if there is no CPU backend (#13395) 2025-05-09 13:02:07 +02:00
llama-adapter.h llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
llama-arch.cpp chat : fix kimi-k2 chat template (#14852) 2025-07-24 13:59:56 +02:00
llama-arch.h model : add EXAONE 4.0 support (#14630) 2025-07-18 10:45:49 +02:00
llama-batch.cpp llama : reuse compute graphs (#14482) 2025-07-17 19:08:33 +03:00
llama-batch.h llama : reuse compute graphs (#14482) 2025-07-17 19:08:33 +03:00
llama-chat.cpp chat : fix kimi-k2 chat template (#14852) 2025-07-24 13:59:56 +02:00
llama-chat.h model : add EXAONE 4.0 support (#14630) 2025-07-18 10:45:49 +02:00
llama-context.cpp llama : clarify comment about pp and tg graphs [no ci] (#14895) 2025-07-27 12:10:51 +02:00
llama-context.h context : restore preemptive sched reset when LLAMA_SET_ROWS=0 (#14870) 2025-07-25 14:28:06 +03:00
llama-cparams.cpp cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188) 2025-06-15 10:08:58 +03:00
llama-cparams.h llama : add high-throughput mode (#14363) 2025-07-16 16:35:42 +03:00
llama-grammar.cpp `server`: streaming of tool calls and thoughts when `--jinja` is on (#12379) 2025-05-25 01:48:08 +01:00
llama-grammar.h `tool-call`: fix Qwen 2.5 Coder support, add micro benchmarks, support trigger patterns for lazy grammars (#12034) 2025-03-05 13:05:13 +00:00
llama-graph.cpp metal : fuse add, mul + add tests (#14596) 2025-07-18 20:37:26 +03:00
llama-graph.h graph : refactor context to not pass gf explicitly (#14629) 2025-07-18 08:29:28 +03:00
llama-hparams.cpp llama : add high-throughput mode (#14363) 2025-07-16 16:35:42 +03:00
llama-hparams.h model : make rope_yarn_log_mul optional for deepseek2 (#14896) 2025-07-27 11:18:37 +03:00
llama-impl.cpp GGUF: C++ refactor, backend support, misc fixes (#11030) 2025-01-07 18:01:58 +01:00
llama-impl.h cleanup: fix compile warnings associated with gnu_printf (#11811) 2025-02-12 10:06:53 -04:00
llama-io.cpp llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
llama-io.h llama : refactor llama_context, llama_kv_cache, llm_build_context (#12181) 2025-03-13 12:35:44 +02:00
llama-kv-cache-unified-iswa.cpp llama : add high-throughput mode (#14363) 2025-07-16 16:35:42 +03:00
llama-kv-cache-unified-iswa.h llama : add high-throughput mode (#14363) 2025-07-16 16:35:42 +03:00
llama-kv-cache-unified.cpp kv-cache : fix k-shift for multiple streams (#14742) 2025-07-17 20:52:33 +03:00
llama-kv-cache-unified.h llama : reuse compute graphs (#14482) 2025-07-17 19:08:33 +03:00
llama-kv-cells.h kv-cache : use ggml_set_rows (#14285) 2025-07-03 10:53:35 +03:00
llama-memory-hybrid.cpp llama : fix parameter order for hybrid memory initialization (#14725) 2025-07-16 21:17:25 +02:00
llama-memory-hybrid.h kv-cache : use ggml_set_rows (#14285) 2025-07-03 10:53:35 +03:00
llama-memory-recurrent.cpp memory : handle saving/loading null layers in recurrent memory (#14675) 2025-07-23 11:16:41 +03:00
llama-memory-recurrent.h memory : rename interface to llama_memory_context_i (#14296) 2025-06-21 08:03:46 +03:00
llama-memory.cpp memory : correctly handle failure in apply() (#14438) 2025-06-30 18:03:03 +03:00
llama-memory.h memory : correctly handle failure in apply() (#14438) 2025-06-30 18:03:03 +03:00
llama-mmap.cpp llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013) 2025-06-05 11:57:42 +02:00
llama-mmap.h llama-mmap: fix missing include (#11796) 2025-02-10 20:58:18 +02:00
llama-model-loader.cpp llama : support multiple classifier outputs and labels (#13940) 2025-06-06 09:03:25 +02:00
llama-model-loader.h llama : add option to override model tensor buffers (#11397) 2025-04-02 14:52:01 +02:00
llama-model-saver.cpp llama : improve sep token handling (#14272) 2025-06-20 14:04:09 +02:00
llama-model-saver.h llama/ggml: add LLM training support (#10544) 2025-05-12 14:44:49 +02:00
llama-model.cpp model : make rope_yarn_log_mul optional for deepseek2 (#14896) 2025-07-27 11:18:37 +03:00
llama-model.h model: add Ernie 4.5 MoE support (#14658) 2025-07-17 23:15:32 +02:00
llama-quant.cpp quantize : fix minor logic flaw in --tensor-type (#14572) 2025-07-13 18:02:17 +02:00
llama-quant.h llama : refactor `src/llama.cpp` (#10902) 2025-01-03 10:18:53 +02:00
llama-sampling.cpp sampling : make sure samplers return at least 1 token (#13822) 2025-05-27 12:07:52 +03:00
llama-sampling.h llama : add `llama_vocab`, functions -> methods, naming (#11110) 2025-01-12 11:32:42 +02:00
llama-vocab.cpp model : add EXAONE 4.0 support (#14630) 2025-07-18 10:45:49 +02:00
llama-vocab.h Support diffusion models: Add Dream 7B (#14644) 2025-07-16 20:03:51 +08:00
llama.cpp llama : add thread safety test (#14035) 2025-06-16 08:11:43 -07:00
unicode-data.cpp server : better security control for public deployments (#9776) 2024-10-08 13:27:04 +02:00
unicode-data.h llama : reduce compile time and binary size (#9712) 2024-10-02 15:49:55 +02:00
unicode.cpp model : add Kimi-K2 support (#14654) 2025-07-15 21:54:22 +02:00
unicode.h model : add Kimi-K2 support (#14654) 2025-07-15 21:54:22 +02:00