gemma.cpp/gemma
Charles Zhao f8131339a7 Refactor for continuous batching. This CL does not change the current behavior of the code. It only extracts two functions that will later be called for adding continuous batching.
PiperOrigin-RevId: 829104661
2025-11-06 14:20:17 -08:00
bindings Replace mt19937 with new generator to enable parallel sampling 2025-09-04 23:49:10 -07:00
evals Add MMLU eval to github 2024-05-20 10:20:53 -07:00
activations.h Introduce attention implementation configurability. 2025-11-06 08:43:41 -08:00
api_client.cc feature: add API server and client with Google protocol 2025-08-21 11:32:48 +09:00
api_server.cc tune pool kSpin mode in threading_context 2025-10-07 08:36:26 -07:00
attention.cc Minor: ParallelismStrategy->Parallelism 2025-11-06 06:56:10 -08:00
attention.h Also update attention.h to type-erased query_norm_scale 2025-10-28 06:48:33 -07:00
configs.cc Add config flag for global timescale & rely on config to deduce wrapping 2025-10-24 06:54:56 -07:00
configs.h Introduce attention implementation configurability. 2025-11-06 08:43:41 -08:00
configs_test.cc Minor: rename compression/shared -> types.h 2025-05-13 06:53:21 -07:00
flash_attention.cc Minor: ParallelismStrategy->Parallelism 2025-11-06 06:56:10 -08:00
flash_attention.h Pre-compress query activations to BF16 before FlashAttention. 2025-10-31 09:49:44 -07:00
flash_attention_test.cc Introduce attention implementation configurability. 2025-11-06 08:43:41 -08:00
flash_structs.h Added access to flash attention internals to TileFlashAttention4 2025-10-30 06:50:05 -07:00
gemma-inl.h Minor: ParallelismStrategy->Parallelism 2025-11-06 06:56:10 -08:00
gemma.cc Refactor for continuous batching. This CL does not change the current behavior of the code. It only extracts two functions that will later be called for adding continuous batching. 2025-11-06 14:20:17 -08:00
gemma.h [Gemma.cpp] Allows non-owned arguments for attention methods. 2025-10-27 10:43:25 -07:00
gemma_args.h Introduce attention implementation configurability. 2025-11-06 08:43:41 -08:00
kv_cache.cc [Gemma.cpp] Allows non-owned arguments for attention methods. 2025-10-27 10:43:25 -07:00
kv_cache.h [Gemma.cpp] Allows non-owned arguments for attention methods. 2025-10-27 10:43:25 -07:00
model_store.cc Add 8-bit integer quantization (I8Stream) to Gemma.cpp. 2025-10-15 09:25:20 -07:00
model_store.h Major cleanup of profiler zones, add Caller annotation for all pool.Run 2025-10-23 01:54:24 -07:00
run.cc Replace mt19937 with new generator to enable parallel sampling 2025-09-04 23:49:10 -07:00
tensor_info.cc Remove Griffin support 2025-09-05 02:35:40 -07:00
tensor_info.h Add 8-bit integer quantization (I8Stream) to Gemma.cpp. 2025-10-15 09:25:20 -07:00
tensor_info_test.cc Minor: ModelWeightsPtrs -> WeightsPtrs 2025-07-11 06:11:51 -07:00
tokenizer.cc (Resubmit) Prepare profiler annotations for new API 2025-08-13 01:38:24 -07:00
tokenizer.h 6x large-batch, short-prompt prefill speedup 2025-06-10 09:56:20 -07:00
vit.cc Major cleanup of profiler zones, add Caller annotation for all pool.Run 2025-10-23 01:54:24 -07:00
vit.h Minor: ModelWeightsPtrs -> WeightsPtrs 2025-07-11 06:11:51 -07:00
weights.cc Minor: ParallelismStrategy->Parallelism 2025-11-06 06:56:10 -08:00
weights.h Major cleanup of profiler zones, add Caller annotation for all pool.Run 2025-10-23 01:54:24 -07:00