gemma.cpp/gemma
Charles Zhao f8131339a7 Refactor for continuous batching. This CL does not change the current behavior of the code. It only extracts two functions that will later be called for adding continuous batching.
PiperOrigin-RevId: 829104661
2025-11-06 14:20:17 -08:00
bindings Replace mt19937 with new generator to enable parallel sampling 2025-09-04 23:49:10 -07:00
evals Add MMLU eval to github 2024-05-20 10:20:53 -07:00
activations.h Introduce attention implementation configurability. 2025-11-06 08:43:41 -08:00
api_client.cc feature: add API server and client with Google protocol 2025-08-21 11:32:48 +09:00
api_server.cc tune pool kSpin mode in threading_context 2025-10-07 08:36:26 -07:00
attention.cc Minor: ParallelismStrategy->Parallelism 2025-11-06 06:56:10 -08:00
attention.h Also update attention.h to type-erased query_norm_scale 2025-10-28 06:48:33 -07:00
configs.cc Add config flag for global timescale & rely on config to deduce wrapping 2025-10-24 06:54:56 -07:00
configs.h Introduce attention implementation configurability. 2025-11-06 08:43:41 -08:00
configs_test.cc Minor: rename compression/shared -> types.h 2025-05-13 06:53:21 -07:00
flash_attention.cc Minor: ParallelismStrategy->Parallelism 2025-11-06 06:56:10 -08:00
flash_attention.h Pre-compress query activations to BF16 before FlashAttention. 2025-10-31 09:49:44 -07:00
flash_attention_test.cc Introduce attention implementation configurability. 2025-11-06 08:43:41 -08:00
flash_structs.h Added access to flash attention internals to TileFlashAttention4 2025-10-30 06:50:05 -07:00
gemma-inl.h Minor: ParallelismStrategy->Parallelism 2025-11-06 06:56:10 -08:00
gemma.cc Refactor for continuous batching. This CL does not change the current behavior of the code. It only extracts two functions that will later be called for adding continuous batching. 2025-11-06 14:20:17 -08:00
gemma.h [Gemma.cpp] Allows non-owned arguments for attention methods. 2025-10-27 10:43:25 -07:00
gemma_args.h Introduce attention implementation configurability. 2025-11-06 08:43:41 -08:00
kv_cache.cc [Gemma.cpp] Allows non-owned arguments for attention methods. 2025-10-27 10:43:25 -07:00
kv_cache.h [Gemma.cpp] Allows non-owned arguments for attention methods. 2025-10-27 10:43:25 -07:00
model_store.cc Add 8-bit integer quantization (I8Stream) to Gemma.cpp. 2025-10-15 09:25:20 -07:00
model_store.h Major cleanup of profiler zones, add Caller annotation for all pool.Run 2025-10-23 01:54:24 -07:00
run.cc Replace mt19937 with new generator to enable parallel sampling 2025-09-04 23:49:10 -07:00
tensor_info.cc Remove Griffin support 2025-09-05 02:35:40 -07:00
tensor_info.h Add 8-bit integer quantization (I8Stream) to Gemma.cpp. 2025-10-15 09:25:20 -07:00
tensor_info_test.cc Minor: ModelWeightsPtrs -> WeightsPtrs 2025-07-11 06:11:51 -07:00
tokenizer.cc (Resubmit) Prepare profiler annotations for new API 2025-08-13 01:38:24 -07:00
tokenizer.h 6x large-batch, short-prompt prefill speedup 2025-06-10 09:56:20 -07:00
vit.cc Major cleanup of profiler zones, add Caller annotation for all pool.Run 2025-10-23 01:54:24 -07:00
vit.h Minor: ModelWeightsPtrs -> WeightsPtrs 2025-07-11 06:11:51 -07:00
weights.cc Minor: ParallelismStrategy->Parallelism 2025-11-06 06:56:10 -08:00
weights.h Major cleanup of profiler zones, add Caller annotation for all pool.Run 2025-10-23 01:54:24 -07:00