gemma.cpp


Latest commit: Jan Wassenberg d1638587f0	2025-07-30 00:55:45 -07:00
1.14x batch decode speedup: parallelize RMSNorm ops
Activations was over-parallelized, use single pool instead. Also improve profiler zone annotations, pass through worker args (for tracking concurrency), now non-optional.
PiperOrigin-RevId: 788790976
bindings	Rename GetModelConfig->Config	2025-07-29 10:18:12 -07:00
evals	Add MMLU eval to github	2024-05-20 10:20:53 -07:00
activations.h	1.14x batch decode speedup: parallelize RMSNorm ops	2025-07-30 00:55:45 -07:00
attention.cc	1.14x batch decode speedup: parallelize RMSNorm ops	2025-07-30 00:55:45 -07:00
attention.h	Back to f32 kv_cache, but via typedef	2025-07-21 07:05:35 -07:00
configs.cc	Internal change.	2025-07-29 08:21:29 -07:00
configs.h	Add blob_path to config deduction message	2025-07-11 18:58:56 -07:00
configs_test.cc	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
gemma-inl.h	1.14x batch decode speedup: parallelize RMSNorm ops	2025-07-30 00:55:45 -07:00
gemma.cc	1.14x batch decode speedup: parallelize RMSNorm ops	2025-07-30 00:55:45 -07:00
gemma.h	Rename GetModelConfig->Config	2025-07-29 10:18:12 -07:00
gemma_args.h	1.14x batch decode speedup: parallelize RMSNorm ops	2025-07-30 00:55:45 -07:00
griffin.cc	Speed up builds by skipping rarely used targets	2025-06-17 05:44:20 -07:00
griffin.h	Major refactor: clarify query_idx (global) vs qi. Refs #607	2025-06-16 02:42:02 -07:00
kv_cache.cc	De-singleton ThreadingContext so callers can pass in their own	2025-07-22 02:08:46 -07:00
kv_cache.h	De-singleton ThreadingContext so callers can pass in their own	2025-07-22 02:08:46 -07:00
model_store.cc	Add blob_path to config deduction message	2025-07-11 18:58:56 -07:00
model_store.h	Remove backprop/	2025-05-28 07:01:17 -07:00
run.cc	Rename GetModelConfig->Config	2025-07-29 10:18:12 -07:00
tensor_info.cc	Major refactor to de-templatize gemma-inl and weights	2025-06-02 23:01:35 -07:00
tensor_info.h	Minor: rename compression/shared -> types.h	2025-05-13 06:53:21 -07:00
tensor_info_test.cc	Minor: ModelWeightsPtrs -> WeightsPtrs	2025-07-11 06:11:51 -07:00
tokenizer.cc	Huge refactor of weight handling and model loading.	2025-05-06 04:44:21 -07:00
tokenizer.h	6x large-batch, short-prompt prefill speedup	2025-06-10 09:56:20 -07:00
vit.cc	1.14x batch decode speedup: parallelize RMSNorm ops	2025-07-30 00:55:45 -07:00
vit.h	Minor: ModelWeightsPtrs -> WeightsPtrs	2025-07-11 06:11:51 -07:00
weights.cc	De-singleton ThreadingContext so callers can pass in their own	2025-07-22 02:08:46 -07:00
weights.h	De-singleton ThreadingContext so callers can pass in their own	2025-07-22 02:08:46 -07:00