gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Jan Wassenberg	9b6ed1a58f	gemma_batch_bench: generate more unique prompts PiperOrigin-RevId: 819944137	2025-10-15 15:46:05 -07:00
Jan Wassenberg	f3bc1c17da	1.03x speedup: fused FFN. matmul-inl: support CView=StridedView or RowPtrs; rename to C_MC_NC. matmul.cc: Allow 1 more rep for MC/NC to allow half-sized tiles, which helps. PiperOrigin-RevId: 807291701	2025-09-15 10:26:37 -07:00
Jan Wassenberg	ba6131311a	Fix gemma_batch_bench for flash attention. q_T rows do not change. Also repeat prefill to reflect perf after autotuning. PiperOrigin-RevId: 805319377	2025-09-10 05:32:34 -07:00
Jan Wassenberg	461a9c7d1b	Matmul refactoring towards fusion. MMLoops: move dispatch code out, use overloads; split build target into matmul_env (for MatMulEnv/MMOptions); weights: no longer call BindB. Fix potential out of bounds in gemma_batch_bench. PiperOrigin-RevId: 804895985	2025-09-09 07:13:38 -07:00
Jan Wassenberg	7630ec0c92	batch_bench tweak: more output PiperOrigin-RevId: 773670580	2025-06-20 06:09:18 -07:00
Jan Wassenberg	ec02726cf7	6x large-batch, short-prompt prefill speedup. Parallelize over queries instead of tokens; introduce non_eos so we only iterate over not yet EOS queries; remove TokenStreamer; move RMSNormInplaceBatched out of Transformer to call the latter from prefill. Consistent arg order. Fix gemma_test EOS handling (caught by msan), remove from tokenizer.h. Also add output to gemma_batch_bench, fix name. PiperOrigin-RevId: 769676106	2025-06-10 09:56:20 -07:00
Jan Wassenberg	9efdcfd45c	1.07x batch decode speedup: more BF16 weights and activations. BF16 att_sums and ffw_out; support BF16 B views without decompression; support arbitrary types in MulByConstAndAdd, AddFrom. Also update profiler annotations in ops-inl.h. PiperOrigin-RevId: 766995010	2025-06-03 23:30:18 -07:00
Jan Wassenberg	c4a75abe43	Cleanup gemma_batch_bench PiperOrigin-RevId: 766177406	2025-06-02 07:04:36 -07:00
Jan Wassenberg	e890d46f30	1.31x batch prefill, 1.24x batch decode speedup: NUMA binding. Only the weights; binding MatMul output worsens batch=1 prefill. Update gemma_batch_bench to use --decode_qbatch. Fix/remove prefill_activations in gemma-inl.h. Refactor: use BasePageBytes directly when binding; move BindB/C to .cc by de-templatizing; remove MatOwners::AllocateFor because it is weights-specific (binding or not); disband MatOwners, replace with vector. PiperOrigin-RevId: 759610477	2025-05-16 07:42:13 -07:00
Jan Wassenberg	8d0882b966	Huge refactor of weight handling and model loading. Weight handling: new ModelStore2 supports both pre-2025 multi-file and single-file formats; simpler ForEachTensor with TensorArgs; tensors are constructed with their full suffixed name. I/O: support mmap and stride; simplified SbsWriter, single insert(); add SbsReader. Misc: kMockTokenizer: allow creating with unavailable tokenizer; configs.h: simpler enum validity checks via kSentinel; matmul.h: remove unused enable_bind (now in allocator.h); tensor_info: single TensorInfoRegistry class, rename from tensor_index.h. Frontends: replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&); deduce model/weight type, remove --model and parsing; replace most common.h includes with configs.h; remove --compressed_weights, use --weights instead; remove ModelInfo, replaced by ModelConfig. Backprop: reduce max loss, remove backward_scalar_test (timeout); update thresholds because new RandInit changes rng eval order and thus numerics. PiperOrigin-RevId: 755317484	2025-05-06 04:44:21 -07:00
Jan Wassenberg	160a5824fb	Cleanup: include fixes/comments, fix leak, vector reserve. Also remove unused RowSpan. configs.cc: Assign prompt wrapping to ModelConfig. configs.h: simplify EnumValid via sentinel. PiperOrigin-RevId: 750278497	2025-04-22 12:01:46 -07:00
Jan Wassenberg	8532da47f7	Major refactor of allocator/args: use new ThreadingContext2 instead of monostate/init in each frontend. Add ThreadingArgs (replaces AppArgs). backprop: use Packed() accessor and MakePacked factory and row-based access to allow for stride. compress_weights: remove, moving to py-only exporter instead. Move MatPtr to mat.h and revise interface: generic MatOwner; rename accessors to Packed*; support stride/row accessors, fix RowPtr stride. Add TypeBits(Type). Move GenerateMat to test_util-inl for sharing between matmul test/bench. Move internal init to gemma.cc to avoid duplication. Rename GemmaEnv model_ to gemma_ for disambiguating vs upcoming ModelStorage. Remove --compressed_weights, use --weights instead. tensor_index: add ExtentsFromInfo and TensorIndexLLM/Img. Allocator: use normal unique_ptr for AllocBytes so users can call directly. threading: use -> because AlignedPtr no longer assumes arrays. PiperOrigin-RevId: 745918637	2025-04-10 01:29:54 -07:00
Stanko Novakovic	109a4d9f85	Add a simple benchmark for batching. This is a simple Gemma benchmark with a fixed batch size of 32. PiperOrigin-RevId: 698843573	2024-11-21 10:59:49 -08:00

13 Commits