gemma.cpp

Author	SHA1	Message	Date
Krzysztof Rymski	60eed010ba	Internal changes PiperOrigin-RevId: 862680527	2026-01-29 04:48:29 -08:00
Krzysztof Rymski	16a7ba2d6e	Internal changes PiperOrigin-RevId: 854171429	2026-01-09 06:35:36 -08:00
Jan Wassenberg	6d43d6ee19	Build fix for Arm SVE (invalid template arg on op) PiperOrigin-RevId: 854110884	2026-01-09 02:56:03 -08:00
The gemma.cpp Authors	95592a574e	Build fix for Arm SVE (explicit namespace qualification) PiperOrigin-RevId: 853864585	2026-01-08 13:29:45 -08:00
Jan Wassenberg	42e9cf557d	Internal change / remove unused PrintSpeed PiperOrigin-RevId: 853694463	2026-01-08 05:26:31 -08:00
Balazs Racz	384c390181	Allow overriding hardcoded max_seq_len by cmdline argument seq_len. Adds a SetMaxSeqLen method to ModelConfig to handle updating both max_seq_len and global attention window sizes. The Gemma constructor now checks if the provided inference seq_len exceeds the model's max_seq_len and, if so, emits a warning and updates the config. This prevents clipping context to the hard-coded maximum. PiperOrigin-RevId: 853676074	2026-01-08 04:28:59 -08:00
Jan Wassenberg	aeade052c6	Move AssertClose to test_util, add U16 PiperOrigin-RevId: 853321311	2026-01-07 10:33:20 -08:00
Krzysztof Rymski	2ee1fac74c	Internal changes PiperOrigin-RevId: 853138600	2026-01-07 01:21:37 -08:00
Jan Wassenberg	1605925d1e	Add int8 quantization stats Compute the L1 error and Shannon SNR (higher is better). PiperOrigin-RevId: 846832280	2025-12-19 12:43:03 -08:00
Copybara-Service	11aa16a13d	Merge pull request #810 from salmanmkc:upgrade-github-actions-node24 PiperOrigin-RevId: 846692015	2025-12-19 05:27:14 -08:00
Krzysztof Rymski	08a0760271	Internal changes PiperOrigin-RevId: 846663686	2025-12-19 03:43:15 -08:00
Krzysztof Rymski	b73a9ede8f	Internal changes PiperOrigin-RevId: 846648337	2025-12-19 02:46:18 -08:00
Balazs Racz	0ac55f71ed	Avoid using Row() for unaligned storage. PiperOrigin-RevId: 846214605	2025-12-18 05:10:57 -08:00
Krzysztof Rymski	6661d3a60c	Internal changes PiperOrigin-RevId: 846140314	2025-12-18 01:26:43 -08:00
Liam Miller-Cushon	142e6a7e9c	No public description PiperOrigin-RevId: 846030124	2025-12-17 20:10:54 -08:00
Phil Culliton	b8a409dbba	Use hn::Sub for vector subtraction in flash attention. PiperOrigin-RevId: 845883321	2025-12-17 12:57:34 -08:00
Balazs Racz	596bdfe5af	Separate monolithic gemma_lib library into more specific cc_library targets. Creates new cc_library targets for :attention, :tensor_stats and :activations. Eliminates cyclic dependencies between these libraries. PiperOrigin-RevId: 845690136	2025-12-17 03:31:16 -08:00
Salman Chishti	a4c78d4454	Merge branch 'dev' into upgrade-github-actions-node24	2025-12-16 14:47:59 +00:00
Salman Muin Kayser Chishti	b66aa115ac	Upgrade GitHub Actions for Node 24 compatibility	2025-12-16 14:26:24 +00:00
Balazs Racz	baa69dfb78	Makes the entire runtime_config passed into the activations constructor. PiperOrigin-RevId: 845153671	2025-12-16 01:56:52 -08:00
Krzysztof Rymski	44dfd69b9b	Internal changes PiperOrigin-RevId: 844759322	2025-12-15 07:14:37 -08:00
Jan Wassenberg	0c64987a96	Abort if args are unrecognized, refactor argument passing This catches typos/incorrect usage. Refactor: group Loader/Threading/Inference into GemmaArgs. All *Args ctors now have an extra ConsumedArgs& argument. PiperOrigin-RevId: 844690553	2025-12-15 03:18:45 -08:00
Jan Wassenberg	f50550f4ce	Warning fixes (sign mismatch), switch default PiperOrigin-RevId: 844679375	2025-12-15 02:41:19 -08:00
Martin Stolle	506fb22be7	No public description PiperOrigin-RevId: 843665619	2025-12-12 06:37:17 -08:00
Balazs Racz	338cd8a36e	Factors out a new cc_library `:query` from `:gemma-lib`. Moves query-related structs/classes to gemma/query.h. This refactors PerQuery, AllQueries, and QBatch into a dedicated header file, gemma/query.h, and updates BUILD dependencies accordingly. PiperOrigin-RevId: 843604293	2025-12-12 02:53:56 -08:00
Jan Wassenberg	73c3627b67	Add tensor stats and output tensor_info: add missing header io: fix mode weights.h: add layer_idx to LayerWeightsPtrs PiperOrigin-RevId: 843531051	2025-12-11 22:52:46 -08:00
Martin Stolle	bfc0dfcfca	Enable flags= parsing PiperOrigin-RevId: 843103750	2025-12-11 01:17:59 -08:00
Martin Stolle	78deacc357	Make attention configurable on the command line. PiperOrigin-RevId: 842760721	2025-12-10 09:34:06 -08:00
Martin Stolle	2441ff01bf	internal change PiperOrigin-RevId: 842749037	2025-12-10 09:01:15 -08:00
Krzysztof Rymski	64178ace38	Internal changes PiperOrigin-RevId: 842727112	2025-12-10 07:55:17 -08:00
Martin Stolle	9689fc82f9	internal change PiperOrigin-RevId: 842205671	2025-12-09 06:17:08 -08:00
Krzysztof Rymski	64d700cab5	Internal changes PiperOrigin-RevId: 842194766	2025-12-09 05:42:03 -08:00
Martin Stolle	14a9ecf21d	Factor out SumHeads PiperOrigin-RevId: 842138081	2025-12-09 02:23:16 -08:00
Martin Stolle	1014ae9e2a	Adding a simple test for GemmaAttention PiperOrigin-RevId: 842135414	2025-12-09 02:13:03 -08:00
Jan Wassenberg	5a6895c609	Avoid warning when OS affinity limits us to the second socket Also simplify NumSMT, detect from .smt field directly PiperOrigin-RevId: 841749486	2025-12-08 07:10:43 -08:00
Martin Stolle	b510ba2ab2	Improve clarity of indices II Sorry, didn't see this one before. PiperOrigin-RevId: 840218378	2025-12-04 06:33:33 -08:00
Martin Stolle	9348048885	Clean up toPtrs to delegate to toPtr PiperOrigin-RevId: 840214969	2025-12-04 06:22:04 -08:00
Krzysztof Rymski	2b4436beb6	Internal changes PiperOrigin-RevId: 840151004	2025-12-04 02:37:53 -08:00
Martin Stolle	d2090fddf3	Improve clarity of indices PiperOrigin-RevId: 839805634	2025-12-03 10:11:21 -08:00
Nitin Gangahar	6d3e2b6f73	Add missing includes. PiperOrigin-RevId: 839604341	2025-12-02 23:23:09 -08:00
Jan Wassenberg	a084d33e41	Fix Gemma3 image: ensure A matrix is packed, preallocate Also ignore -2 tokens PiperOrigin-RevId: 838869988	2025-12-01 11:47:23 -08:00
Jan Wassenberg	1564dd3111	Fix empty enabled_lps in topology detection Also expand the debug output. PiperOrigin-RevId: 838832605	2025-12-01 10:23:47 -08:00
Krzysztof Rymski	6e5e4123f1	Internal changes PiperOrigin-RevId: 837775282	2025-11-28 02:37:06 -08:00
Jan Wassenberg	3c9e6cf113	Expand debug output for topology PiperOrigin-RevId: 837738553	2025-11-28 00:19:33 -08:00
Jan Wassenberg	ccb49bc82f	Add ToFloatSlow, move RandomFloat to test_util PiperOrigin-RevId: 837412290	2025-11-27 00:14:51 -08:00
Krzysztof Rymski	c153d5255b	Internal changes PiperOrigin-RevId: 837001762	2025-11-26 01:05:35 -08:00
Martin Stolle	8696f6dd17	Clarify indices PiperOrigin-RevId: 836235539	2025-11-24 08:27:59 -08:00
Jan Wassenberg	37a25c9ffe	Fix warning (signed vs unsigned) PiperOrigin-RevId: 836106478	2025-11-24 00:51:17 -08:00
Charles Zhao	0e5f4cbf1b	Implement Continuous Batching. (1) A function GenerateTWithContinuousBatching is added to use continuous batching when enabled. (2) The ContinuousQBatch is added as a subclass of QBatch to manage prefill, insert, used-kv-cache-collection. (3) Also expanded the unit test to more diverse cases. PiperOrigin-RevId: 836090261	2025-11-23 23:54:02 -08:00
Martin Stolle	88a03b7ec4	Added access to softmax attention internals to regular attention PiperOrigin-RevId: 835244205	2025-11-21 09:01:01 -08:00

920 Commits
