Commit Graph

20 Commits

Author SHA1 Message Date
Jan Wassenberg 3ed403e287 Major cleanup of profiler zones, add Caller annotation for all pool.Run
Pass ThreadingContext instead of Pools/Profiler individually, for access to Zones
Add GCPP_ZONE helper
Add Caller argument to pool.Run to enable new stats
Remove most direct dependencies on ThreadPool, prefer ParallelFor

PiperOrigin-RevId: 822934530
2025-10-23 01:54:24 -07:00
Jan Wassenberg 035273c184 tune pool kSpin mode in threading_context
Previously, this happened concurrently with the matmul autotune, which could lead to incorrect outcomes.

threading: de-singleton Pinning (no longer stores affinity); pass PoolWorkerMapping; fix Pool dtor order
Also enable SPR target (Zen4 is AMD-only),
update Highway version for renamed Thread()->GlobalIdx().
PiperOrigin-RevId: 816223017
2025-10-07 08:36:26 -07:00
Jan Wassenberg cbe24eac51 1.15x speedup: parallel sampling, enabled by new RNG
Also pass pos to SampleFunc, for seeding the RNG.

PiperOrigin-RevId: 803453518
2025-09-05 07:24:02 -07:00
Jan Wassenberg 56186193c1 Replace mt19937 with new generator to enable parallel sampling
Split it into immutable AesCtrEngine and RngStream
Also add RowSpan and Logits span

PiperOrigin-RevId: 803336423
2025-09-04 23:49:10 -07:00
Jan Wassenberg faa4102992 (Resubmit) Prepare profiler annotations for new API
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.

PiperOrigin-RevId: 794461159
2025-08-13 01:38:24 -07:00
The gemma.cpp Authors a2d9133f7d Prepare profiler annotations for new API
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.

PiperOrigin-RevId: 793865287
2025-08-11 17:51:38 -07:00
Jan Wassenberg 4cbf63e6f0 Prepare profiler annotations for new API
Pass hwy::Profiler& to low-level functions.
Used ThreadingContext arg instead of NestedPools.
Use new PROFILER_ZONE3.

PiperOrigin-RevId: 793821255
2025-08-11 15:34:52 -07:00
Jan Wassenberg d1638587f0 1.14x batch decode speedup: parallelize RMSNorm ops
Activations was over-parallelized, use single pool instead.
Also improve profiler zone annotations,
pass through worker args (for tracking concurrency), now non-optional.

PiperOrigin-RevId: 788790976
2025-07-30 00:55:45 -07:00
Jan Wassenberg ac0d751d20 Rename GetModelConfig->Config
PiperOrigin-RevId: 788506480
2025-07-29 10:18:12 -07:00
Jan Wassenberg a04cc287b2 Move MatMulEnv out of Gemma to enable concurrent calls
Also update benchmark_helper config print: add profiler, remove free mem

PiperOrigin-RevId: 774662974
2025-06-23 01:20:09 -07:00
Jan Wassenberg cd80d8b24d Speed up builds by skipping rarely used targets
Centralize previous code into GEMMA_DISABLED_TARGETS

PiperOrigin-RevId: 772433723
2025-06-17 05:44:20 -07:00
Jan Wassenberg 8d0882b966 Huge refactor of weight handling and model loading.
Weight handling:
- new ModelStore2 supports both pre-2025 multi-file and single-file formats
- simpler ForEachTensor with TensorArgs
- tensors are constructed with their full suffixed name

I/O:
- support mmap and stride
- Simplified SbsWriter, single insert(); add SbsReader

Misc:
- kMockTokenizer: allow creating with unavailable tokenizer
- configs.h: Simpler enum validity checks via kSentinel
- matmul.h: remove unused enable_bind (now in allocator.h)
- tensor_info: single TensorInfoRegistry class, rename from tensor_index.h

Frontends:
- Replace Allocate/CreateGemma with ctor(LoaderArgs, MatMulEnv&)
- Deduce model/weight type, remove --model and parsing
- Replace most common.h includes with configs.h
- Remove --compressed_weights, use --weights instead
- Remove ModelInfo, replaced by ModelConfig.

Backprop:
- Reduce max loss, remove backward_scalar_test (timeout)
- Update thresholds because new RandInit changes rng eval order and thus numerics
PiperOrigin-RevId: 755317484
2025-05-06 04:44:21 -07:00
Jan Wassenberg 160a5824fb Cleanup: include fixes/comments, fix leak, vector reserve
Also remove unused RowSpan
configs.cc: Assign prompt wrapping to ModelConfig
configs.h: simplify EnumValid via sentinel

PiperOrigin-RevId: 750278497
2025-04-22 12:01:46 -07:00
Daniel Keysers 5bbe814a53 Tiny cleanup.
PiperOrigin-RevId: 704636988
2024-12-10 03:34:05 -08:00
Daniel Keysers 719699f132 Make top_k a runtime argument (instead of a model argument).
PiperOrigin-RevId: 696170691
2024-11-13 09:48:59 -08:00
Ray Smith 0d68555f87 Eliminated TConfig.
Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively.
Added CompressedModel to remove ByteStorageT and get rid of most of the type casting, as well as allowing the default destructor to be used and work properly.
Adjusted WeightsWrapper and ForwardLayer etc to match.
The only remaining template arg is the weight type.
This enables all the instantiations to be deleted, apart from one per type.
It also enables (but not yet done) the config to be stored in the blob file instead of having to be specified separately.
Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively.

PiperOrigin-RevId: 686870060
2024-10-17 05:04:22 -07:00
Daniel Keysers a4d6adbc43 Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize.
Remove max_tokens (and rely on only max_generated_tokens).

PiperOrigin-RevId: 685662260
2024-10-14 04:45:21 -07:00
Jan Wassenberg 2d14d796e3 1.09x decode speedup for topk=1/temp0: fuse softmax and sample
PiperOrigin-RevId: 680589099
2024-09-30 08:37:41 -07:00
Jan Wassenberg 85fcd3cd80 Cleanup: add ModelInfo struct, remove gcpp::
PiperOrigin-RevId: 648707763
2024-07-02 07:11:15 -07:00
Jan Wassenberg af8eb2fde3 Declutter gemma/ directory, move binaries to evals/ and util/.
PiperOrigin-RevId: 648400795
2024-07-01 09:51:04 -07:00
Renamed from gemma/cross_entropy.cc (Browse further)