Commit Graph

17 Commits

Author SHA1 Message Date
Phil Culliton 503aaddd65 Add 8-bit integer quantization (I8Stream) to Gemma.cpp.
PiperOrigin-RevId: 819787856
2025-10-15 09:25:20 -07:00
Jan Wassenberg a5ab99e4ba Memory use reduction: smaller/single MMStorage
PiperOrigin-RevId: 804865029
2025-09-09 05:32:46 -07:00
Jan Wassenberg 56186193c1 Replace mt19937 with new generator to enable parallel sampling
Split it into immutable AesCtrEngine and RngStream
Also add RowSpan and Logits span

PiperOrigin-RevId: 803336423
2025-09-04 23:49:10 -07:00
Jan Wassenberg afd82376a5 Add AES-CTR RNG for parallel sampling (not yet used)
PiperOrigin-RevId: 802991142
2025-09-04 05:58:42 -07:00
Jan Wassenberg 4be4799727 Remove kMaxPackages and per-package-related code
matmul: remove kMaxClusters, dynamic allocation
PiperOrigin-RevId: 802950348
2025-09-04 03:33:12 -07:00
Jan Wassenberg 7263ab8445 MatMul simplification, threading strategy improvements
remove MatMul f32 special case (smaller code),
types: Add u32/u64 for use by Activations
move renamed ParallelismStrategy to threading_context so can pass ctx
ensure worker index is unique across clusters
matmul.h: const member functions for renamed policy classes (easier to call)
PiperOrigin-RevId: 802848086
2025-09-03 21:45:07 -07:00
Jan Wassenberg b7b3d353db Simplify MatMul: remove F32 special case (build time)
Also move kMaxM into separate kMaxBatchSize

PiperOrigin-RevId: 802086590
2025-09-02 04:29:21 -07:00
Jan Wassenberg 701841897b Default to disabling per-socket parallelization
weights: default to Read for small-batch (only look at qbatch, not the larger prefill tbatch)
PiperOrigin-RevId: 790787643
2025-08-04 09:49:14 -07:00
Jan Wassenberg 4e6aa36e9b Minor cleanup: enable 0,0 Extents2D, add SerializedSpan typedef, include fixes
PiperOrigin-RevId: 745068776
2025-04-08 03:35:55 -07:00
Apoorv Reddy d854471ae2 Use vectorized TopK using highway VQSelect
PiperOrigin-RevId: 728159153
2025-02-18 05:01:39 -08:00
Jan Wassenberg a60b564b88 Infra improvements (2)
ops.h: move CreateInvTimescale to allow calling without depending on gemma
Pass around MatMulEnv instead of pools to avoid re-creating the env
profiler.h can now be used outside SIMD code
allocator: add StepBytes and QuantumSteps
rename worker thread with package/cluster in the name
threading: add Visit* to IndexRange
PiperOrigin-RevId: 718766704
2025-01-23 01:55:19 -08:00
Jan Wassenberg c4398fc72d Infra improvements:
allocator: support mmap, fixed Bind, add padding
bench_matmul: Add PreventElision
BUILD: add ops_test build target
matmul.h: move ConstMat here; dynamic alloc of MatMulEnv
matmul_test: remove benchmarking
replace fprintf with HWY_WARN
threading.cc: support splitting large clusters (disabled); package_idx->pkg_idx, smaller IndexRangePartition
PiperOrigin-RevId: 717512274
2025-01-20 06:22:49 -08:00
Jan Wassenberg 7b77909427 Fix unhandled switch warning/error
PiperOrigin-RevId: 704828160
2024-12-10 13:32:53 -08:00
Jan Wassenberg f74d496879 Threading/infra improvements.
* Add Parallelize*Range helpers and partitioning helpers
* Refactor Pinning class, store original affinity (required to construct another NestedPools after pinning happened)

Compress:
* prevent Compress printing stats in tests
* zero-pad tensors

Matmul:
* add matmul_unit_test (TODO) and bench_matmul
* matmul_test: change norm to row vectors (that is what is added) and include bf16 rounding error
* Prepare for L2/L3 retrieval
PiperOrigin-RevId: 700603811
2024-11-27 01:12:00 -08:00
Jan Wassenberg 868b01601f Simpler MatMul interface, vocab types, Tristate for use_spinning
Add Extents2D, Range2D vocab types
Matmul uses ConstMat for inputs and RowPtr for output
Move RowVectorBatch to basics.h
Separate threading.cc
Fix topology string: report cores not LPs, and #HT
Move QStride/IsMHA into LayerConfig
ImageTokens does not require make_unique.
matmul_test: no longer require template args
PiperOrigin-RevId: 692963605
2024-11-04 07:48:29 -08:00
Jan Wassenberg baaa221787 Move BF16 to basics.h for easier access, and use that typedef.
PiperOrigin-RevId: 691422334
2024-10-30 08:09:11 -07:00
Jan Wassenberg 6ab3ff5bde Minor cleanup, Windows+Bazel build fixes
add app.h comment
compress-inl: remove unused typedef
gemma-inl: add missing HWY_ATTR and cast
separate sum-inl.h and basics.h headers
replace more hwy::bfloat16_t with BF16
update include pragmas
update dot_test thresholds
update Highway version in Bazel for HWY_RCAST_ALIGNED fix
PiperOrigin-RevId: 684464326
2024-10-10 09:05:06 -07:00