gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Jan Wassenberg	035273c184	tune pool kSpin mode in threading_context Previously, this happened concurrently with the matmul autotune, which could lead to incorrect outcomes. threading: de-singleton Pinning (no longer stores affinity); pass PoolWorkerMapping; fix Pool dtor order Also enable SPR target (Zen4 is AMD-only), update Highway version for renamed Thread()->GlobalIdx(). PiperOrigin-RevId: 816223017	2025-10-07 08:36:26 -07:00
Jan Wassenberg	24b1760f03	Refactor: move Worker to ThreadingContext, factor out MMDecompress PiperOrigin-RevId: 804909921	2025-09-09 07:56:12 -07:00
Jan Wassenberg	06e5da1e22	Cleanup: split CacheInfo from Allocator, MatMul helper functions Lift DecompressA out of main autotuner to prevent interference Also use kMaxNR / kNR constants instead of extra args Fix: only require vector alignment, not cache alignment PiperOrigin-RevId: 804333769	2025-09-08 02:23:58 -07:00
Jan Wassenberg	4be4799727	Remove kMaxPackages and per-package-related code matmul: remove kMaxClusters, dynamic allocation PiperOrigin-RevId: 802950348	2025-09-04 03:33:12 -07:00
Jan Wassenberg	7263ab8445	MatMul simplification, threading strategy improvements remove MatMul f32 special case (smaller code), types: Add u32/u64 for use by Activations move renamed ParallelismStrategy to threading_context so can pass ctx ensure worker index is unique across clusters matmul.h: const member functions for renamed policy classes (easier to call) PiperOrigin-RevId: 802848086	2025-09-03 21:45:07 -07:00
Jan Wassenberg	98ddc166db	Expand ThreadingContext comments PiperOrigin-RevId: 800479954	2025-08-28 08:32:10 -07:00
Jan Wassenberg	faa4102992	(Resubmit) Prepare profiler annotations for new API Pass hwy::Profiler& to low-level functions. Used ThreadingContext arg instead of NestedPools. Use new PROFILER_ZONE3. PiperOrigin-RevId: 794461159	2025-08-13 01:38:24 -07:00
The gemma.cpp Authors	a2d9133f7d	Prepare profiler annotations for new API Pass hwy::Profiler& to low-level functions. Used ThreadingContext arg instead of NestedPools. Use new PROFILER_ZONE3. PiperOrigin-RevId: 793865287	2025-08-11 17:51:38 -07:00
Jan Wassenberg	4cbf63e6f0	Prepare profiler annotations for new API Pass hwy::Profiler& to low-level functions. Used ThreadingContext arg instead of NestedPools. Use new PROFILER_ZONE3. PiperOrigin-RevId: 793821255	2025-08-11 15:34:52 -07:00
Jan Wassenberg	701841897b	Default to disabling per-socket parallelization weights: default to Read for small-batch (only look at qbatch, not the larger prefill tbatch) PiperOrigin-RevId: 790787643	2025-08-04 09:49:14 -07:00
Jan Wassenberg	e76e29ce11	De-singleton ThreadingContext so callers can pass in their own weights.cc: fix BindB argument for bf16 tensors threading_test: enable autotune PiperOrigin-RevId: 785763618	2025-07-22 02:08:46 -07:00
Jan Wassenberg	f2adbfbcab	Batch inference fixes: set pos during prefill, fix assert PiperOrigin-RevId: 772458760	2025-06-17 07:09:44 -07:00
Jan Wassenberg	275135d7e8	Rename-only: remove Allocator2 etc suffixes now that refactoring is complete PiperOrigin-RevId: 755397220	2025-05-06 09:12:43 -07:00
Jan Wassenberg	160a5824fb	Cleanup: include fixes/comments, fix leak, vector reserve Also remove unused RowSpan configs.cc: Assign prompt wrapping to ModelConfig configs.h: simplify EnumValid via sentinel PiperOrigin-RevId: 750278497	2025-04-22 12:01:46 -07:00
Jan Wassenberg	8532da47f7	Major refactor of allocator/args: use new ThreadingContext2 instead of monostate/init in each frontend Add ThreadingArgs(replaces AppArgs) backprop: use Packed() accessor and MakePacked factory and row-based access to allow for stride compress_weights: remove, moving to py-only exporter instead Move MatPtr to mat.h and revise interface: - Generic MatOwner - rename accessors to Packed* - support stride/row accessors, fix RowPtr stride Add TypeBits(Type) Move GenerateMat to test_util-inl for sharing between matmul test/bench Move internal init to gemma.cc to avoid duplication Rename GemmaEnv model_ to gemma_ for disambiguating vs upcoming ModelStorage Remove --compressed_weights, use --weights instead. tensor_index: add ExtentsFromInfo and TensorIndexLLM/Img Allocator: use normal unique_ptr for AllocBytes so users can call directly threading: use -> because AlignedPtr no longer assumes arrays PiperOrigin-RevId: 745918637	2025-04-10 01:29:54 -07:00
Jan Wassenberg	5d4f7e0f7e	Add new singleton Allocator2 instead of monostate Not yet used. Also fix format-string warning in topology.cc. PiperOrigin-RevId: 745166210	2025-04-08 09:00:59 -07:00

16 Commits