gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Jan Wassenberg	4be4799727	Remove kMaxPackages and per-package-related code matmul: remove kMaxClusters, dynamic allocation PiperOrigin-RevId: 802950348	2025-09-04 03:33:12 -07:00
Jan Wassenberg	d538a6d6c6	Cleanup: remove unused kCyclic, remove 2 suffix Also remove now unused allocator arg and fix warnings (cast, struct/class mismatch) PiperOrigin-RevId: 758098495	2025-05-13 01:06:41 -07:00
Jan Wassenberg	275135d7e8	Rename-only: remove Allocator2 etc suffixes now that refactoring is complete PiperOrigin-RevId: 755397220	2025-05-06 09:12:43 -07:00
Jan Wassenberg	fe80f10ed7	Backprop test fixes and allocator cleanup - Shorten backprop tests to prevent timeout - Add line number of failing test - matmul: remove unused enable_bind - allocator: we will retain enable_bind there - mat: disable cyclic padding optimization (broken) PiperOrigin-RevId: 752656068	2025-04-29 03:01:10 -07:00
Jan Wassenberg	8532da47f7	Major refactor of allocator/args: use new ThreadingContext2 instead of monostate/init in each frontend Add ThreadingArgs(replaces AppArgs) backprop: use Packed() accessor and MakePacked factory and row-based access to allow for stride compress_weights: remove, moving to py-only exporter instead Move MatPtr to mat.h and revise interface: - Generic MatOwner - rename accessors to Packed* - support stride/row accessors, fix RowPtr stride Add TypeBits(Type) Move GenerateMat to test_util-inl for sharing between matmul test/bench Move internal init to gemma.cc to avoid duplication Rename GemmaEnv model_ to gemma_ for disambiguating vs upcoming ModelStorage Remove --compressed_weights, use --weights instead. tensor_index: add ExtentsFromInfo and TensorIndexLLM/Img Allocator: use normal unique_ptr for AllocBytes so users can call directly threading: use -> because AlignedPtr no longer assumes arrays PiperOrigin-RevId: 745918637	2025-04-10 01:29:54 -07:00
Jan Wassenberg	5d4f7e0f7e	Add new singleton Allocator2 instead of monostate Not yet used. Also fix format-string warning in topology.cc. PiperOrigin-RevId: 745166210	2025-04-08 09:00:59 -07:00
Jan Wassenberg	1b72c22345	Refactor Gemma ctor and improve pool NUMA support Gemma receives a MatMulEnv arg, with comment on lifetime Split threading into topology so the latter can be used in allocator Add AllocClasses() for non-POD (ThreadPool) Support binding pool to NUMA node Update threading_test with latency measurements Also update Highway version. PiperOrigin-RevId: 736904748	2025-03-14 10:19:00 -07:00
Jan Wassenberg	f9d93e4a42	Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning Remove empty matmul_unit_test. Up to 25 TFLOP/s on 2xZen4 for 512,3072,24576. PiperOrigin-RevId: 729123576	2025-02-20 08:33:46 -08:00
Jan Wassenberg	b18bd781f6	Windows build fixes: struct vs class, unused arg/var, avoid VLA, Deleter arg, casts PiperOrigin-RevId: 724340518	2025-02-07 07:38:55 -08:00
Jan Wassenberg	a60b564b88	Infra improvements (2) ops.h: move CreateInvTimescale to allow calling without depending on gemma Pass around MatMulEnv instead of pools to avoid re-creating the env profiler.h can now be used outside SIMD code allocator: add StepBytes and QuantumSteps rename worker thread with package/cluster in the name threading: add Visit* to IndexRange PiperOrigin-RevId: 718766704	2025-01-23 01:55:19 -08:00
Jan Wassenberg	c4398fc72d	Infra improvements: allocator: support mmap, fixed Bind, add padding bench_matmul: Add PreventElision BUILD: add ops_test build target matmul.h: move ConstMat here; dynamic alloc of MatMulEnv matmul_test: remove benchmarking replace fprintf with HWY_WARN threading.cc: support splitting large clusters (disabled); package_idx->pkg_idx, smaller IndexRangePartition PiperOrigin-RevId: 717512274	2025-01-20 06:22:49 -08:00
Jan Wassenberg	f74d496879	Threading/infra improvements. * Add ParallelizeRange helpers and partitioning helpers Refactor Pinning class, store original affinity (required to construct another NestedPools after pinning happened) Compress: * prevent Compress printing stats in tests * zero-pad tensors Matmul: * add matmul_unit_test (TODO) and bench_matmul * matmul_test: change norm to row vectors (that is what is added) and include bf16 rounding error * Prepare for L2/L3 retrieval PiperOrigin-RevId: 700603811	2024-11-27 01:12:00 -08:00
Jan Wassenberg	868b01601f	Simpler MatMul interface, vocab types, Tristate for use_spinning Add Extents2D, Range2D vocab types Matmul uses ConstMat for inputs and RowPtr for output Move RowVectorBatch to basics.h Separate threading.cc Fix topology string: report cores not LPs, and #HT Move QStride/IsMHA into LayerConfig ImageTokens does not require make_unique. matmul_test: no longer require template args PiperOrigin-RevId: 692963605	2024-11-04 07:48:29 -08:00
Jan Wassenberg	02ce1e344f	Use NestedPools, add NUMA infra Improved threading.h, fix thread counts for single package/cluster systems Temporarily forces to a single socket. Prefill 29.28 tps, decode 6.92. Also fix benchmarks.cc build, update tensor allocator to Allocator PiperOrigin-RevId: 687307167	2024-10-18 08:11:18 -07:00

14 Commits