Commit Graph

20 Commits

Author SHA1 Message Date
Ray Smith 0d68555f87 Eliminated TConfig.
Changed CompressedLayer and CompressedWeights to be constructed from a LayerConfig and WeightsConfig instance, respectively.
Added CompressedModel to remove ByteStorageT and most of the type casting, and to allow the default destructor to work properly.
Adjusted WeightsWrapper, ForwardLayer, etc. to match.
The only remaining template argument is the weight type.
This allows all instantiations except one per weight type to be deleted.
It also makes it possible (not yet done) to store the config in the blob file instead of specifying it separately.
Reduces the size of the gemma_lib and weights shared libraries by factors of 4.3 and 3.2, respectively.

PiperOrigin-RevId: 686870060
2024-10-17 05:04:22 -07:00
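
The shift described in the commit above can be pictured with a small C++ sketch: the configuration moves from a TConfig template parameter to a runtime LayerConfig object, so only the weight type remains a template argument. LayerConfig and CompressedLayer are names from the message; the fields and constructor below are illustrative assumptions, not the actual gemma.cpp API.

    #include <cstddef>

    // Before: configuration baked into the type, one instantiation per model:
    //   template <class TConfig, typename Weight> class CompressedLayer { ... };
    // After (sketch): configuration passed at runtime; only the weight type stays
    // a template parameter, so one instantiation per weight type suffices.
    struct LayerConfig {   // illustrative fields, not the actual gemma.cpp struct
      size_t model_dim;
      size_t ff_hidden_dim;
      size_t heads;
    };

    template <typename Weight>   // e.g. float, BF16, SfpStream
    class CompressedLayer {
     public:
      explicit CompressedLayer(const LayerConfig& config) : config_(config) {}
      // Tensor sizes are read from config_ at runtime instead of TConfig constants.
     private:
      LayerConfig config_;
    };
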
Daniel Keysers 3cf519a53e Remove unused "two-sizes" version of MulByConstAndAdd.
PiperOrigin-RevId: 684515900
2024-10-10 11:32:25 -07:00
Jan Wassenberg 6ab3ff5bde Minor cleanup, Windows+Bazel build fixes
add app.h comment
compress-inl: remove unused typedef
gemma-inl: add missing HWY_ATTR and cast
separate sum-inl.h and basics.h headers
replace more hwy::bfloat16_t with BF16
update include pragmas
update dot_test thresholds
update Highway version in Bazel for HWY_RCAST_ALIGNED fix
PiperOrigin-RevId: 684464326
2024-10-10 09:05:06 -07:00
Daniel Keysers a570e3f662 Reduce number of operations in Gelu() by one Mul.
Makes Gen.Activation about 5% faster.

PiperOrigin-RevId: 684035719
2024-10-09 07:50:48 -07:00
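
One plausible way to drop a multiply from the tanh-based GELU approximation is to factor x out of the cubic term so x^3 is never formed explicitly. A scalar sketch of that algebraic rewrite follows; the exact change in the vectorized code may differ.

    #include <cmath>

    // Baseline: gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    float GeluBaseline(float x) {
      const float kSqrt2OverPi = 0.7978845608f;
      const float x3 = x * x * x;  // two Muls for the cube
      return 0.5f * x * (1.0f + std::tanh(kSqrt2OverPi * (x + 0.044715f * x3)));
    }

    // Factored: sqrt(2/pi) * (x + c*x^3) = x * (sqrt(2/pi) + sqrt(2/pi)*c * x^2),
    // which needs x*x instead of x*x*x and folds two constants, saving one Mul.
    float GeluFewerMul(float x) {
      const float kSqrt2OverPi = 0.7978845608f;
      const float kC = kSqrt2OverPi * 0.044715f;  // folded constant
      const float arg = x * (kSqrt2OverPi + kC * (x * x));
      return 0.5f * x * (1.0f + std::tanh(arg));
    }
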
Jan Wassenberg 5a71d819cb Also enable f64 dot/sum for <f32 inputs
Add bf16 support to Dot/SumKernelDouble in the same way as *Compensated.

PiperOrigin-RevId: 682308683
2024-10-04 07:12:10 -07:00
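
A scalar illustration of why f64 accumulation also helps for inputs narrower than f32, such as bf16: each element is widened before the multiply-accumulate, so rounding error comes only from the inputs, not from summing many products. The real kernels are vectorized via Highway; Bf16Like below is a stand-in for hwy::bfloat16_t / BF16.

    #include <cstddef>

    // Stand-in for hwy::bfloat16_t / BF16; assume ToF32() widens to float.
    struct Bf16Like {
      float ToF32() const { return value; }
      float value;
    };

    // Widen bf16 -> f32 -> f64, then accumulate in double.
    double DotF64(const Bf16Like* a, const float* b, size_t n) {
      double sum = 0.0;
      for (size_t i = 0; i < n; ++i) {
        sum += static_cast<double>(a[i].ToF32()) * static_cast<double>(b[i]);
      }
      return sum;
    }
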
Jan Wassenberg 96d2ab7d31 Minor fix to profiler zone and add comment
PiperOrigin-RevId: 681350546
2024-10-02 01:37:50 -07:00
Jan Wassenberg 7d9fcda0d8 -467ms startup: parallel Reshape
Also split an Argmax helper out of Softmax and add comments;
add profiler zones and fix an IDE warning.

PiperOrigin-RevId: 680954573
2024-10-01 04:11:35 -07:00
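
A startup win of this size typically comes from running the per-tensor reshapes on worker threads instead of serially. A sketch using Highway's hwy::ThreadPool; ReshapeOneTensor is a hypothetical stand-in for the actual per-tensor work.

    #include <cstddef>
    #include <cstdint>
    #include "hwy/contrib/thread_pool/thread_pool.h"

    // Hypothetical per-tensor work; stands in for reshaping one weight tensor.
    void ReshapeOneTensor(size_t tensor_idx);

    // Sketch: distribute the per-tensor reshapes across the pool's threads.
    void ParallelReshape(size_t num_tensors, hwy::ThreadPool& pool) {
      pool.Run(0, num_tensors, [](uint64_t task, size_t /*thread*/) {
        ReshapeOneTensor(static_cast<size_t>(task));
      });
    }
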
Daniel Keysers d83ad76679 Rename one variable in SampleTopK and update TestSampleTopK.
PiperOrigin-RevId: 680897787
2024-10-01 00:51:33 -07:00
Jan Wassenberg 2d14d796e3 1.09x decode speedup for topk=1/temp0: fuse softmax and sample
PiperOrigin-RevId: 680589099
2024-09-30 08:37:41 -07:00
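
The idea behind this fusion, as a scalar sketch: softmax preserves ordering, so for top-k = 1 with temperature 0 the sampled token is simply the argmax of the logits and the exp/normalize pass can be skipped entirely.

    #include <cstddef>

    // For top-k=1 / temperature 0, sampling reduces to picking the largest logit.
    size_t ArgmaxToken(const float* logits, size_t vocab_size) {
      size_t best = 0;
      for (size_t i = 1; i < vocab_size; ++i) {
        if (logits[i] > logits[best]) best = i;
      }
      return best;
    }
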
Jan Wassenberg 5e812f07f5 Use f64 Dot and sum in softmax - faster than Cascaded
Also let the kernel specify the Raw and State types,
rename WeightT/VecT -> WT/VT.

PiperOrigin-RevId: 680464427
2024-09-30 01:22:09 -07:00
Jan Wassenberg 35fdf848c7 Cascaded summation for Softmax
This can affect generation results after a few hundred tokens.

Also remove the profiler from DecompressAndCall, use Add instead of +=,
use PackedSpan for args, and remove the alignment requirement.
Changing the accumulation order in AssimilateCascadedSums required updating the dot_test thresholds.

PiperOrigin-RevId: 676891797
2024-09-20 10:31:23 -07:00
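
Cascaded (compensated) summation keeps a running error term so the softmax normalizer does not drift as thousands of small exp() terms are added, which is why it can change generated tokens after a few hundred steps. A scalar Kahan-style sketch in the same spirit; the Highway implementation is vectorized and differs in detail.

    #include <cstddef>

    // Kahan-style compensated sum: c carries the low-order bits lost by sum.
    float CompensatedSum(const float* x, size_t n) {
      float sum = 0.0f, c = 0.0f;
      for (size_t i = 0; i < n; ++i) {
        const float y = x[i] - c;
        const float t = sum + y;  // low-order bits of y are lost here...
        c = (t - sum) - y;        // ...and captured in c for the next step
        sum = t;
      }
      return sum;
    }
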
Daniel Keysers 03f0ee2323 Add tests for SampleTopK that highlight existing problems and fix those:
- Sampling was not correct for k>1 and temperature=0.
- Sampling was not correct when all logits were negative.

Also restructure the code a bit for better readability and add some asserts for things that shouldn't happen.

PiperOrigin-RevId: 676043267
2024-09-18 10:32:01 -07:00
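
The two fixed cases are easiest to see in a hypothetical scalar top-k sampler (not the actual gemma.cpp SampleTopK): temperature 0 must select the single best candidate even when k > 1, and the candidate weights must come from a max-subtracted softmax so that all-negative logits still yield a valid distribution.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <random>
    #include <vector>

    // Hypothetical top-k sampler illustrating both fixed cases.
    size_t SampleTopKSketch(const std::vector<float>& logits, size_t k,
                            float temperature, std::mt19937& gen) {
      // Indices of the k largest logits, best first.
      std::vector<size_t> idx(logits.size());
      for (size_t i = 0; i < idx.size(); ++i) idx[i] = i;
      std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                        [&](size_t a, size_t b) { return logits[a] > logits[b]; });
      // Fix 1: temperature 0 must pick the single best token, even when k > 1.
      if (temperature == 0.0f) return idx[0];
      // Fix 2: softmax over the k candidates; subtracting the max keeps the
      // weights positive and well-defined even when every logit is negative.
      std::vector<double> w(k);
      for (size_t i = 0; i < k; ++i) {
        w[i] = std::exp((logits[idx[i]] - logits[idx[0]]) / temperature);
      }
      std::discrete_distribution<size_t> dist(w.begin(), w.end());
      return idx[dist(gen)];
    }
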
Daniel Keysers 892f3bbcbe Implement scalar version of LayerNorm
PiperOrigin-RevId: 675085495
2024-09-16 03:54:10 -07:00
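
A scalar LayerNorm follows directly from the definition: subtract the mean, divide by the standard deviation, then apply a per-element scale and bias. A sketch under the usual formulation; the epsilon and scale/bias conventions in gemma.cpp may differ.

    #include <cmath>
    #include <cstddef>

    void LayerNormSketch(const float* x, const float* scale, const float* bias,
                         float* out, size_t n, float eps = 1e-6f) {
      float mean = 0.0f;
      for (size_t i = 0; i < n; ++i) mean += x[i];
      mean /= static_cast<float>(n);
      float var = 0.0f;
      for (size_t i = 0; i < n; ++i) {
        const float d = x[i] - mean;
        var += d * d;
      }
      var /= static_cast<float>(n);
      const float inv_std = 1.0f / std::sqrt(var + eps);
      for (size_t i = 0; i < n; ++i) {
        out[i] = (x[i] - mean) * inv_std * scale[i] + bias[i];
      }
    }
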
Jan Wassenberg 8c0a8834c1 Major compression update, arbitrary-len unpack + new Dot
Compression:
* Implement {any packed} x {bf16, f32} 'Load2' and DecompressAndZeroPad
* New compression test for all packed formats, add to GEMMA_TEST_FILES, remove from sfp/nuq_test
* Decompress->DecompressAndZeroPad, use PackedSpan for args with bounds checking
* NUQ: support arbitrary-length enc/dec
* New compression/shared, remove sfp.h and nuq.h
* Move Store2 into Traits and provide Compress2 wrapper
* Remove unused Decompress()-with-pool overload
* Simplify CompressedArrayLen, rename to CompressedArrayElements
* Remove unused DistortionStats b_l1_

Misc:
* Add compensated and Kahan dot, support any length
* Use same Dot function everywhere
* Move exact arithmetic functions into fp_arith
* Use FloatPtr and MatPtr typedefs in tests; less stack usage
* Rename args to packed/raw
* Remove Traits::Name, instead TypeName<T>()
* Move kMaxSFP and kClusters/kGroupSize into Sfp/NuqStream
PiperOrigin-RevId: 672868468
2024-09-10 02:22:19 -07:00
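
The DecompressAndZeroPad idea, shown with a scalar placeholder decoder: decode the requested number of values and zero the tail up to the next multiple of the SIMD vector length, so downstream kernels always see full, valid vectors and never need scalar remainder loops. The names and the decode step below are illustrative assumptions, not the actual SFP/NUQ code.

    #include <cstddef>
    #include <cstdint>

    // Placeholder decoder; the real code decodes SFP/NUQ packed formats.
    static float DecodeOne(const uint8_t* packed, size_t i) {
      return static_cast<float>(packed[i]);
    }

    // Decode `num` values into `raw`, then zero the tail up to a multiple of
    // kLanes. `raw` must have room for the padded length.
    void DecompressAndZeroPadSketch(const uint8_t* packed, float* raw,
                                    size_t num, size_t kLanes) {
      for (size_t i = 0; i < num; ++i) raw[i] = DecodeOne(packed, i);
      const size_t padded = (num + kLanes - 1) / kLanes * kLanes;
      for (size_t i = num; i < padded; ++i) raw[i] = 0.0f;
    }
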
Jan Wassenberg 4033ed9e78 Avoid duplication of RMSNorm, support all activation/weight types
Add test for RMSNorm
Rename VectorizedRopeAndMulBy -> RopeAndMulBy

Move test_util to util/

PiperOrigin-RevId: 668332927
2024-08-28 01:26:55 -07:00
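
RMSNorm, unlike LayerNorm, skips the mean subtraction and normalizes by the root mean square before scaling. A scalar sketch; whether the scale is applied as w or (1 + w) is a model-specific detail.

    #include <cmath>
    #include <cstddef>

    void RMSNormSketch(const float* x, const float* weight, float* out,
                       size_t n, float eps = 1e-6f) {
      float sum_sq = 0.0f;
      for (size_t i = 0; i < n; ++i) sum_sq += x[i] * x[i];
      const float inv_rms =
          1.0f / std::sqrt(sum_sq / static_cast<float>(n) + eps);
      for (size_t i = 0; i < n; ++i) {
        // Gemma-style norms scale by (1 + weight); plain RMSNorm uses weight.
        out[i] = x[i] * inv_rms * (1.0f + weight[i]);
      }
    }
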
Jan Wassenberg b6d0ca8a14 Minor followup: remainder handling is a single iteration
Also add profiler annotations.

PiperOrigin-RevId: 667883774
2024-08-27 01:19:44 -07:00
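
The pattern the message describes, shown on a simple scaling kernel rather than the actual code: after the full-vector main loop, the leftover elements are handled by one partial LoadN/StoreN iteration instead of a per-element scalar loop. This sketch uses Highway static dispatch and omits gemma.cpp's dynamic-dispatch boilerplate.

    #include <cstddef>
    #include "hwy/highway.h"

    namespace hn = hwy::HWY_NAMESPACE;

    // Multiply x[0, n) by c: full vectors, then a single partial tail iteration.
    void ScaleInPlace(float* HWY_RESTRICT x, const size_t n, const float c) {
      const hn::ScalableTag<float> d;
      const size_t N = hn::Lanes(d);
      const auto vc = hn::Set(d, c);
      size_t i = 0;
      for (; i + N <= n; i += N) {
        hn::Store(hn::Mul(vc, hn::Load(d, x + i)), d, x + i);
      }
      if (i != n) {  // single iteration for the remaining n - i < N lanes
        const auto v = hn::LoadN(d, x + i, n - i);
        hn::StoreN(hn::Mul(vc, v), d, x + i, n - i);
      }
    }
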
Apoorv Reddy 48d0801fb0 Vectorize Rope for qkv dim not evenly divisible by number of lanes.
PiperOrigin-RevId: 665776602
2024-08-21 02:22:22 -07:00
Apoorv Reddy c6eb3b6f0d VectorizedRopeAndMulBy.
~8x reduction in Rope time (tested on a few prompts).
~3.8% prefill latency improvement.
~2.6% decode latency improvement.

PiperOrigin-RevId: 664650108
2024-08-18 23:17:01 -07:00
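
Rope (rotary position embedding) rotates pairs of query/key elements by a position-dependent angle; the scalar reference that the vectorized version replaces looks roughly like this, using the common half-split pairing (the exact pairing and base are model details).

    #include <cmath>
    #include <cstddef>

    // Rotate (x[i], x[i + half]) pairs by theta_i = pos * base^(-2i/dim).
    void RopeScalar(float* x, size_t dim, int pos, float base = 10000.0f) {
      const size_t half = dim / 2;
      for (size_t i = 0; i < half; ++i) {
        const float freq = std::pow(
            base, -2.0f * static_cast<float>(i) / static_cast<float>(dim));
        const float theta = static_cast<float>(pos) * freq;
        const float c = std::cos(theta), s = std::sin(theta);
        const float x0 = x[i], x1 = x[i + half];
        x[i] = x0 * c - x1 * s;
        x[i + half] = x0 * s + x1 * c;
      }
    }
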
Jan Wassenberg 2ebbe4076f 1.03-1.08x decode speedup: precompute Rope theta, fuse
Split attention into functions, move into class.
Fuse Rope and MulBy; allow a non-in-place version to avoid copying from q to the KV cache.
Sink if() into MaybeLogitsSoftCap.

PiperOrigin-RevId: 661168418
2024-08-09 01:23:24 -07:00
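
Precomputing the per-dimension inverse timescales once (they depend only on the dimension index, not the position) avoids recomputing powf for every token, and folding the constant multiplier into the same pass removes a separate scaling loop. A sketch of the idea; the signatures and names below are assumptions apart from RopeAndMulBy, which appears in the history above.

    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Computed once at startup: inv_timescale[i] = base^(-2i/dim).
    std::vector<float> PrecomputeInvTimescale(size_t dim, float base = 10000.0f) {
      const size_t half = dim / 2;
      std::vector<float> inv(half);
      for (size_t i = 0; i < half; ++i) {
        inv[i] = std::pow(
            base, -2.0f * static_cast<float>(i) / static_cast<float>(dim));
      }
      return inv;
    }

    // Per token: rotate and apply the constant `mul` (e.g. query scaling) in one
    // pass, writing to `out` so no separate copy from q into the KV cache is needed.
    void RopeAndMulBySketch(float mul, const float* x,
                            const std::vector<float>& inv_timescale, size_t dim,
                            int pos, float* out) {
      const size_t half = dim / 2;
      for (size_t i = 0; i < half; ++i) {
        const float theta = static_cast<float>(pos) * inv_timescale[i];
        const float c = std::cos(theta), s = std::sin(theta);
        out[i] = mul * (x[i] * c - x[i + half] * s);
        out[i + half] = mul * (x[i] * s + x[i + half] * c);
      }
    }
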
Jan Wassenberg 85cac13fb1 Split up ops.h into ops/ops-inl and matmul-inl
PiperOrigin-RevId: 654068303
2024-07-19 11:21:48 -07:00