gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Copybara-Service	91bf2317ff	Merge pull request #426 from ufownl:feature/read_image_from_stream PiperOrigin-RevId: 688137436	2024-10-21 08:00:23 -07:00
Copybara-Service	054935d24b	Merge pull request #432 from ufownl:bugfix/compress_weights_ce PiperOrigin-RevId: 688126076	2024-10-21 07:18:53 -07:00
RangerUFO	7d313aaade	Fix compilation errors of "compress_weights" target	2024-10-19 21:30:30 +08:00
RangerUFO	fcea743107	Fix Bazel builder failure	2024-10-19 19:54:46 +08:00
Jan Wassenberg	02ce1e344f	Use NestedPools, add NUMA infra. Improved threading.h, fix thread counts for single package/cluster systems. Temporarily forces to a single socket. Prefill 29.28 tps, decode 6.92. Also fix benchmarks.cc build, update tensor allocator to Allocator. PiperOrigin-RevId: 687307167	2024-10-18 08:11:18 -07:00
Daniel Keysers	c6384574db	Fix PaliGemma's GenerateImageTokensT(). Move image related config values from LayerConfig to ModelConfig. Minor changes: Add a few comments, remove gcpp:: qualification where it wasn't needed in a few places, define local constants in VitAttention.DotSoftmaxWeightedSum() PiperOrigin-RevId: 687210519	2024-10-18 01:34:13 -07:00
RangerUFO	e48fc3abb4	Refactor the overloads of `Image::ReadPPM` method. Remove the `std::istream` overload and directly parse the PPM format on the span. Load the image bytes in the file using `ReadFileToString` helper defined in "compression/io.h" instead of `std::ifstream`.	2024-10-18 02:10:29 +08:00
Ray Smith	0d68555f87	Eliminated TConfig. Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively. Added CompressedModel to remove ByteStorageT and get rid of most of the type casting, as well as allowing the default destructor to be used and work properly. Adjusted WeightsWrapper and ForwardLayer etc to match. The only remaining template arg is the weight type. This enables all the instantiations to be deleted, apart from one per type. It also enables (but not yet done) the config to be stored in the blob file instead of having to be specified separately. Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively. PiperOrigin-RevId: 686870060	2024-10-17 05:04:22 -07:00
RangerUFO	de2f7d7e2c	Add an overload of `Image::ReadPPM` method. Make it able to load image data from a `hwy::Span`.	2024-10-16 17:34:11 +08:00
RangerUFO	a784b8459d	Add an overload of `Image::ReadPPM` method. Make it able to load image data from a stream.	2024-10-16 15:53:27 +08:00
Daniel Keysers	a4d6adbc43	Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize. Remove max_tokens (and rely on only max_generated_tokens). PiperOrigin-RevId: 685662260	2024-10-14 04:45:21 -07:00
Copybara-Service	2892e232e2	Merge pull request #422 from ufownl:bugfix/compress_weights_ce PiperOrigin-RevId: 685635493	2024-10-14 02:46:33 -07:00
Daniel Keysers	5d0167904d	Fix PaliGemma model loading. PiperOrigin-RevId: 685591935	2024-10-13 23:48:55 -07:00
Daniel Keysers	b7eff19be4	Update expected ranges in dot_test. PiperOrigin-RevId: 685591625	2024-10-13 23:47:20 -07:00
RangerUFO	ed88115e6a	Fix compilation error of the weights compression tool	2024-10-11 18:55:06 +08:00
The gemma.cpp Authors	dfda53e634	Benchmark gemma.cpp with different length inputs. PiperOrigin-RevId: 684607945	2024-10-10 15:59:26 -07:00
Daniel Keysers	3cf519a53e	Remove unused "two-sizes" version of MulByConstAndAdd. PiperOrigin-RevId: 684515900	2024-10-10 11:32:25 -07:00
Daniel Keysers	1eb9ce19dd	Update expected ranges in dot_test. PiperOrigin-RevId: 684515143	2024-10-10 11:31:19 -07:00
Jan Wassenberg	6ab3ff5bde	Minor cleanup, Windows+Bazel build fixes; add app.h comment; compress-inl: remove unused typedef; gemma-inl: add missing HWY_ATTR and cast; separate sum-inl.h and basics.h headers; replace more hwy::bfloat16_t with BF16; update include pragmas; update dot_test thresholds; update Highway version in Bazel for HWY_RCAST_ALIGNED fix. PiperOrigin-RevId: 684464326	2024-10-10 09:05:06 -07:00
Ray Smith	85958f5fd3	Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray. Definition of array size is moved to the constructor. Allocation is separate and parallelized. All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted. Replaced all previous ForEachTensor functions with a single unified function. PiperOrigin-RevId: 684451604	2024-10-10 08:22:30 -07:00
Daniel Keysers	a570e3f662	Reduce number of operations in Gelu() by one Mul. About 5% faster Gen.Activation. PiperOrigin-RevId: 684035719	2024-10-09 07:50:48 -07:00
Jan Wassenberg	2c28b18eb0	Add NestedPools: one per socket/cluster; Use in dot_test; app.h: add new flags and rename num_threads to max_threads; matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases. PiperOrigin-RevId: 683216386	2024-10-07 09:40:19 -07:00
Jan Wassenberg	bd53b0f7c3	Fix MSAN issue for multiturn. Rewind the prior EOS token. Also move MaybeCheckInitialized to allocator.h PiperOrigin-RevId: 683187458	2024-10-07 08:07:54 -07:00
Jan Wassenberg	5a71d819cb	Also enable f64 dot/sum for <f32 inputs. Add bf16 support to Dot/SumKernelDouble in the same way as *Compensated. PiperOrigin-RevId: 682308683	2024-10-04 07:12:10 -07:00
Ray Smith	895ee4c6ce	Moved Internal code around to simplify PiperOrigin-RevId: 681877329	2024-10-03 07:55:21 -07:00
Krzysztof Ostrowski	12291e1ac0	Internal change. PiperOrigin-RevId: 681583569	2024-10-02 14:03:34 -07:00
Krzysztof Ostrowski	b3239bf509	Internal change. PiperOrigin-RevId: 681530185	2024-10-02 11:33:06 -07:00
Jan Wassenberg	96d2ab7d31	Minor fix to profiler zone and add comment PiperOrigin-RevId: 681350546	2024-10-02 01:37:50 -07:00
Daniel Keysers	dc2e5f1505	PaliGemma: fix image loading. Use uint8_t to make sure values are not interpreted as signed char. PiperOrigin-RevId: 680965115	2024-10-01 04:54:04 -07:00
Jan Wassenberg	7d9fcda0d8	-467ms startup: parallel Reshape. Also split Softmax into Argmax helper, add comments; add profiler zones + fix IDE warning. PiperOrigin-RevId: 680954573	2024-10-01 04:11:35 -07:00
Daniel Keysers	d83ad76679	Rename one variable in SampleTopK and update TestSampleTopK. PiperOrigin-RevId: 680897787	2024-10-01 00:51:33 -07:00
Jan Wassenberg	2d14d796e3	1.09x decode speedup for topk=1/temp0: fuse softmax and sample PiperOrigin-RevId: 680589099	2024-09-30 08:37:41 -07:00
Jan Wassenberg	897f902d28	Fix include order, required to build with profiler enabled PiperOrigin-RevId: 680574177	2024-09-30 07:52:50 -07:00
Jan Wassenberg	5e812f07f5	Use f64 Dot and sum in softmax - faster than Cascaded. Also let the kernel specify the Raw and State types, rename WeightT/VecT -> WT/VT. PiperOrigin-RevId: 680464427	2024-09-30 01:22:09 -07:00
Jan Wassenberg	47eb80a90e	Add double-precision dot variant PiperOrigin-RevId: 679243590	2024-09-26 12:09:10 -07:00
Daniel Keysers	71116daf64	Tiny update of the README formatting. PiperOrigin-RevId: 679162673	2024-09-26 08:38:12 -07:00
Daniel Keysers	709143e9a6	Add download location of Pali Gemma weights to README.md. PiperOrigin-RevId: 679127088	2024-09-26 06:38:11 -07:00
Jan Wassenberg	1bd64ec350	1.6x speedup of MatMulSlow using compensated Dot PiperOrigin-RevId: 679063289	2024-09-26 02:42:53 -07:00
Daniel Keysers	606427022c	Fix compiler errors when trying to generate (unused) code for the ConfigNoVit struct. PiperOrigin-RevId: 679049377	2024-09-26 01:55:26 -07:00
Daniel Keysers	2290eb7d3f	Reduce flakiness of dot_test. PiperOrigin-RevId: 679049273	2024-09-26 01:54:27 -07:00
Copybara-Service	e3507190ae	Merge pull request #394 from ufownl:bugfix/prefix_lm PiperOrigin-RevId: 678710685	2024-09-25 08:25:31 -07:00
RangerUFO	d1010337c3	Fix prefix-LM mode assertion	2024-09-25 22:22:28 +08:00
Jan Wassenberg	e70e686805	Add forward and backward error PiperOrigin-RevId: 678297584	2024-09-24 10:10:29 -07:00
Daniel Keysers	673673cc98	Update expected entropy values for GRIFFIN_2B model. These changed after introduction of "Cascaded summation for Softmax" PiperOrigin-RevId: 678145851	2024-09-24 02:12:59 -07:00
Daniel Keysers	f8835fe4a4	Add support for PaliGemma Vision-LM (224x224) to gemma.cpp. See https://arxiv.org/abs/2407.07726 for a description of the model. Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that. PiperOrigin-RevId: 677841119	2024-09-23 10:09:38 -07:00
Jan Wassenberg	c6c10e0a53	Fix topology display for platforms where it fails (Apple) PiperOrigin-RevId: 677800053	2024-09-23 08:14:54 -07:00
Jan Wassenberg	cdbfebb10f	Fix compress-inl bf16->f32 overrun. Caught by Arm hwasan but not x86 asan. PiperOrigin-RevId: 677779421	2024-09-23 07:10:25 -07:00
Jan Wassenberg	35fdf848c7	Cascaded summation for Softmax. This can affect generation results after a few hundred tokens. Also remove profiler from DecompressAndCall, use Add instead of +=, use PackedSpan for args and remove alignment requirement. Changing accumulation order in AssimilateCascadedSums updates dot_test thresholds. PiperOrigin-RevId: 676891797	2024-09-20 10:31:23 -07:00
Copybara-Service	09bc8d62cc	Merge pull request #380 from ufownl:bugfix/threading PiperOrigin-RevId: 676799495	2024-09-20 04:52:48 -07:00
Jan Wassenberg	bb6b398df3	Add pairwise sum dot products for testing. Also add wrapper function for threshold comparison. PiperOrigin-RevId: 676749760	2024-09-20 01:48:52 -07:00

795 Commits

All Branches