gemma.cpp

Commit Graph

Author	SHA1	Message	Date
Eric Curtin	a971088ac2	Refactor `gemma/common.cc` to improve readability and safety. Use `std::size` for array size calculations. Replace C-style string manipulations with `std::string` methods. Simplify `std::transform` usage for case conversion. Signed-off-by: Eric Curtin <ecurtin@redhat.com>	2024-12-09 16:36:25 +00:00
The gemma.cpp Authors	66bb435121	No public description PiperOrigin-RevId: 704178245	2024-12-09 00:49:36 -08:00
Phil Culliton	9dfe2a76be	Internal change PiperOrigin-RevId: 702961613	2024-12-04 20:41:47 -08:00
Jan Wassenberg	6a34e9c547	Print cache info and update Highway version for that PiperOrigin-RevId: 702318451	2024-12-03 06:31:52 -08:00
Jan Wassenberg	f74d496879	Threading/infra improvements. Add ParallelizeRange helpers and partitioning helpers. Refactor Pinning class, store original affinity (required to construct another NestedPools after pinning happened). Compress: prevent Compress printing stats in tests; zero-pad tensors. Matmul: add matmul_unit_test (TODO) and bench_matmul; matmul_test: change norm to row vectors (that is what is added) and include bf16 rounding error. Prepare for L2/L3 retrieval. PiperOrigin-RevId: 700603811	2024-11-27 01:12:00 -08:00
Stanko Novakovic	109a4d9f85	Add a simple benchmark for batching. This is a simple Gemma benchmark with a fixed batch size of 32. PiperOrigin-RevId: 698843573	2024-11-21 10:59:49 -08:00
Ray Smith	3d1625d8c5	Improved consistency of compressor API, and added a universal method with a target type arg. Moved configs pybind up to root level. PiperOrigin-RevId: 698743417	2024-11-21 05:27:40 -08:00
Ray Smith	73640d2521	Added tensor_index as a single source of truth on tensor shapes/sources and transformations PiperOrigin-RevId: 697903886	2024-11-19 00:25:39 -08:00
Ray Smith	7d685a267f	Added pybind for configs. Added ability to test configs for equality. PiperOrigin-RevId: 697572671	2024-11-18 04:03:51 -08:00
Jan Wassenberg	36f02ef892	Internal change. PiperOrigin-RevId: 696815335	2024-11-15 02:22:32 -08:00
Ray Smith	96513a8dc3	Added a blob_compare tool that compares two sbs files that may have the blobs in a different order PiperOrigin-RevId: 696458888	2024-11-14 03:26:32 -08:00
Paul Chang	5674c33dc5	Replace CLIF SbsWriter with pybind-based gcpp extension. Maintains compatibility with previous version. PiperOrigin-RevId: 696181603	2024-11-13 10:20:02 -08:00
Daniel Keysers	719699f132	Make top_k a runtime argument (instead of a model argument). PiperOrigin-RevId: 696170691	2024-11-13 09:48:59 -08:00
Paul Chang	b94295b6d9	Internal changes PiperOrigin-RevId: 696155630	2024-11-13 09:01:38 -08:00
Daniel Keysers	e54d9cbddd	Fix Griffin model: use HalfRope position encodings; zero-initialize the caches for each Generate at position 0. The lack of the latter made the tests in gemma_test dependent on each other. PiperOrigin-RevId: 694509054	2024-11-08 08:30:53 -08:00
Paul Chang	d4050a2917	Expose BlobReader::Keys() PiperOrigin-RevId: 694166186	2024-11-07 10:28:39 -08:00
Jan Wassenberg	868b01601f	Simpler MatMul interface, vocab types, Tristate for use_spinning. Add Extents2D, Range2D vocab types. Matmul uses ConstMat for inputs and RowPtr for output. Move RowVectorBatch to basics.h. Separate threading.cc. Fix topology string: report cores not LPs, and #HT. Move QStride/IsMHA into LayerConfig. ImageTokens does not require make_unique. matmul_test: no longer require template args. PiperOrigin-RevId: 692963605	2024-11-04 07:48:29 -08:00
Jan Wassenberg	baaa221787	Move BF16 to basics.h for easier access, and use that typedef. PiperOrigin-RevId: 691422334	2024-10-30 08:09:11 -07:00
Daniel Keysers	ed4091921f	Reduce time for optimize_test and use exactly one (unpinned) thread. PiperOrigin-RevId: 691013413	2024-10-29 07:37:22 -07:00
Daniel Keysers	583bd93e9a	Factor out addition of ViTConfig to a ModelConfig. Use ModelConfig values for ImageTokens. Output timing info for image token generation. Add a method to copy image data into Image class directly. Minor changes: pipe ModelTraining to more places. PiperOrigin-RevId: 690572283	2024-10-28 05:29:33 -07:00
Jan Wassenberg	19cfe14c76	Warning fixes (casts) and fix Windows build for aligned_alloc PiperOrigin-RevId: 689734618	2024-10-25 04:14:04 -07:00
Jan Wassenberg	52af531820	Serialization for class members for use with ModelConfig PiperOrigin-RevId: 689720027	2024-10-25 03:12:34 -07:00
Copybara-Service	efff64605a	Merge pull request #435 from ufownl:feature/disable_topology PiperOrigin-RevId: 689399357	2024-10-24 08:55:23 -07:00
RangerUFO	ec3b27326b	Add a compilation option to disable topology	2024-10-24 18:32:43 +08:00
Paul Chang	4976066095	Try disabling benchmark's gtest integration PiperOrigin-RevId: 689010657	2024-10-23 10:12:45 -07:00
Paul Chang	4197d69dfc	New blob_store_test, ensure ReadOne checks actual size against requested size PiperOrigin-RevId: 688974390	2024-10-23 08:30:46 -07:00
Copybara-Service	91bf2317ff	Merge pull request #426 from ufownl:feature/read_image_from_stream PiperOrigin-RevId: 688137436	2024-10-21 08:00:23 -07:00
Copybara-Service	054935d24b	Merge pull request #432 from ufownl:bugfix/compress_weights_ce PiperOrigin-RevId: 688126076	2024-10-21 07:18:53 -07:00
RangerUFO	7d313aaade	Fix compilation errors of "compress_weights" target	2024-10-19 21:30:30 +08:00
RangerUFO	fcea743107	Fix Bazel builder failure	2024-10-19 19:54:46 +08:00
Jan Wassenberg	02ce1e344f	Use NestedPools, add NUMA infra. Improved threading.h, fix thread counts for single package/cluster systems. Temporarily forces to a single socket. Prefill 29.28 tps, decode 6.92. Also fix benchmarks.cc build, update tensor allocator to Allocator. PiperOrigin-RevId: 687307167	2024-10-18 08:11:18 -07:00
Daniel Keysers	c6384574db	Fix PaliGemma's GenerateImageTokensT(). Move image related config values from LayerConfig to ModelConfig. Minor changes: Add a few comments, remove gcpp:: qualification where it wasn't needed in a few places, define local constants in VitAttention.DotSoftmaxWeightedSum() PiperOrigin-RevId: 687210519	2024-10-18 01:34:13 -07:00
RangerUFO	e48fc3abb4	Refactor the overloads of `Image::ReadPPM` method. Remove the `std::istream` overload and directly parse the PPM format on the span. Load the image bytes in the file using `ReadFileToString` helper defined in "compression/io.h" instead of `std::ifstream`.	2024-10-18 02:10:29 +08:00
Ray Smith	0d68555f87	Eliminated TConfig. Changed CompressedLayer and CompressedWeights to be constructed with an instance of a LayerConfig and WeightsConfig respectively. Added CompressedModel to remove ByteStorageT and get rid of most of the type casting, as well as allowing the default destructor to be used and work properly. Adjusted WeightsWrapper and ForwardLayer etc to match. The only remaining template arg is the weight type. This enables all the instantiations to be deleted, apart from one per type. It also enables (but not yet done) the config to be stored in the blob file instead of having to be specified separately. Reduces the size of the gemma_lib and weights shared libraries by a factor of 4.3 and 3.2 respectively. PiperOrigin-RevId: 686870060	2024-10-17 05:04:22 -07:00
RangerUFO	de2f7d7e2c	Add an overload of `Image::ReadPPM` method. Make it able to load image data from a `hwy::Span`.	2024-10-16 17:34:11 +08:00
RangerUFO	a784b8459d	Add an overload of `Image::ReadPPM` method. Make it able to load image data from a stream.	2024-10-16 15:53:27 +08:00
Daniel Keysers	a4d6adbc43	Introduce QueryResult in GemmaEnv and add a shortcut for WrapAndTokenize. Remove max_tokens (and rely on only max_generated_tokens). PiperOrigin-RevId: 685662260	2024-10-14 04:45:21 -07:00
Copybara-Service	2892e232e2	Merge pull request #422 from ufownl:bugfix/compress_weights_ce PiperOrigin-RevId: 685635493	2024-10-14 02:46:33 -07:00
Daniel Keysers	5d0167904d	Fix PaliGemma model loading. PiperOrigin-RevId: 685591935	2024-10-13 23:48:55 -07:00
Daniel Keysers	b7eff19be4	Update expected ranges in dot_test. PiperOrigin-RevId: 685591625	2024-10-13 23:47:20 -07:00
RangerUFO	ed88115e6a	Fix compilation error of the weights compression tool	2024-10-11 18:55:06 +08:00
The gemma.cpp Authors	dfda53e634	Benchmark gemma.cpp with different length inputs. PiperOrigin-RevId: 684607945	2024-10-10 15:59:26 -07:00
Daniel Keysers	3cf519a53e	Remove unused "two-sizes" version of MulByConstAndAdd. PiperOrigin-RevId: 684515900	2024-10-10 11:32:25 -07:00
Daniel Keysers	1eb9ce19dd	Update expected ranges in dot_test. PiperOrigin-RevId: 684515143	2024-10-10 11:31:19 -07:00
Jan Wassenberg	6ab3ff5bde	Minor cleanup, Windows+Bazel build fixes: add app.h comment; compress-inl: remove unused typedef; gemma-inl: add missing HWY_ATTR and cast; separate sum-inl.h and basics.h headers; replace more hwy::bfloat16_t with BF16; update include pragmas; update dot_test thresholds; update Highway version in Bazel for HWY_RCAST_ALIGNED fix. PiperOrigin-RevId: 684464326	2024-10-10 09:05:06 -07:00
Ray Smith	85958f5fd3	Added MatPtr/MatPtrT/MatStorageT/MatStorage as a dynamically-sized replacement for CompressedArray. Definition of array size is moved to the constructor. Allocation is separate and parallelized. All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted. Replaced all previous ForEachTensor functions with a single unified function. PiperOrigin-RevId: 684451604	2024-10-10 08:22:30 -07:00
Daniel Keysers	a570e3f662	Reduce number of operations in Gelu() by one Mul. About 5% faster Gen.Activation. PiperOrigin-RevId: 684035719	2024-10-09 07:50:48 -07:00
Jan Wassenberg	2c28b18eb0	Add NestedPools: one per socket/cluster. Use in dot_test. app.h: add new flags and rename num_threads to max_threads. matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases. PiperOrigin-RevId: 683216386	2024-10-07 09:40:19 -07:00
Jan Wassenberg	bd53b0f7c3	Fix MSAN issue for multiturn. Rewind the prior EOS token. Also move MaybeCheckInitialized to allocator.h PiperOrigin-RevId: 683187458	2024-10-07 08:07:54 -07:00
Jan Wassenberg	5a71d819cb	Also enable f64 dot/sum for <f32 inputs. Add bf16 support to Dot/SumKernelDouble in the same way as *Compensated. PiperOrigin-RevId: 682308683	2024-10-04 07:12:10 -07:00

621 Commits (All Branches)