Commit Graph

621 Commits

Author SHA1 Message Date
Jan Wassenberg 2bdf26d81d Support bf16 output of Matmul
Adds Stride to ConstMat, to support decompression of C output for test
matmul_test: add line numbers to output
Also ignore "N is not a multiple of nc" when N==nc
PiperOrigin-RevId: 731096662
2025-02-25 17:53:20 -08:00
Jan Wassenberg b3b4b9f92f With new matmul, much larger batch sizes are advantageous, default to 256.
Can still override via command line argument.

PiperOrigin-RevId: 730502653
2025-02-24 10:21:58 -08:00
Jan Wassenberg 9a2360d719 Move batch_bench into test section, add GTest dep. Fixes #501
PiperOrigin-RevId: 729494223
2025-02-21 05:33:52 -08:00
Jan Wassenberg f9d93e4a42 Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
Remove empty matmul_unit_test.
Up to 25 TFLOP/s on 2xZen4 for 512,3072,24576.

PiperOrigin-RevId: 729123576
2025-02-20 08:33:46 -08:00
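For scale, the throughput figure in f9d93e4a42 can be turned into a rough time per call. The sketch below assumes the usual convention of 2 floating-point operations (one multiply, one add) per inner-product term; the commit does not state how FLOPs were counted, and `MatMulFlops` and `SecondsAtRate` are hypothetical helper names. The product is the same whichever of the three numbers is M, K, or N.

```cpp
#include <cstdint>

// Assumed convention: one multiply plus one add per inner-product term.
constexpr std::uint64_t MatMulFlops(std::uint64_t m, std::uint64_t k,
                                    std::uint64_t n) {
  return 2 * m * k * n;
}

// Seconds needed for `flops` operations at a sustained rate in FLOP/s.
constexpr double SecondsAtRate(std::uint64_t flops, double flops_per_second) {
  return static_cast<double>(flops) / flops_per_second;
}

// 2 * 512 * 3072 * 24576 = 9 * 2^33 operations, roughly 77.3 GFLOP.
static_assert(MatMulFlops(512, 3072, 24576) == 77309411328ULL,
              "FLOP count for the shape quoted in the commit message");
```

Under these assumptions, one such product at 25 TFLOP/s would take about 3.1 ms.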
Apoorv Reddy d854471ae2 Use vectorized TopK using highway VQSelect
PiperOrigin-RevId: 728159153
2025-02-18 05:01:39 -08:00
Apoorv Reddy 0e5b59d24d Implements FusedSoftmaxAndSampleTopK.
This computes softmax over the top-K logits only, instead of computing softmax over all logits first and then taking the top-K probabilities, which also avoids renormalizing. Additionally, softmax now applies temperature scaling if temp != 1.0.

PiperOrigin-RevId: 727702149
2025-02-16 21:30:06 -08:00
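The ordering described in 0e5b59d24d (select the top-K logits first, then run softmax over only those K) can be sketched as below. This is a minimal scalar illustration and not the gemma.cpp implementation: `TopKSoftmax` and `SampleTopK` are hypothetical names, the use of `std::partial_sort` and `std::mt19937` is an assumption, and the actual code is vectorized.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: returns (token index, probability) pairs for the k
// largest logits. Softmax is computed over those k logits only, so the
// probabilities already sum to 1 and need no renormalization.
std::vector<std::pair<size_t, float>> TopKSoftmax(
    const std::vector<float>& logits, size_t k, float temperature) {
  assert(k > 0 && temperature > 0.0f);
  if (logits.empty()) return {};
  k = std::min(k, logits.size());

  std::vector<std::pair<float, size_t>> scored(logits.size());
  for (size_t i = 0; i < logits.size(); ++i) scored[i] = {logits[i], i};
  // Move the k largest logits to the front, in descending order.
  std::partial_sort(
      scored.begin(), scored.begin() + static_cast<std::ptrdiff_t>(k),
      scored.end(),
      [](const auto& a, const auto& b) { return a.first > b.first; });

  // Temperature-scaled softmax; subtracting the maximum keeps exp() stable.
  const float max_logit = scored[0].first;
  std::vector<std::pair<size_t, float>> result(k);
  float sum = 0.0f;
  for (size_t i = 0; i < k; ++i) {
    const float e = std::exp((scored[i].first - max_logit) / temperature);
    result[i] = {scored[i].second, e};
    sum += e;
  }
  for (auto& entry : result) entry.second /= sum;
  return result;
}

// Hypothetical helper: samples one token index from the top-k distribution.
size_t SampleTopK(const std::vector<float>& logits, size_t k,
                  float temperature, std::mt19937& rng) {
  const auto top = TopKSoftmax(logits, k, temperature);
  assert(!top.empty());
  std::vector<float> probs;
  probs.reserve(top.size());
  for (const auto& entry : top) probs.push_back(entry.second);
  std::discrete_distribution<size_t> dist(probs.begin(), probs.end());
  return top[dist(rng)].first;
}
```

With k = 1 the sketch reduces to greedy (argmax) decoding regardless of temperature.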
Jan Wassenberg bdf5d25e97 Only temporarily enable spinning in threading benchmark
PiperOrigin-RevId: 727114863
2025-02-14 17:15:38 -08:00
Jan Wassenberg 06c70dccd9 Less verbose threading_test output, improve formatting.
PiperOrigin-RevId: 726364085
2025-02-13 00:56:34 -08:00
Daniel Keysers f173aa776e Add conversion tool for HF safetensors to gemma.cpp for PaliGemma.
PiperOrigin-RevId: 725990158
2025-02-12 03:47:43 -08:00
Copybara-Service c495b25995 Merge pull request #493 from ufownl:bugfix/compress_weights_le
PiperOrigin-RevId: 725585921
2025-02-11 05:10:13 -08:00
Apoorv Reddy 64cf6dfe0a Using TimingInfo methods and cleaning up args to DecodeStepT
PiperOrigin-RevId: 725580125
2025-02-11 04:49:14 -08:00
Jan Wassenberg 953c877658 Fix nuq Enc() to handle groups < kGroupSize.
Also remove no longer required dynamic allocation.

PiperOrigin-RevId: 725203824
2025-02-10 07:17:59 -08:00
Jan Wassenberg 5563d94811 Add fork/join latency benchmark
PiperOrigin-RevId: 725174042
2025-02-10 05:23:44 -08:00
Apoorv Reddy 780e376023 Add KVCache.DeepCopy(). Will be useful for implementing sampling functionality like beam sampling, parallel sampling, and CoT decoding (à la https://arxiv.org/abs/2402.10200).
PiperOrigin-RevId: 725156316
2025-02-10 04:10:29 -08:00
Apoorv Reddy 9b3e7ea8a2 Factor out DecodeStepT from GenerateT into a separate function.
This will be useful for adding sampling functionality like beam decoding, parallel sampling, and CoT decoding (as described in the [Chain-of-Thought Reasoning Without Prompting paper](https://arxiv.org/abs/2402.10200)).

PiperOrigin-RevId: 725151530
2025-02-10 03:53:08 -08:00
Jan Wassenberg b0fe9a43e6 Further speed up blob_compare: single alloc, use dual sockets
PiperOrigin-RevId: 724947361
2025-02-09 10:53:49 -08:00
RangerUFO 3a5a6dbcad Fix the link error when building `compress_weights` with Clang on macOS
2025-02-09 00:13:25 +08:00
Jan Wassenberg b18bd781f6 Windows build fixes: struct vs class, unused arg/var, avoid VLA, Deleter arg, casts
PiperOrigin-RevId: 724340518
2025-02-07 07:38:55 -08:00
Oleh Prypin 82ca526c0c Remove `srcs_version` and `python_version` attributes, as they already default to `"PY3"`
PiperOrigin-RevId: 724122259
2025-02-06 16:51:11 -08:00
Jan Wassenberg f31e12e63b Improved blob diff: parallel, tolerance for float
PiperOrigin-RevId: 724060325
2025-02-06 13:46:28 -08:00
Jan Wassenberg 9f5159ff68 Public visibility for compression/
PiperOrigin-RevId: 723529541
2025-02-05 08:53:51 -08:00
Phil Culliton 7ccc6abe87 Allow conversion, loading and inference with NUQ.
PiperOrigin-RevId: 723507890
2025-02-05 07:45:54 -08:00
Phil Culliton 8a6edff319 Base interleaved handling for 4.5-bit NUQ, specifically Enc, DecompressAndZeroPad, and Dec2. Includes tests.
PiperOrigin-RevId: 721821577
2025-01-31 10:35:32 -08:00
Phil Culliton 23dac72463 Simplified interface class and example for Gemma.cpp usage.
PiperOrigin-RevId: 720591037
2025-01-28 08:48:27 -08:00
Daniel Keysers 7af2e70321 Add python wrappers for configs and inference.
Enable building compression/python/compression_test using bazel.
Add default image path for image_test and paligemma_test.

PiperOrigin-RevId: 720583438
2025-01-28 08:22:03 -08:00
Daniel Keysers bcdb0d65bd Assorted small cleanups.
PiperOrigin-RevId: 720548132
2025-01-28 06:09:45 -08:00
Jan Wassenberg a248f76245 Allow overriding num threads despite detecting topology
PiperOrigin-RevId: 720188756
2025-01-27 08:57:53 -08:00
Daniel Keysers e997468496 Apply PositionalEncodingQK always in-place.
PiperOrigin-RevId: 718851803
2025-01-23 07:09:30 -08:00
Apoorv Reddy ce807a31a1 Internal change
PiperOrigin-RevId: 718824952
2025-01-23 05:31:11 -08:00
Jan Wassenberg a60b564b88 Infra improvements (2)
ops.h: move CreateInvTimescale to allow calling without depending on gemma
Pass around MatMulEnv instead of pools to avoid re-creating the env
profiler.h can now be used outside SIMD code
allocator: add StepBytes and QuantumSteps
rename worker thread with package/cluster in the name
threading: add Visit* to IndexRange
PiperOrigin-RevId: 718766704
2025-01-23 01:55:19 -08:00
Daniel Keysers f37402da57 Add parameter for base_frequency to CreateInvTimeScale().
Extract a few local variables to make code easier to read (hopefully).

PiperOrigin-RevId: 718749053
2025-01-23 00:56:44 -08:00
Daniel Keysers a133b3d062 Tiny fix: align template parameter order with parameter order.
PiperOrigin-RevId: 718411494
2025-01-22 09:13:23 -08:00
Phil Culliton 9646edc908 Internal change
PiperOrigin-RevId: 717916568
2025-01-21 07:53:49 -08:00
Copybara-Service f46052b5b4 Merge pull request #473 from ufownl:bugfix/migrate_weights_target
PiperOrigin-RevId: 717536480
2025-01-20 08:05:38 -08:00
Jan Wassenberg c4398fc72d Infra improvements:
allocator: support mmap, fixed Bind, add padding
bench_matmul: Add PreventElision
BUILD: add ops_test build target
matmul.h: move ConstMat here; dynamic alloc of MatMulEnv
matmul_test: remove benchmarking
replace fprintf with HWY_WARN
threading.cc: support splitting large clusters (disabled); package_idx->pkg_idx, smaller IndexRangePartition
PiperOrigin-RevId: 717512274
2025-01-20 06:22:49 -08:00
RangerUFO 20e5ef6d2e Add the missing `migrate_weights` target for CMake
2025-01-17 18:56:43 +08:00
Daniel Keysers 493688f6f1 Allow interactive use with new single-file weight format.
Add section about new weights format to README.md.
Remove model_type_required parameter.
Update error handling for flags.

PiperOrigin-RevId: 715788822
2025-01-15 07:22:33 -08:00
Ray Smith b93231a47d Moved the vit config fields to their own config struct
PiperOrigin-RevId: 715692800
2025-01-15 01:09:49 -08:00
Ray Smith 9d40f0117e Added ability to load/save a complete model file, including tokenizer.
PiperOrigin-RevId: 707914366
2024-12-19 07:59:41 -08:00
The gemma.cpp Authors 5bc356f18f Internal change
PiperOrigin-RevId: 707268913
2024-12-17 15:15:57 -08:00
Daniel Keysers 73766e8ee3 Small updates to the README file.
PiperOrigin-RevId: 707036429
2024-12-17 04:09:55 -08:00
Daniel Keysers 62c70d6715 Rename ModelTraining to PromptWrapping which is a more accurate name.
PiperOrigin-RevId: 705881500
2024-12-13 07:45:59 -08:00
Ray Smith 6254f2e5ca Removed duplicated tensor sizes from weights.h by changing the constructor used for MatPtrT
PiperOrigin-RevId: 705085054
2024-12-11 06:30:28 -08:00
Daniel Keysers aed17396be Make prompt wrapping more consistent and fix duplicated tokens for multi-turn.
Do not echo <end_of_turn> tokens to the user.
Have verbosity=0 only show the dialog.

PiperOrigin-RevId: 705021391
2024-12-11 01:52:00 -08:00
Ray Smith e69bc3bc1c Added the TensorInfo arg to the compressor so the shape and scale can be output correctly to the file in future.
Corrected some errors in the TensorIndex.

PiperOrigin-RevId: 705014619
2024-12-11 01:26:35 -08:00
Jan Wassenberg 7b77909427 Fix unhandled switch warning/error
PiperOrigin-RevId: 704828160
2024-12-10 13:32:53 -08:00
Jan Wassenberg 642fc97d51 Internal change
PiperOrigin-RevId: 704692923
2024-12-10 06:58:32 -08:00
Copybara-Service d8135e836f Merge pull request #460 from ericcurtin:common
PiperOrigin-RevId: 704684454
2024-12-10 06:33:37 -08:00
Daniel Keysers 5bbe814a53 Tiny cleanup.
PiperOrigin-RevId: 704636988
2024-12-10 03:34:05 -08:00
Daniel Keysers 331d2ccc02 Add support for 448px resolution to PaliGemma and PaliGemma2.
PiperOrigin-RevId: 704361579
2024-12-09 11:38:10 -08:00