Copybara-Service
bef91a3f03
Merge pull request #529 from ufownl:refactor/wrap_and_tokenize
...
PiperOrigin-RevId: 745174371
2025-04-08 09:22:26 -07:00
Jan Wassenberg
4e6aa36e9b
Minor cleanup: enable 0,0 Extents2D, add SerializedSpan typedef, include fixes
...
PiperOrigin-RevId: 745068776
2025-04-08 03:35:55 -07:00
RangerUFO
cc2e14e654
Improve `GemmaChatTemplate` to handle vision prompt wrapping
2025-03-29 11:31:40 +08:00
RangerUFO
c39295f497
Inline the ctor of `GemmaChatTemplate`
2025-03-29 11:31:40 +08:00
RangerUFO
d1615b56b2
Fix the prompt wrapping of gemma3-1b again
...
It seems that the previous fix was changed back due to a merge error.
2025-03-29 11:31:39 +08:00
RangerUFO
ca4ee2b63f
Refactor `WrapAndTokenize` to work properly with Gemma3
2025-03-29 11:31:39 +08:00
RangerUFO
d42deaa27c
Set the secondary EOS for Gemma2
...
So that we can remove the `<end_of_turn>` filter that was set up
specifically for Gemma2.
2025-03-22 01:32:22 +08:00
RangerUFO
2bad79f110
Fix the EOS checking
...
The secondary EOS is usually `<end_of_turn>`, which can also appear in the
prompt, so we can only check for it outside the prompt.
2025-03-22 01:32:22 +08:00
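The EOS check described in commit 2bad79f110 could look roughly like the sketch below. It is illustrative only: `EosConfig`, `ShouldStop`, and the token ids used with it are hypothetical names and values, not the gemma.cpp API, and the choice to test the primary EOS at every position is an assumption rather than something the commit message states.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container for the two end-of-sequence token ids.
struct EosConfig {
  int primary_eos;    // e.g. the tokenizer's <eos> id (assumed)
  int secondary_eos;  // e.g. the id of <end_of_turn> (assumed)
};

// Returns true if generation should stop at `token`.
// `pos` is the index of `token` in the full sequence and `prompt_size`
// is the number of prompt tokens that precede the generated ones.
inline bool ShouldStop(const EosConfig& cfg, int token, size_t pos,
                       size_t prompt_size) {
  assert(token >= 0);  // token ids are assumed to be non-negative
  // Assumption: the primary EOS always terminates the sequence.
  if (token == cfg.primary_eos) return true;
  // The secondary EOS may legitimately occur inside the prompt (for example
  // as part of the chat template), so it only counts once we are past it.
  return pos >= prompt_size && token == cfg.secondary_eos;
}
```

With a prompt of five tokens, a secondary EOS at position 2 is ignored, while the same token at position 5 (the first generated token) stops generation.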
Phil Culliton
05b1cce9f7
Add support for a secondary EOS token
...
PiperOrigin-RevId: 738898976
2025-03-20 12:28:31 -07:00
Jan Wassenberg
83219e3c68
Add note on attention length and SFP
...
PiperOrigin-RevId: 738698399
2025-03-20 00:39:06 -07:00
RangerUFO
b16ce9a0b4
Fix the prompt wrapping of gemma3-1b
2025-03-18 16:52:38 +08:00
Jan Wassenberg
1b72c22345
Refactor Gemma ctor and improve pool NUMA support
...
Gemma receives a MatMulEnv arg, with comment on lifetime
Split threading into topology so the latter can be used in allocator
Add AllocClasses() for non-POD (ThreadPool)
Support binding pool to NUMA node
Update threading_test with latency measurements
Also update Highway version.
PiperOrigin-RevId: 736904748
2025-03-14 10:19:00 -07:00
Phil Culliton
1b1b63d560
Fix PaliGemma models.
...
PiperOrigin-RevId: 736483021
2025-03-13 06:28:29 -07:00
Phil Culliton
4ab601da10
Internal change.
...
PiperOrigin-RevId: 736015810
2025-03-11 23:20:20 -07:00
Phil Culliton
9d83ff202e
Internal change.
...
PiperOrigin-RevId: 736014152
2025-03-11 23:10:48 -07:00
Jan Wassenberg
2bdf26d81d
Support bf16 output of Matmul
...
Adds Stride to ConstMat, to support decompression of C output for test
matmul_test: add line numbers to output
Also ignore "N is not a multiple of nc" when N==nc
PiperOrigin-RevId: 731096662
2025-02-25 17:53:20 -08:00
Jan Wassenberg
b3b4b9f92f
With the new matmul, much larger batch sizes are advantageous; default to 256.
...
Can still override via command line argument.
PiperOrigin-RevId: 730502653
2025-02-24 10:21:58 -08:00
Jan Wassenberg
f9d93e4a42
Matmul rewrite: fp64 sums, hierarchical parallelization, cache-blocking, autotuning
...
Remove empty matmul_unit_test.
Up to 25 TFLOP/s on 2xZen4 for 512,3072,24576.
PiperOrigin-RevId: 729123576
2025-02-20 08:33:46 -08:00
Apoorv Reddy
0e5b59d24d
Implements FusedSoftmaxAndSampleTopK.
...
This computes softmax on the top-K logits instead of computing softmax first and then taking the top-K probabilities, which also avoids renormalizing. Additionally, softmax now applies temperature scaling if temp != 1.0.
PiperOrigin-RevId: 727702149
2025-02-16 21:30:06 -08:00
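The idea in commit 0e5b59d24d, selecting the top-K logits first and applying softmax only to them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `FusedSoftmaxAndSampleTopK` implementation: `TopKSoftmax` is a hypothetical helper, it always divides by the temperature rather than special-casing 1.0, and the sampling step is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: returns (token index, probability) pairs for the k
// largest logits, ordered from most to least likely. Softmax is computed
// over those k logits only, so the probabilities already sum to 1 and no
// renormalization pass is needed afterwards.
std::vector<std::pair<size_t, float>> TopKSoftmax(
    const std::vector<float>& logits, size_t k, float temperature) {
  assert(k > 0 && k <= logits.size());
  assert(temperature > 0.0f);

  // Pair each logit with its index so the index survives the sort.
  std::vector<std::pair<float, size_t>> scored;
  scored.reserve(logits.size());
  for (size_t i = 0; i < logits.size(); ++i) {
    scored.emplace_back(logits[i], i);
  }

  // Move the k largest logits to the front, in descending order.
  std::partial_sort(scored.begin(),
                    scored.begin() + static_cast<std::ptrdiff_t>(k),
                    scored.end(), std::greater<>());

  // Temperature-scaled softmax over the k survivors; subtracting the
  // maximum logit keeps exp() numerically stable.
  const float max_logit = scored[0].first;
  std::vector<std::pair<size_t, float>> result;
  result.reserve(k);
  float sum = 0.0f;
  for (size_t i = 0; i < k; ++i) {
    const float e = std::exp((scored[i].first - max_logit) / temperature);
    result.emplace_back(scored[i].second, e);
    sum += e;
  }
  for (auto& entry : result) entry.second /= sum;
  return result;
}
```

A token could then be drawn from the returned probabilities, for example with `std::discrete_distribution` from `<random>`. Because only K exponentials are evaluated, the cost of the softmax no longer grows with the vocabulary size, although finding the top K still does.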
Copybara-Service
c495b25995
Merge pull request #493 from ufownl:bugfix/compress_weights_le
...
PiperOrigin-RevId: 725585921
2025-02-11 05:10:13 -08:00
Apoorv Reddy
64cf6dfe0a
Using TimingInfo methods and cleaning up args to DecodeStepT
...
PiperOrigin-RevId: 725580125
2025-02-11 04:49:14 -08:00
Apoorv Reddy
780e376023
Add KVCache.DeepCopy(). Will be useful for implementing sampling functionality like beam sampling, parallel sampling, and CoT decoding (à la https://arxiv.org/abs/2402.10200)
...
PiperOrigin-RevId: 725156316
2025-02-10 04:10:29 -08:00
Apoorv Reddy
9b3e7ea8a2
Factor out DecodeStepT from GenerateT into a separate function.
...
This will be useful for adding sampling functionality like beam decoding, parallel sampling, and CoT decoding (as described in the [Chain-of-Thought Reasoning Without Prompting paper](https://arxiv.org/abs/2402.10200)).
PiperOrigin-RevId: 725151530
2025-02-10 03:53:08 -08:00
RangerUFO
3a5a6dbcad
Fix the link error when building `compress_weights` with Clang on macOS
2025-02-09 00:13:25 +08:00
Jan Wassenberg
b18bd781f6
Windows build fixes: struct vs class, unused arg/var, avoid VLA, Deleter arg, casts
...
PiperOrigin-RevId: 724340518
2025-02-07 07:38:55 -08:00
Phil Culliton
7ccc6abe87
Allow conversion, loading and inference with NUQ.
...
PiperOrigin-RevId: 723507890
2025-02-05 07:45:54 -08:00
Daniel Keysers
bcdb0d65bd
Assorted small cleanups.
...
PiperOrigin-RevId: 720548132
2025-01-28 06:09:45 -08:00
Daniel Keysers
e997468496
Apply PositionalEncodingQK always in-place.
...
PiperOrigin-RevId: 718851803
2025-01-23 07:09:30 -08:00
Apoorv Reddy
ce807a31a1
internal change
...
PiperOrigin-RevId: 718824952
2025-01-23 05:31:11 -08:00
Jan Wassenberg
a60b564b88
Infra improvements (2)
...
ops.h: move CreateInvTimescale to allow calling without depending on gemma
Pass around MatMulEnv instead of pools to avoid re-creating the env
profiler.h can now be used outside SIMD code
allocator: add StepBytes and QuantumSteps
rename worker thread with package/cluster in the name
threading: add Visit* to IndexRange
PiperOrigin-RevId: 718766704
2025-01-23 01:55:19 -08:00
Daniel Keysers
f37402da57
Add parameter for base_frequency to CreateInvTimescale().
...
Extract a few local variables to make code easier to read (hopefully).
PiperOrigin-RevId: 718749053
2025-01-23 00:56:44 -08:00
Phil Culliton
9646edc908
Internal change
...
PiperOrigin-RevId: 717916568
2025-01-21 07:53:49 -08:00
Jan Wassenberg
c4398fc72d
Infra improvements:
...
allocator: support mmap, fixed Bind, add padding
bench_matmul: Add PreventElision
BUILD: add ops_test build target
matmul.h: move ConstMat here; dynamic alloc of MatMulEnv
matmul_test: remove benchmarking
replace fprintf with HWY_WARN
threading.cc: support splitting large clusters (disabled); package_idx->pkg_idx, smaller IndexRangePartition
PiperOrigin-RevId: 717512274
2025-01-20 06:22:49 -08:00
Daniel Keysers
493688f6f1
Allow interactive use with new single-file weight format.
...
Add section about new weights format to README.md.
Remove model_type_required parameter.
Update error handling for flags.
PiperOrigin-RevId: 715788822
2025-01-15 07:22:33 -08:00
Ray Smith
b93231a47d
Moved the vit config fields to their own config struct
...
PiperOrigin-RevId: 715692800
2025-01-15 01:09:49 -08:00
Ray Smith
9d40f0117e
Added ability to load/save a complete model file, including tokenizer.
...
PiperOrigin-RevId: 707914366
2024-12-19 07:59:41 -08:00
Daniel Keysers
62c70d6715
Rename ModelTraining to PromptWrapping, which is a more accurate name.
...
PiperOrigin-RevId: 705881500
2024-12-13 07:45:59 -08:00
Ray Smith
6254f2e5ca
Removed duplicated tensor sizes from weights.h by changing the constructor used for MatPtrT
...
PiperOrigin-RevId: 705085054
2024-12-11 06:30:28 -08:00
Daniel Keysers
aed17396be
Make prompt wrapping more consistent and fix duplicated tokens for multi-turn.
...
Do not echo <end_of_turn> tokens to the user.
Have verbosity=0 only show the dialog.
PiperOrigin-RevId: 705021391
2024-12-11 01:52:00 -08:00
Ray Smith
e69bc3bc1c
Added the TensorInfo arg to the compressor so the shape and scale can be output correctly to the file in future.
...
Corrected some errors in the TensorIndex.
PiperOrigin-RevId: 705014619
2024-12-11 01:26:35 -08:00
Copybara-Service
d8135e836f
Merge pull request #460 from ericcurtin:common
...
PiperOrigin-RevId: 704684454
2024-12-10 06:33:37 -08:00
Daniel Keysers
331d2ccc02
Add support for 448px resolution to PaliGemma and PaliGemma2.
...
PiperOrigin-RevId: 704361579
2024-12-09 11:38:10 -08:00
Eric Curtin
a971088ac2
Refactor `gemma/common.cc` to improve readability and safety
...
Use `std::size` for array size calculations. Replace C-style
string manipulations with `std::string` methods. Simplify
`std::transform` usage for case conversion.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2024-12-09 16:36:25 +00:00
Phil Culliton
9dfe2a76be
Internal change
...
PiperOrigin-RevId: 702961613
2024-12-04 20:41:47 -08:00
Ray Smith
3d1625d8c5
Improved consistency of compressor API, and added a universal method with a target type arg.
...
Moved configs pybind up to root level.
PiperOrigin-RevId: 698743417
2024-11-21 05:27:40 -08:00
Ray Smith
73640d2521
Added tensor_index as a single source of truth on tensor shapes/sources and transformations
...
PiperOrigin-RevId: 697903886
2024-11-19 00:25:39 -08:00
Ray Smith
7d685a267f
Added pybind for configs.
...
Added ability to test configs for equality.
PiperOrigin-RevId: 697572671
2024-11-18 04:03:51 -08:00
Daniel Keysers
719699f132
Make top_k a runtime argument (instead of a model argument).
...
PiperOrigin-RevId: 696170691
2024-11-13 09:48:59 -08:00
Daniel Keysers
e54d9cbddd
Fix Griffin model:
...
- use HalfRope position encodings
- zero-initialize the caches for each Generate at position 0
The lack of the latter made the tests in gemma_test dependent on each other.
PiperOrigin-RevId: 694509054
2024-11-08 08:30:53 -08:00
Jan Wassenberg
868b01601f
Simpler MatMul interface, vocab types, Tristate for use_spinning
...
Add Extents2D, Range2D vocab types
Matmul uses ConstMat for inputs and RowPtr for output
Move RowVectorBatch to basics.h
Separate threading.cc
Fix topology string: report cores not LPs, and #HT
Move QStride/IsMHA into LayerConfig
ImageTokens does not require make_unique.
matmul_test: no longer require template args
PiperOrigin-RevId: 692963605
2024-11-04 07:48:29 -08:00