Luca Versari
5862d1f995
Add a benchmark and additional tests.
Also add a script to help running sanitizer builds, and do some cleanup.
Co-authored-by: Andrey Mikhaylov <amik@google.com>
Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Sami Boukortt <sboukortt@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-06 12:54:52 +02:00
Luca Versari
4c23932289
Improve weight handling.
- Allow scaling of SFP weights
- Allow using uncompressed weights
- Do not try to compress weights in the main model calls
- Reduce code duplication in weight handling with some macros
Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Thomas Fischbacher <tfish@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-06 11:08:47 +02:00
Jan Wassenberg
7122afed5a
Add note on weight update and improve error message
PiperOrigin-RevId: 621849989
2024-04-04 07:17:27 -07:00
Copybara-Service
08948f13ac
Merge pull request #127 from szabadka:gemma3
PiperOrigin-RevId: 621815677
2024-04-04 04:32:03 -07:00
Jan Wassenberg
44e6274e99
1.07x speedup: merge MQA parallel sections as suggested by @veluca93
PiperOrigin-RevId: 621772392
2024-04-04 01:12:53 -07:00
Zoltan Szabadka
71ead04afb
Fix off-by-one errors in generation code and token streaming callback.
In the generation code we were feeding the last token of the prompt
twice through the transformer. The new version fixes that and also
works in the case where Prefill is completely disabled.
2024-04-04 07:56:21 +00:00
Zoltan Szabadka
b670d43e4f
Add standalone tool to compress weights.
Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
2024-04-03 14:54:08 +00:00
RangerUFO
1c03d7446d
Fix compilation error when `HWY_COMPILER_GCC_ACTUAL < 1300`
2024-03-28 14:54:37 +08:00
Jan Wassenberg
bb767d788d
Bounds-checks for large prompts. Refs #99
Also remove init placeholder and move Sqrt to ops.h.
PiperOrigin-RevId: 619529202
2024-03-27 07:49:46 -07:00
Copybara-Service
fcf5c1af88
Merge pull request #114 from ufownl:experimental
PiperOrigin-RevId: 618148701
2024-03-22 05:36:07 -07:00
RangerUFO
90b0e9fd7a
Refactor the implementation of `Attention`
2024-03-21 14:40:56 +08:00
Jan Wassenberg
ba86c8d590
Remove obsolete copybara tags, faster bazel builds (debug)
PiperOrigin-RevId: 617576799
2024-03-21 04:19:02 +01:00
Jan Wassenberg
f8baac80f9
Fix msan error, uninitialized model_training
This arose during the unpacking of LoaderArgs into individual ctor args. Probably better to pass LoaderArgs in, and have only a single ctor to reduce confusion.
Also fix includes.
PiperOrigin-RevId: 617386447
2024-03-21 04:18:55 +01:00
Eric Ye
52940d435f
Connect "--weights" parameter to Gemma
PiperOrigin-RevId: 617323257
2024-03-21 04:18:48 +01:00
Eric Ye
89be4c3de8
No public description
PiperOrigin-RevId: 617315030
2024-03-21 04:18:36 +01:00
Jan Wassenberg
30b8a3c1ac
Fix build for RPi, missing hn::. Refs #112, thanks long568
PiperOrigin-RevId: 617704418
2024-03-20 20:07:49 -07:00
Jan Wassenberg
06cea2bcdb
Remove obsolete copybara tags, faster bazel builds (debug)
PiperOrigin-RevId: 617576799
2024-03-20 23:37:39 +01:00
Jan Wassenberg
edaafe335f
Fix msan error, uninitialized model_training
This arose during the unpacking of LoaderArgs into individual ctor args. Probably better to pass LoaderArgs in, and have only a single ctor to reduce confusion.
Also fix includes.
PiperOrigin-RevId: 617386447
2024-03-20 23:37:32 +01:00
Eric Ye
e2a04b79ed
Connect "--weights" parameter to Gemma
PiperOrigin-RevId: 617323257
2024-03-20 23:37:25 +01:00
Eric Ye
ffd02c59ad
No public description
PiperOrigin-RevId: 617315030
2024-03-20 23:37:12 +01:00
Jan Wassenberg
7d5364bb80
Remove obsolete copybara tags, faster bazel builds (debug)
PiperOrigin-RevId: 617576799
2024-03-20 11:31:59 -07:00
RangerUFO
8fc6959950
Move conditional branch out of `pos2` loop
2024-03-20 23:50:14 +08:00
RangerUFO
c75d2eb635
Add the missing `HWY_ATTR` of `ProjKV`
2024-03-20 23:21:43 +08:00
RangerUFO
ce32f4db81
Streamline the implementation
2024-03-20 22:39:31 +08:00
Jan Wassenberg
11d9c51473
Fix msan error, uninitialized model_training
This arose during the unpacking of LoaderArgs into individual ctor args. Probably better to pass LoaderArgs in, and have only a single ctor to reduce confusion.
Also fix includes.
PiperOrigin-RevId: 617386447
2024-03-20 12:13:13 +01:00
Eric Ye
6865819bb7
Connect "--weights" parameter to Gemma
PiperOrigin-RevId: 617323257
2024-03-20 12:13:06 +01:00
Eric Ye
fdc3812446
No public description
PiperOrigin-RevId: 617315030
2024-03-20 12:12:54 +01:00
RangerUFO
6923aec853
Add MQA support
2024-03-20 18:17:24 +08:00
Jan Wassenberg
5e0cafbdc2
Fix msan error, uninitialized model_training
This arose during the unpacking of LoaderArgs into individual ctor args. Probably better to pass LoaderArgs in, and have only a single ctor to reduce confusion.
Also fix includes.
PiperOrigin-RevId: 617386447
2024-03-19 21:12:35 -07:00
Eric Ye
fdb1091b9c
Connect "--weights" parameter to Gemma
PiperOrigin-RevId: 617323257
2024-03-19 16:08:26 -07:00
Copybara-Service
a2ef389897
Merge pull request #98 from zeerd:patch-2
PiperOrigin-RevId: 615769065
2024-03-14 07:21:23 -07:00
Charles Chan
46c1aca304
Add missing log that points to a failed Generation
2024-03-14 10:03:25 +08:00
Jan Wassenberg
5fa2eb1a86
Use bf16-rounded sqrt for scaling embeddings to match Gemma
Thanks Daniel & Michael Han for pointing this out.
https://unsloth.ai/blog/gemma-bugs
PiperOrigin-RevId: 615250003
2024-03-12 19:15:13 -07:00
Copybara-Service
ccd055e06b
Merge pull request #82 from google:examples
PiperOrigin-RevId: 615066980
2024-03-12 09:24:24 -07:00
austinvhuang
5d323c00fe
fix tokenizer scope
2024-03-10 13:23:16 -04:00
austinvhuang
0fc80fad05
libgemma refactor - review changes
2024-03-10 12:55:08 -04:00
austinvhuang
cc5c24c4f8
remove app.h dependency + fix bazel build
2024-03-08 18:06:43 -05:00
austinvhuang
dfd2fdc1dd
Decouple gemma constructor from loader args, update hello_world example, add convenience version of constructor (no uncompressed weights)
2024-03-08 17:26:03 -05:00
austinvhuang
42e53e2da8
[WIP] simplify hello world example, add convenience function. TODO: update git hash in CMakeLists.txt of hello world after push
2024-03-08 14:56:22 -05:00
austinvhuang
b67e28d1a0
[WIP] remove args from GetWeights, GetCompressedWeights
2024-03-08 00:00:11 -05:00
RangerUFO
170a9b4690
Make `CreateKVCache` a free function rather than a method
2024-03-07 15:52:55 +08:00
RangerUFO
b841612e8c
Separate KV cache from GemmaImpl
2024-03-07 15:47:31 +08:00
austinvhuang
e781007836
[WIP] Remove InferenceArgs from hello_world example, fix ordering of LoaderArgs validation, revert ReplGemma EOT token behavior
2024-03-06 23:21:13 -05:00
austinvhuang
7042316013
[WIP] update GemmaInterface, Gemma, and Generate input parameter specs to remove InferenceArgs. TODO: update hello_world example after git commit hash is available for fetching
2024-03-06 22:22:59 -05:00
austinvhuang
0f6a4b49d5
[WIP] quality tweaks - for constants, defer float cast and use double for intermediate computations, add `model` to EOT token
2024-03-06 15:34:11 -05:00
austinvhuang
5b9d8a9936
[WIP] dev/examples branch merge
2024-03-06 15:10:48 -05:00
austinvhuang
10f7a086aa
[WIP] decouple GemmaImpl from CLI args
2024-03-06 15:06:41 -05:00
Copybara-Service
cd7468199c
Merge pull request #65 from enum-class:narrowing-issues
PiperOrigin-RevId: 612279564
2024-03-03 18:51:59 -08:00
Paul Chang
ae7901c3f4
Minor style fix
Remove some obsolete TODOs.
PiperOrigin-RevId: 611571224
2024-02-29 13:08:26 -08:00
Jan Wassenberg
272f17ddb3
Warning fixes: unused member, cast, unused function
PiperOrigin-RevId: 611074887
2024-02-28 05:54:22 -08:00