Commit Graph

57 Commits

Author SHA1 Message Date
Luca Versari 4c23932289 Improve weight handling.
- Allow scaling of SFP weights
- Allow using uncompressed weights
- Do not try to compress weights in the main model calls
- Reduce code duplication in weight handling with some macros

Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Thomas Fischbacher <tfish@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
2024-04-06 11:08:47 +02:00
Jan Wassenberg 7122afed5a Add note on weight update and improve error message
PiperOrigin-RevId: 621849989
2024-04-04 07:17:27 -07:00
Copybara-Service 08948f13ac Merge pull request #127 from szabadka:gemma3
PiperOrigin-RevId: 621815677
2024-04-04 04:32:03 -07:00
Jan Wassenberg 44e6274e99 1.07x speedup: merge MQA parallel sections as suggested by @veluca93
PiperOrigin-RevId: 621772392
2024-04-04 01:12:53 -07:00
Zoltan Szabadka 71ead04afb Fix off-by-one errors in generation code and token streaming callback.
In the generation code we were feeding the last token of the prompt
twice through the transformer. The new version fixes that and also
works in the case where Prefill is completely disabled.
2024-04-04 07:56:21 +00:00
Zoltan Szabadka b670d43e4f Add standalone tool to compress weights.
Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
2024-04-03 14:54:08 +00:00
RangerUFO 1c03d7446d Fix compilation error when `HWY_COMPILER_GCC_ACTUAL < 1300` 2024-03-28 14:54:37 +08:00
Jan Wassenberg bb767d788d Bounds-checks for large prompts. Refs #99
Also remove init placeholder and move Sqrt to ops.h.

PiperOrigin-RevId: 619529202
2024-03-27 07:49:46 -07:00
Copybara-Service fcf5c1af88 Merge pull request #114 from ufownl:experimental
PiperOrigin-RevId: 618148701
2024-03-22 05:36:07 -07:00
RangerUFO 90b0e9fd7a Refactor the implementation of `Attention` 2024-03-21 14:40:56 +08:00
Jan Wassenberg ba86c8d590 Remove obsolete copybara tags, faster bazel builds (debug)
PiperOrigin-RevId: 617576799
2024-03-21 04:19:02 +01:00
Jan Wassenberg f8baac80f9 Fix msan error, uninitialized model_training
This arose during the unpacking of LoaderArgs into individual ctor args. Probably better to pass LoaderArgs in, and have only a single ctor to reduce confusion.

Also fix includes.

PiperOrigin-RevId: 617386447
2024-03-21 04:18:55 +01:00
Eric Ye 52940d435f Connect "--weights" parameter to Gemma
PiperOrigin-RevId: 617323257
2024-03-21 04:18:48 +01:00
Eric Ye 89be4c3de8 No public description
PiperOrigin-RevId: 617315030
2024-03-21 04:18:36 +01:00
Jan Wassenberg 30b8a3c1ac Fix build for RPi, missing hn::. Refs #112, thanks long568
PiperOrigin-RevId: 617704418
2024-03-20 20:07:49 -07:00
Jan Wassenberg 06cea2bcdb Remove obsolete copybara tags, faster bazel builds (debug)
PiperOrigin-RevId: 617576799
2024-03-20 23:37:39 +01:00
Jan Wassenberg edaafe335f Fix msan error, uninitialized model_training
This arose during the unpacking of LoaderArgs into individual ctor args. Probably better to pass LoaderArgs in, and have only a single ctor to reduce confusion.

Also fix includes.

PiperOrigin-RevId: 617386447
2024-03-20 23:37:32 +01:00
Eric Ye e2a04b79ed Connect "--weights" parameter to Gemma
PiperOrigin-RevId: 617323257
2024-03-20 23:37:25 +01:00
Eric Ye ffd02c59ad No public description
PiperOrigin-RevId: 617315030
2024-03-20 23:37:12 +01:00
Jan Wassenberg 7d5364bb80 Remove obsolete copybara tags, faster bazel builds (debug)
PiperOrigin-RevId: 617576799
2024-03-20 11:31:59 -07:00
RangerUFO 8fc6959950 Move conditional branch out of `pos2` loop 2024-03-20 23:50:14 +08:00
RangerUFO c75d2eb635 Add the missing `HWY_ATTR` of `ProjKV` 2024-03-20 23:21:43 +08:00
RangerUFO ce32f4db81 Streamline the implementation 2024-03-20 22:39:31 +08:00
Jan Wassenberg 11d9c51473 Fix msan error, uninitialized model_training
This arose during the unpacking of LoaderArgs into individual ctor args. Probably better to pass LoaderArgs in, and have only a single ctor to reduce confusion.

Also fix includes.

PiperOrigin-RevId: 617386447
2024-03-20 12:13:13 +01:00
Eric Ye 6865819bb7 Connect "--weights" parameter to Gemma
PiperOrigin-RevId: 617323257
2024-03-20 12:13:06 +01:00
Eric Ye fdc3812446 No public description
PiperOrigin-RevId: 617315030
2024-03-20 12:12:54 +01:00
RangerUFO 6923aec853 Add MQA support 2024-03-20 18:17:24 +08:00
Jan Wassenberg 5e0cafbdc2 Fix msan error, uninitialized model_training
This arose during the unpacking of LoaderArgs into individual ctor args. Probably better to pass LoaderArgs in, and have only a single ctor to reduce confusion.

Also fix includes.

PiperOrigin-RevId: 617386447
2024-03-19 21:12:35 -07:00
Eric Ye fdb1091b9c Connect "--weights" parameter to Gemma
PiperOrigin-RevId: 617323257
2024-03-19 16:08:26 -07:00
Copybara-Service a2ef389897 Merge pull request #98 from zeerd:patch-2
PiperOrigin-RevId: 615769065
2024-03-14 07:21:23 -07:00
Charles Chan 46c1aca304 Add missing log that point to a failed Generation 2024-03-14 10:03:25 +08:00
Jan Wassenberg 5fa2eb1a86 Use bf16-rounded sqrt for scaling embeddings to match Gemma
Thanks Daniel & Michael Han for pointing this out.
https://unsloth.ai/blog/gemma-bugs

PiperOrigin-RevId: 615250003
2024-03-12 19:15:13 -07:00
Copybara-Service ccd055e06b Merge pull request #82 from google:examples
PiperOrigin-RevId: 615066980
2024-03-12 09:24:24 -07:00
austinvhuang 5d323c00fe fix tokenizer scope 2024-03-10 13:23:16 -04:00
austinvhuang 0fc80fad05 libgemma refactor - review changes 2024-03-10 12:55:08 -04:00
austinvhuang cc5c24c4f8 remove app.h dependency + fix bazel build 2024-03-08 18:06:43 -05:00
austinvhuang dfd2fdc1dd Decouple gemma constructor from loader args, update hello_world example, add convenience version of constructor (no uncompressed weights) 2024-03-08 17:26:03 -05:00
austinvhuang 42e53e2da8 [WIP] simplify hello world example, add convenience function. TODO: update git hash in CMakeLists.txt of hello world after push 2024-03-08 14:56:22 -05:00
austinvhuang b67e28d1a0 [WIP] remove args from GetWeights, GetCompressedWeights 2024-03-08 00:00:11 -05:00
RangerUFO 170a9b4690 Make `CreateKVCache` a free function rather than a method 2024-03-07 15:52:55 +08:00
RangerUFO b841612e8c Separate KV cache from GemmaImpl 2024-03-07 15:47:31 +08:00
austinvhuang e781007836 [WIP] Remove InferenceArgs from hello_world example, fix ordering of LoaderArgs validation, revert ReplGemma EOT token behavior 2024-03-06 23:21:13 -05:00
austinvhuang 7042316013 [WIP] update GemmaInterface, Gemma, and Generate input parameter specs to remove InferenceArgs. TODO: update hello_world example after git commit hash is available for fetching 2024-03-06 22:22:59 -05:00
austinvhuang 0f6a4b49d5 [WIP] quality tweaks - for constants, defer float cast and use double for intermediate computations, add `model` to EOT token 2024-03-06 15:34:11 -05:00
austinvhuang 5b9d8a9936 [WIP] dev/examples branch merge 2024-03-06 15:10:48 -05:00
austinvhuang 10f7a086aa [WIP] decouple GemmaImpl from CLI args 2024-03-06 15:06:41 -05:00
Copybara-Service cd7468199c Merge pull request #65 from enum-class:narrowing-issues
PiperOrigin-RevId: 612279564
2024-03-03 18:51:59 -08:00
Paul Chang ae7901c3f4 Minor style fix
Remove some obsolete TODOs.

PiperOrigin-RevId: 611571224
2024-02-29 13:08:26 -08:00
Jan Wassenberg 272f17ddb3 Warning fixes: unused member, cast, unused function
PiperOrigin-RevId: 611074887
2024-02-28 05:54:22 -08:00
enum-class 06dd013397 Add clang-tidy, fix narrowing issues, fix constness 2024-02-28 20:04:09 +08:00