Definition of array size is moved to the constructor.
Allocation is separate and parallelized.
All users of weights_raw.h migrated to CompressedWeights and weights_raw.h deleted.
Replaced all previous ForEachTensor functions with a single unified function.
PiperOrigin-RevId: 684451604
Use in dot_test
app.h: add new flags and rename num_threads to max_threads
matmul: Parallelize MatMulSlow and enable spinning, more large/fewer medium test cases
PiperOrigin-RevId: 683216386
See https://arxiv.org/abs/2407.07726 for a description of the model.
Because PaliGemma operates as a prefix-LM on the image+prompt, add support for that.
PiperOrigin-RevId: 677841119
Compression:
* Implement {any packed} x {bf16, f32} 'Load2' and DecompressAndZeroPad
* New compression test for all packed formats, add to GEMMA_TEST_FILES, remove from sfp/nuq_test
* Decompress->DecompressAndZeroPad, use PackedSpan for args with bounds checking
* NUQ: support arbitrary-length enc/dec
* New compression/shared, remove sfp.h and nuq.h
* Move Store2 into Traits and provide Compress2 wrapper
* Remove unused Decompress()-with-pool overload
* Simplify CompressedArrayLen, rename to CompressedArrayElements
* Remove unused DistortionStats b_l1_
Misc:
* Add compensated and Kahan dot, support any length
* Use same Dot function everywhere
* Move exact arithmetic functions into fp_arith
* use FloatPtr and MatPtr typedefs in tests; less stack usage
* Rename args to packed/raw
* Remove Traits::Name, instead TypeName<T>()
* Move kMaxSFP and kClusters/kGroupSize into Sfp/NuqStream
PiperOrigin-RevId: 672868468
Supports converting all weight/activation formats to native MulT (bf16/f32)
Also:
- ConstMat/MutableMat for const correctness
- Move RowVectorBatch to allocator.h so it can be used from Matmul
- Add matmul.h so MatMulEnv can be used from Activations
- Remove kMaxThreads, detect from PerClusterPools
- Build fix: -inl.h files must be textual_hdrs, and highway.h should precede -inl.h
```
zen4 new
64, 24576, 3072, add=0, MatTA=bf16, MatTB=sfp: 616.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=bf16, MatTB=sfp: 460.7 GFLOPS.
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 598.6 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 435.6 GFLOPS.
zen4 old
64, 24576, 3072, add=0, MatTA=f32, MatTB=sfp: 257.5 GFLOPS.
64, 3072, 24576, add=0, MatTA=f32, MatTB=sfp: 231.9 GFLOPS.
```
PiperOrigin-RevId: 663729812
This changes the command line flags, but the default value retains the previous behavior.
Also add a CreateGemma helper to enable extra args without interface changes.
PiperOrigin-RevId: 641266411
With this addition the ComputeCrossEntropy function can be moved
to its own library, because now we can compute it using only the
public API functions from gemma.h
Split common and weights into separate lib
Remove common-inl (does not have to be SIMD code), activations.cc
Centralize switch(Model) to avoid duplication
Move CompressWeightsT to compress_weights.cc
Move LoadWeights to weights.cc
PiperOrigin-RevId: 640869202
This is still in progress / experimental, currently it is only
implemented for normal gemma MQA attention layers, and no
parallelism is added yet for backward pass.
Since we need to remember all activations from all layers, the
forward pass was also reimplemented with a new activation data
structure.
Move Path into io.h and use for opening files.
Removes dependency of gemma_lib on args.
Separate Windows codepath instead of emulating POSIX functions.
Plus lint fixes.
PiperOrigin-RevId: 626279004
Also add a script to help running sanitizer builds, and do some cleanup.
Co-authored-by: Andrey Mikhaylov <amik@google.com>
Co-authored-by: Eugene Kliuchnikov <eustas@google.com>
Co-authored-by: Sami Boukortt <sboukortt@google.com>
Co-authored-by: Zoltan Szabadka <szabadka@google.com>
--
19694e1f2e by Silvio Traversaro <silvio@traversaro.it>:
Do not pass explicitly -O2 flag to compiler in Release build
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/gemma.cpp/pull/3 from traversaro:patch-1 19694e1f2e
PiperOrigin-RevId: 610096914
It's not possible to build `gemma.cpp` with the standard MSVC front-end
as it doesn't support arrays more than `0x7ffffffff` bytes (see Compiler Error C2148),
however this isn't a problem with the optional Visual Studio Clang/LLVM frontend.
This can be specified using the `-T` flag when running CMake:
```
$ cmake -B build -T ClangCL
$ cmake --build build --config Release
```
Windows doesn't provide `pread`/`pwrite` so this must be emulated using
the `ReadFile`/`WriteFile` Win32 APIs.
`_CRT_SECURE_NO_WARNINGS` is defined to prevent a large number of warnings
about using "depricated" function names (e.g. `close` instead of `_close`).
`NOMINMAX` is defined to prevent the `min`/`max` macros from `windows.h`
from conflicting with expressions like `std::min`. Generally libraries should
avoid including `windows.h` in their public headers or define `WIN32_LEAN_AND_MEAN`
before including the `windows.h` header, but this unfortunately isn't always the case.