Commit Graph

126 Commits

Author SHA1 Message Date
Jared Van Bortel 1829f1d7be Merge commit '4760e7cc0b68570d58f55e8dda469805d1759d0d~' into nomic-vulkan 2023-11-23 17:22:08 -05:00
Jared Van Bortel fe26e6adff Merge commit 'e16b9fa4baa8a09c6619b116159830e898050942' into nomic-vulkan 2023-11-23 17:22:04 -05:00
Jared Van Bortel 6474fc879a vulkan : handle ggml_scale for n%8 != 0
ref ggerganov/llama.cpp#3754
2023-11-23 17:22:00 -05:00
Jared Van Bortel 2a41ba7258 Merge commit '469c9addef75893e6be12edda852d12e840bf064' into nomic-vulkan 2023-11-23 17:22:00 -05:00
Jared Van Bortel f194e1b6a6 Merge commit 'fcca0a700487999d52a525c96d6661e9f6a8703a' into nomic-vulkan 2023-11-23 17:21:59 -05:00
Jared Van Bortel 84f7fc4553 vulkan : rope n_past is now KQ_pos, f16 rope kernel 2023-11-23 17:18:42 -05:00
Eve c41ea36eaa
cmake : MSVC instruction detection (fixed up #809) (#3923)
* Add detection code for avx

* Only check hardware when option is ON

* Modify per code review suggestions

* Build locally will detect CPU

* Fixes CMake style to use lowercase like everywhere else

* cleanup

* fix merge

* linux/gcc version for testing

* msvc combines avx2 and fma into /arch:AVX2 so check for both

* cleanup

* msvc only version

* style

* Update FindSIMD.cmake

---------

Co-authored-by: Howard Su <howard0su@gmail.com>
Co-authored-by: Jeremy Dunn <jeremydunn123@gmail.com>
2023-11-05 10:03:09 +02:00
cebtenzzre 21841d3163 kompute : enable kp_logger and make it static (#8) 2023-11-03 17:22:22 -04:00
Aaron Miller cc05a602d6 use mat*vec shaders for mat*mat
I wrote the mat*mat shaders from scratch so I understand them better but
they are currently not faster than just multiply-invoking the mat*vec
shaders, by a significant degree - so, except for f32 which needed a new
shader, revert to the m*v ones here.
2023-11-03 17:22:22 -04:00
Aaron Miller cd0257ed0d q4_1 mat*mat 2023-11-03 17:22:22 -04:00
Aaron Miller b78a94bc6d q6k mm works 2023-11-03 17:22:22 -04:00
Aaron Miller f0cd38b9ad add mat*mat ops 2023-11-03 17:22:22 -04:00
Aaron Miller ff4212d20f q8 mat*vec 2023-11-03 17:22:21 -04:00
Adam Treat 6ac39752bf Fix up the upstream CMakeLists.txt so we can build just llama.cpp with our branch. 2023-11-03 17:22:21 -04:00
Adam Treat 77135a3bf5 Add common boilerplate code via include and eliminate copy-paste 2023-11-03 17:22:21 -04:00
slaren 21958bb393
cmake : disable LLAMA_NATIVE by default (#3906) 2023-11-02 14:10:33 +02:00
cebtenzzre b12fa0d1c1
build : link against build info instead of compiling against it (#3879)
* cmake : fix build when .git does not exist

* cmake : simplify BUILD_INFO target

* cmake : add missing dependencies on BUILD_INFO

* build : link against build info instead of compiling against it

* zig : make build info a .cpp source instead of a header

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>

* cmake : revert change to CMP0115

---------

Co-authored-by: Matheus C. França <matheus-catarino@hotmail.com>
2023-11-02 08:50:16 +02:00
Georgi Gerganov d69d777c02
ggml : quantization refactoring (#3833)
* ggml : factor all quantization code in ggml-quants

ggml-ci

* ggml-quants : fix Zig and Swift builds + quantize tool

ggml-ci

* quantize : --pure option for disabling k-quant mixtures

---------

Co-authored-by: cebtenzzre <cebtenzzre@gmail.com>
2023-10-29 18:32:28 +02:00
Georgi Gerganov 2f9ec7e271
cuda : improve text-generation and batched decoding performance (#3776)
* cuda : prints wip

* cuda : new cublas gemm branch for multi-batch quantized src0

* cuda : add F32 sgemm branch

* cuda : fine-tune >= VOLTA params + use MMQ only for small batches

* cuda : remove duplicated cuBLAS GEMM code

* cuda : add CUDA_USE_TENSOR_CORES and GGML_CUDA_FORCE_MMQ macros

* build : add compile option to force use of MMQ kernels
2023-10-27 17:01:23 +03:00
Georgi Gerganov 2b4ea35e56
cuda : add batched cuBLAS GEMM for faster attention (#3749)
* cmake : add helper for faster CUDA builds

* batched : add NGL arg

* ggml : skip nops in compute_forward

* cuda : minor indentation

* cuda : batched cuBLAS GEMMs for src0 F16 and src1 F32 (attention ops)

* Apply suggestions from code review

These changes plus:

```c++
#define cublasGemmBatchedEx hipblasGemmBatchedEx
```

are needed to compile with ROCm. I haven't done performance testing, but it seems to work.

I couldn't figure out how to propose a change for lines outside what the pull changed. Also, this is my first time trying to create a multi-part review, so please forgive me if I mess something up.

* cuda : add ROCm / hipBLAS cublasGemmBatchedEx define

* cuda : add cublasGemmStridedBatchedEx for non-broadcasted cases

* cuda : reduce mallocs in cublasGemmBatchedEx branch

* cuda : add TODO for calling cublas from kernel + using mem pool

---------

Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
2023-10-24 16:48:37 +03:00
Georgi Gerganov d28e572c02
cmake : fix add_compile_options on macOS 2023-10-12 14:31:05 +03:00
Georgi Gerganov db3abcc114
sync : ggml (ggml-backend) (#3548)
* sync : ggml (ggml-backend)

ggml-ci

* zig : add ggml-backend to the build
2023-10-08 20:19:14 +03:00
niansa ba15dfd0be Nomic vulkan backend licensed under the Software for Open Models License (SOM), version 1.0. 2023-10-05 13:39:18 -04:00
Eve 017efe899d
cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (#3273)
* fix LLAMA_NATIVE

* syntax

* alternate implementation

* my eyes must be getting bad...

* set cmake LLAMA_NATIVE=ON by default

* march=native doesn't work for ios/tvos, so disable for those targets. also see what happens if we use it on msvc

* revert 8283237 and only allow LLAMA_NATIVE on x86 like the Makefile

* remove -DLLAMA_MPI=ON

---------

Co-authored-by: netrunnereve <netrunnereve@users.noreply.github.com>
2023-10-03 19:53:15 +03:00
cebtenzzre e78f0b0d05
cmake : increase minimum version for add_link_options (#3444) 2023-10-02 22:38:43 +03:00
cebtenzzre 9476b01226
cmake : make CUDA flags more similar to the Makefile (#3420)
* cmake : fix misuse of cxx_flags

* cmake : make CUDA flags more similar to the Makefile

* cmake : fix MSVC build
2023-10-02 16:16:50 +03:00
bandoti 095231dfd3
cmake : fix transient definitions in find pkg (#3411) 2023-10-02 12:51:49 +03:00
Cebtenzzre bc39553c90
build : enable more non-default compiler warnings (#3200) 2023-09-28 17:41:44 -04:00
Jag Chadha 527e57cfd8
build : add ACCELERATE_NEW_LAPACK to fix warning on macOS Sonoma (#3342) 2023-09-27 18:34:32 +03:00
DAN™ 99115f3fa6
cmake : fix build-info.h on MSVC (#3309) 2023-09-25 18:45:33 -04:00
Johannes Gäßler 111163e246
CUDA: enable peer access between devices (#2470) 2023-09-17 16:37:53 +02:00
Cebtenzzre 3aefaab9e5
check C++ code with -Wmissing-declarations (#3184) 2023-09-15 15:38:27 -04:00
Engininja2 7e50d34be6
cmake : fix building shared libs for clang (rocm) on windows (#3176) 2023-09-15 15:24:30 +03:00
Andrei 76164fe2e6
cmake : fix llama.h location when built outside of root directory (#3179) 2023-09-15 11:07:40 +03:00
Andrei 769266a543
cmake : compile ggml-rocm with -fpic when building shared library (#3158) 2023-09-14 20:38:16 +03:00
bandoti 990a5e226a
cmake : add relocatable Llama package (#2960)
* Keep static libs and headers with install

* Add logic to generate Config package

* Use proper build info

* Add llama as import library

* Prefix target with package name

* Add example project using CMake package

* Update README

* Update README

* Remove trailing whitespace
2023-09-14 20:04:40 +03:00
Tristan Ross 1b6c650d16
cmake : add a compiler flag check for FP16 format (#3086) 2023-09-13 16:08:52 +03:00
Johannes Gäßler 0a5eebb45d
CUDA: mul_mat_q RDNA2 tunings (#2910)
* CUDA: mul_mat_q RDNA2 tunings

* Update ggml-cuda.cu

Co-authored-by: Henri Vasserman <henv@hot.ee>

---------

Co-authored-by: Henri Vasserman <henv@hot.ee>
2023-09-13 11:20:24 +02:00
Eric Sommerlade b52b29ab9d
arm64 support for windows (#3007)
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
2023-09-12 21:54:20 -04:00
Jhen-Jie Hong 1b0d09259e
cmake : support build for iOS/tvOS (#3116)
* cmake : support build for iOS/tvOS

* ci : add iOS/tvOS build into macOS-latest-cmake

* ci : split ios/tvos jobs
2023-09-11 19:49:06 +08:00
Georgi Gerganov b3e9852e47
sync : ggml (CUDA GLM RoPE + POSIX) (#3082)
ggml-ci
2023-09-08 17:58:07 +03:00
Przemysław Pawełczyk cb6c44c5e0
build : do not use _GNU_SOURCE gratuitously (#2035)
* Do not use _GNU_SOURCE gratuitously.

What is needed to build llama.cpp and examples is availability of
stuff defined in The Open Group Base Specifications Issue 6
(https://pubs.opengroup.org/onlinepubs/009695399/) known also as
Single Unix Specification v3 (SUSv3) or POSIX.1-2001 + XSI extensions,
plus some stuff from BSD that is not specified in POSIX.1.

Well, that was true until NUMA support was added recently,
so enable GNU libc extensions for Linux builds to cover that.

Not having feature test macros in source code gives greater flexibility
to those wanting to reuse it in 3rd party app, as they can build it with
FTMs set by Makefile here or other FTMs depending on their needs.

It builds without issues in Alpine (musl libc), Ubuntu (glibc), MSYS2.

* make : enable Darwin extensions for macOS to expose RLIMIT_MEMLOCK

* make : enable BSD extensions for DragonFlyBSD to expose RLIMIT_MEMLOCK

* make : use BSD-specific FTMs to enable alloca on BSDs

* make : fix OpenBSD build by exposing newer POSIX definitions

* cmake : follow recent FTM improvements from Makefile
2023-09-08 15:09:21 +03:00
Kunshang Ji 7f412dab9c
enable CPU HBM (#2603)
* add cpu hbm support

* add memalign 0 byte check

* Update ggml.c

* Update llama.cpp

* ggml : allow ggml_init with 0 size

* retrigger ci

* fix code style

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-09-08 03:46:56 +02:00
Cebtenzzre 00d62adb79
fix some warnings from gcc and clang-tidy (#3038)
Co-authored-by: xaedes <xaedes@gmail.com>
2023-09-07 13:22:29 -04:00
Cebtenzzre 9912b9efc8
build : add LLAMA_METAL_NDEBUG flag (#3033) 2023-09-05 18:21:10 -04:00
Georgi Gerganov e36ecdccc8
build : on Mac OS enable Metal by default (#2901)
* build : on Mac OS enable Metal by default

* make : try to fix build on Linux

* make : move targets back to the top

* make : fix target clean

* llama : enable GPU inference by default with Metal

* llama : fix vocab_only logic when GPU is enabled

* common : better `n_gpu_layers` assignment

* readme : update Metal instructions

* make : fix merge conflict remnants

* gitignore : metal
2023-09-04 22:26:24 +03:00
Cebtenzzre ef15649972
build : fix most gcc and clang warnings (#2861)
* fix most gcc and clang warnings

* baby-llama : remove commented opt_params_adam

* fix some MinGW warnings

* fix more MinGW warnings
2023-09-01 16:34:50 +03:00
Cebtenzzre 849408957c
tests : add a C compliance test (#2848)
* tests : add a C compliance test

* make : build C compliance test by default

* make : fix clean and make sure C test fails on clang

* make : move -Werror=implicit-int to CFLAGS
2023-08-30 09:20:26 +03:00
Georgi Gerganov 3a007648f2
metal : add option to disable debug logs (close #2764) 2023-08-29 11:33:46 +03:00
Henri Vasserman 6bbc598a63
ROCm Port (#1087)
* use hipblas based on cublas
* Update Makefile for the Cuda kernels
* Expand arch list and make it overrideable
* Fix multi GPU on multiple amd architectures with rocblas_initialize() (#5)
* add hipBLAS to README
* new build arg LLAMA_CUDA_MMQ_Y
* fix half2 decomposition
* Add intrinsics polyfills for AMD
* AMD assembly optimized __dp4a
* Allow overriding CC_TURING
* use "ROCm" instead of "CUDA"
* ignore all build dirs
* Add Dockerfiles
* fix llama-bench
* fix -nommq help for non CUDA/HIP

---------

Co-authored-by: YellowRoseCx <80486540+YellowRoseCx@users.noreply.github.com>
Co-authored-by: ardfork <134447697+ardfork@users.noreply.github.com>
Co-authored-by: funnbot <22226942+funnbot@users.noreply.github.com>
Co-authored-by: Engininja2 <139037756+Engininja2@users.noreply.github.com>
Co-authored-by: Kerfuffle <44031344+KerfuffleV2@users.noreply.github.com>
Co-authored-by: jammm <2500920+jammm@users.noreply.github.com>
Co-authored-by: jdecourval <7315817+jdecourval@users.noreply.github.com>
2023-08-25 12:09:42 +03:00