llama.cpp

Francis Couture-Harpin 3bc7103d2e ggml : avoid multiply by D in GGML_OP_SSM_SCAN		2024-11-04 13:29:47 -05:00
    This makes the weight buft detection in src/llama.cpp simpler.
    * convert : transpose Mamba-2 A, D and reshape SSM_NORM
      This breaks existing conversions of Mamba-2 models to avoid some reshapes. Not sure if it's a good idea, but it makes the graph slightly cleaner.
    * llama : more appropriate SSM_SCAN and SSM_CONV buft support checks
ggml-alloc.h	ggml : fix typo in example usage ggml_gallocr_new (ggml/984)	2024-10-04 18:50:05 +03:00
ggml-amx.h	add amx kernel for gemm (#8998)	2024-10-18 13:34:36 +08:00
ggml-backend.h	llama : refactor model loader with backend registry (#10026)	2024-10-30 02:01:23 +01:00
ggml-blas.h	ggml : add backend registry / device interfaces to BLAS backend (#9752)	2024-10-07 21:55:08 +02:00
ggml-cann.h	[CANN] Adapt to dynamically loadable backends mechanism (#9970)	2024-10-22 16:16:01 +08:00
ggml-cuda.h	llama : refactor model loader with backend registry (#10026)	2024-10-30 02:01:23 +01:00
ggml-kompute.h	kompute: add backend registry / device interfaces (#10045)	2024-10-30 17:01:52 +01:00
ggml-metal.h	ggml : add metal backend registry / device (#9713)	2024-10-07 18:27:51 +03:00
ggml-rpc.h	rpc : add backend registry / device interfaces (#9812)	2024-10-10 20:14:55 +02:00
ggml-sycl.h	[SYCL] Add SYCL Backend registry, device and Event Interfaces (#9705)	2024-10-18 06:46:16 +01:00
ggml-vulkan.h	vulkan : add backend registry / device interfaces (#9721)	2024-10-17 02:46:58 +02:00
ggml.h	ggml : avoid multiply by D in GGML_OP_SSM_SCAN	2024-11-04 13:29:47 -05:00