llama.cpp/ggml/src
Latest commit fd1234cb46 by Georgi Gerganov:
llama : add gpt-oss (#15091)
* oai moe

* compat with new checkpoint

* add attn sink impl

* add rope scaling yarn
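
  For context, YaRN rescales each RoPE frequency toward its interpolated value and applies a temperature to the attention logits. A sketch of the standard formulation (context-scale factor s, per-dimension ramp gamma_d in [0,1]); the exact constants used in this change are not shown here:

  ```latex
  % Per-dimension blend between interpolation (theta_d / s) and
  % extrapolation (theta_d), controlled by the ramp gamma_d:
  \theta'_d = (1 - \gamma_d)\,\frac{\theta_d}{s} + \gamma_d\,\theta_d
  % Attention temperature t, applied as a scale on the logits:
  \sqrt{1/t} = 0.1\,\ln s + 1
  ```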

* logits match with latest transformers code

* wip chat template

* rm trailing space

* use ggml_scale_bias
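
  ggml_scale_bias fuses the elementwise affine y = x*s + b into a single op instead of a separate scale followed by an add. Reference semantics in plain C (a sketch, not the backend kernels):

  ```c
  #include <stddef.h>

  // What the fused op computes, elementwise:
  // dst[i] = src[i]*s + b  -- previously two graph nodes (scale, then add).
  static void scale_bias_ref(float * dst, const float * src, size_t n, float s, float b) {
      for (size_t i = 0; i < n; ++i) {
          dst[i] = src[i]*s + b;
      }
  }
  ```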

* rm redundant is_swa_all

* convert interleaved gate_up
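
  The checkpoint stores the gate and up projections interleaved in a single tensor; conversion splits them into the two tensors llama.cpp expects. A sketch (the even/odd row order is an assumption about the layout):

  ```c
  // Split an interleaved gate_up tensor: even rows -> gate, odd rows -> up.
  static void split_gate_up(const float * gu, float * gate, float * up,
                            int n_rows_half, int n_cols) {
      for (int i = 0; i < n_rows_half; ++i) {
          for (int j = 0; j < n_cols; ++j) {
              gate[i*n_cols + j] = gu[(2*i + 0)*n_cols + j];
              up  [i*n_cols + j] = gu[(2*i + 1)*n_cols + j];
          }
      }
  }
  ```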

* graph : fix activation function to match reference (#7)

* vocab : handle o200k_harmony special tokens

* ggml : add attention sinks support (#1)

* llama : add attn sinks

* ggml : add attn sinks

* cuda : add attn sinks

* vulkan : add support for sinks in softmax

remove unnecessary return
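
  An attention sink is a learned per-head logit that participates in the softmax normalization without producing an output weight, so a head can assign probability mass to "nothing". Reference semantics per row (a sketch, not the SIMD/GPU kernels):

  ```c
  #include <math.h>

  // Softmax over n logits with one extra sink logit: the sink contributes
  // exp(sink) to the denominator but gets no output slot.
  static void softmax_with_sink(float * p, const float * logits, int n, float sink) {
      float m = sink;
      for (int i = 0; i < n; ++i) if (logits[i] > m) m = logits[i];
      float denom = expf(sink - m);
      for (int i = 0; i < n; ++i) denom += expf(logits[i] - m);
      for (int i = 0; i < n; ++i) p[i] = expf(logits[i] - m) / denom;
  }
  ```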

* ggml : add fused swiglu_oai op (#11)

* ggml : add fused swiglu_oai op

* Update ggml/src/ggml-cpu/ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* update CUDA impl

* cont : metal impl

* add vulkan impl

* test-backend-ops : more test cases, clean up

* llama : remove unfused impl

* remove extra lines
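
  The fused op implements the gpt-oss gated activation in one kernel: clamp gate and up, apply a sigmoid-scaled gate, and shift up by one. Scalar reference (a sketch; alpha and limit are model hyperparameters -- the widely cited alpha = 1.702, limit = 7.0 are assumptions here):

  ```c
  #include <math.h>

  static float swiglu_oai(float x_glu, float x_up, float alpha, float limit) {
      x_glu = fminf(x_glu, limit);                    // clamp gate from above
      x_up  = fmaxf(fminf(x_up, limit), -limit);      // clamp up to [-limit, limit]
      const float glu = x_glu / (1.0f + expf(-alpha * x_glu)); // x * sigmoid(alpha*x)
      return (x_up + 1.0f) * glu;
  }
  ```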

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: slaren <slarengh@gmail.com>

* repack mxfp4 upon conversion

* clean up a bit

* enable thinking

* add quick hack to render only some special tokens

* fix bf16 conversion

* remove vocab hack

* webui ok

* support chat parsing for gpt-oss

* fix webui

* direct mapping mxfp4, FINALLY

* force using mxfp4

* properly use lazy tensor

* ggml : add mxfp4

ggml : use e8m0 conversion instead of powf

Co-authored-by: Diego Devesa <slarengh@gmail.com>

change kvalues_mxfp4 table to match e2m1 (#6)

metal : remove quantization for now (not used)

cuda : fix disabled CUDA graphs due to ffn moe bias

vulkan : add support for mxfp4

cont : add cm2 dequant
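
  MXFP4 packs 32 FP4 (e2m1) values per block with one shared E8M0 scale, i.e. a pure power of two 2^(e - 127), which is why the conversion above can avoid powf. A dequantization sketch (block layout and nibble order are assumptions modeled on ggml-common.h):

  ```c
  #include <math.h>
  #include <stdint.h>

  #define QK_MXFP4 32  // values per block

  // e2m1 magnitudes {0, .5, 1, 1.5, 2, 3, 4, 6} doubled to stay integral,
  // mirroring the kvalues_mxfp4 table; dequant multiplies by d * 0.5.
  static const int8_t kvalues_mxfp4[16] = {
      0, 1, 2, 3, 4, 6, 8, 12, 0, -1, -2, -3, -4, -6, -8, -12,
  };

  typedef struct {
      uint8_t e;                 // shared E8M0 scale: 2^(e - 127)
      uint8_t qs[QK_MXFP4/2];    // 32 x 4-bit e2m1 values, two per byte
  } block_mxfp4;

  static void dequant_mxfp4(const block_mxfp4 * b, float * y) {
      const float d = ldexpf(1.0f, (int) b->e - 127); // e8m0 -> float without powf
      for (int i = 0; i < QK_MXFP4/2; ++i) {
          y[i]              = d * 0.5f * kvalues_mxfp4[b->qs[i] & 0x0F];
          y[i + QK_MXFP4/2] = d * 0.5f * kvalues_mxfp4[b->qs[i] >>   4];
      }
  }
  ```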

* ggml : add ggml_add_id (#13)

* ggml : add ggml_add_id

* add cuda impl

* llama : add weight support check for add_id

* perf opt

* add vulkan impl

* rename cuda files

* add metal impl

* allow in-place ggml_add_id
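
  ggml_add_id adds a row of b, selected per row by an integer id tensor, to each row of a -- the piece needed to apply per-expert biases after the MoE matmul. Reference semantics (a sketch of the contiguous-float case, not the actual kernels):

  ```c
  #include <stddef.h>
  #include <stdint.h>

  // dst[i][:] = a[i][:] + b[ids[i]][:]
  static void add_id_ref(float * dst, const float * a, const float * b,
                         const int32_t * ids, int n_rows, int n_cols) {
      for (int i = 0; i < n_rows; ++i) {
          const float * brow = b + (size_t) ids[i] * n_cols;
          for (int j = 0; j < n_cols; ++j) {
              dst[(size_t) i * n_cols + j] = a[(size_t) i * n_cols + j] + brow[j];
          }
      }
  }
  ```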

* llama : keep biases on CPU with --cpu-moe

* llama : fix compile error

ggml-ci

* cuda : add fallback for __nv_cvt_e8m0_to_bf16raw

ggml-ci
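
  E8M0 encodes exactly 2^(e - 127), and bf16 is 1 sign / 8 exponent / 7 mantissa bits, so a power of two is just the exponent field shifted into place. A plausible shape for such a fallback when the intrinsic is unavailable on older CUDA toolkits (a sketch, not the exact ggml code):

  ```c
  #include <stdint.h>

  // bf16 bit pattern for 2^(e - 127): sign = 0, exponent = e, mantissa = 0.
  // (e = 0 and e = 255 land in denormal/inf territory and need care in practice.)
  static inline uint16_t e8m0_to_bf16_bits(uint8_t e) {
      return (uint16_t) e << 7;
  }
  ```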

* cleanup

ggml-ci

* sycl : fix supports_op for MXFP4

ggml-ci

* fix "Unknown reasoning format"

* ggml-cpu : fix AVX build

ggml-ci

* fix hip build

ggml-ci

* cuda : add mxfp4 dequantization support for cuBLAS

ggml-ci

* ggml-cpu : fix mxfp4 fallback definitions for some architectures

ggml-ci

* cuda : fix version required for __nv_cvt_e8m0_to_bf16raw

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
Co-authored-by: slaren <slarengh@gmail.com>
2025-08-05 22:10:36 +03:00
ggml-blas cmake : Fix broken CMake error messages (ggml/1252) 2025-06-01 13:43:57 +03:00
ggml-cann llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-cpu llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-cuda llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-hip HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (#14930) 2025-07-29 17:44:30 +02:00
ggml-metal llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-musa musa: upgrade musa sdk to rc4.2.0 (#14498) 2025-07-24 20:05:37 +01:00
ggml-opencl llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-rpc rpc : check for null buffers in get/set/copy tensor endpoints (#14868) 2025-07-25 12:17:02 +02:00
ggml-sycl llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-vulkan llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-webgpu ggml: WebGPU backend host improvements and style fixing (#14978) 2025-08-04 08:52:43 -07:00
CMakeLists.txt cmake: Add GGML_BACKEND_DIR option (#15074) 2025-08-04 21:29:14 +02:00
ggml-alloc.c llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-backend-impl.h ggml : upgrade init_tensor API to return a ggml_status (#11854) 2025-02-28 14:41:47 +01:00
ggml-backend-reg.cpp cmake: Add GGML_BACKEND_DIR option (#15074) 2025-08-04 21:29:14 +02:00
ggml-backend.cpp sched : fix multiple evaluations of the same graph with pipeline parallelism (#14855) 2025-07-25 11:07:26 +03:00
ggml-common.h llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-impl.h llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-opt.cpp mnist: fix segmentation fault (ggml/1227) 2025-05-19 13:29:56 +03:00
ggml-quants.c llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-quants.h llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml-threading.cpp ggml : build backends as libraries (#10256) 2024-11-14 18:04:35 +01:00
ggml-threading.h remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS (#10797) 2024-12-12 19:02:49 +01:00
ggml.c llama : add gpt-oss (#15091) 2025-08-05 22:10:36 +03:00
ggml.cpp ggml : Print backtrace on uncaught C++ exceptions (ggml/1232) 2025-06-01 13:43:57 +03:00
gguf.cpp ggml : prevent integer overflow in gguf tensor size calculation (#14595) 2025-07-09 14:33:53 +02:00