llama.cpp

ruixiangw 8fac4b1cc8	2025-12-14 18:12:33 +00:00

feat: add EAGLE3 speculative decoding support

EAGLE3 is an encoder-decoder based speculative decoding method:
- Extracts features from target model at specific layers
- Uses feature fusion layer to compress target features
- Generates draft tokens with single-layer decoder
- Maps draft vocabulary to target vocabulary via d2t tensor

Key changes:
- Add LLM_ARCH_EAGLE3 architecture
- Add EAGLE3 encoder/decoder graph (src/models/eagle3.cpp)
- Add feature extraction from target model layers
- Add g_embeddings handling for decoder input
- Add GGML_TENSOR_FLAG_SYNC for GPU synchronization
- Add --eagle3 flag for speculative-simple example
- Add EAGLE3 model conversion in convert_hf_to_gguf.py
cmake	ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (#15094)	2025-08-07 13:45:41 +02:00
include	feat: add EAGLE3 speculative decoding support	2025-12-14 18:12:33 +00:00
src	feat: add EAGLE3 speculative decoding support	2025-12-14 18:12:33 +00:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	cmake : set `CMAKE_RUNTIME_OUTPUT_DIRECTORY` for non standalone build (ggml/1394)	2025-12-14 08:33:51 +02:00