llama.cpp

History

ruixiangw 8fac4b1cc8 feat: add EAGLE3 speculative decoding support EAGLE3 is an encoder-decoder based speculative decoding method: - Extracts features from target model at specific layers - Uses feature fusion layer to compress target features - Generates draft tokens with single-layer decoder - Maps draft vocabulary to target vocabulary via d2t tensor Key changes: - Add LLM_ARCH_EAGLE3 architecture - Add EAGLE3 encoder/decoder graph (src/models/eagle3.cpp) - Add feature extraction from target model layers - Add g_embeddings handling for decoder input - Add GGML_TENSOR_FLAG_SYNC for GPU synchronization - Add --eagle3 flag for speculative-simple example - Add EAGLE3 model conversion in convert_hf_to_gguf.py	2025-12-14 18:12:33 +00:00
..
llama-cpp.h	llama : add `llama_vocab`, functions -> methods, naming (#11110 )	2025-01-12 11:32:42 +02:00
llama.h	feat: add EAGLE3 speculative decoding support	2025-12-14 18:12:33 +00:00

ruixiangw 8fac4b1cc8 feat: add EAGLE3 speculative decoding support

EAGLE3 is an encoder-decoder based speculative decoding method:
- Extracts features from target model at specific layers
- Uses feature fusion layer to compress target features
- Generates draft tokens with single-layer decoder
- Maps draft vocabulary to target vocabulary via d2t tensor

Key changes:
- Add LLM_ARCH_EAGLE3 architecture
- Add EAGLE3 encoder/decoder graph (src/models/eagle3.cpp)
- Add feature extraction from target model layers
- Add g_embeddings handling for decoder input
- Add GGML_TENSOR_FLAG_SYNC for GPU synchronization
- Add --eagle3 flag for speculative-simple example
- Add EAGLE3 model conversion in convert_hf_to_gguf.py

2025-12-14 18:12:33 +00:00

llama-cpp.h

llama : add `llama_vocab`, functions -> methods, naming (#11110 )

2025-01-12 11:32:42 +02:00

llama.h

feat: add EAGLE3 speculative decoding support

2025-12-14 18:12:33 +00:00