llama.cpp/tools
EliteGPT AI 7bab4a3065 model : add Qwen3-Omni multimodal architecture support
Adds support for Qwen3-Omni, Alibaba's multimodal LLM. This commit
covers the main LLM architecture and the vision encoder, so text and
vision inputs work.

Main LLM changes:
- Add LLM_ARCH_QWEN3OMNI enum and architecture registration
- Add hparams loading for MoE-based architecture (48 layers, 128 experts)
- Reuse llm_build_qwen3moe graph builder
- Add IMROPE type for multimodal position encoding

Vision encoder changes (via mtmd):
- Add PROJECTOR_TYPE_QWEN3O with auto-conversion to QWEN3VL for vision
- Support different embedding dimensions (vision=8192, audio=2048)
- Add separate Q/K/V tensor support in qwen3vl graph builder
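
The projector auto-conversion and the per-modality embedding widths listed above could look roughly like the following. This is a minimal sketch, not the code from this commit: only `PROJECTOR_TYPE_QWEN3O`, `PROJECTOR_TYPE_QWEN3VL`, and the two dimensions come from the message, while the enum layout, `modality`, `resolve_vision_projector`, and `n_embd_for` are hypothetical names chosen for illustration.

```cpp
#include <cstdint>
#include <stdexcept>

// Only the two enumerator names are taken from the commit message;
// the enum itself and its ordering are assumptions for this sketch.
enum projector_type {
    PROJECTOR_TYPE_QWEN3VL,
    PROJECTOR_TYPE_QWEN3O,
};

// Hypothetical tag for the encoder being configured.
enum class modality { vision, audio };

// Hypothetical helper: route the Omni projector onto the existing
// Qwen3-VL vision path; every other type passes through unchanged.
inline projector_type resolve_vision_projector(projector_type t) {
    return t == PROJECTOR_TYPE_QWEN3O ? PROJECTOR_TYPE_QWEN3VL : t;
}

// Hypothetical helper: embedding width per modality, using the values
// quoted in the commit message (vision=8192, audio=2048).
inline int32_t n_embd_for(modality m) {
    switch (m) {
        case modality::vision: return 8192;
        case modality::audio:  return 2048;
    }
    throw std::invalid_argument("unknown modality");
}
```

Keeping the conversion in one small function would mean the vision graph builder only ever sees the QWEN3VL type, which is presumably why the commit can reuse that builder.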

Tested with Qwen3-Omni-30B-Q8_0.gguf on a distributed 5-GPU setup:
- 41-44 tokens/sec inference speed
- Text and vision inference working

Note: Audio encoder support is WIP and will follow in a separate PR.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 20:25:55 +10:00
..

| Name | Last commit | Date |
| --- | --- | --- |
| batched-bench | tool/ex/tests: consistently free ctx, then model (#18168) | 2025-12-22 11:00:37 +01:00 |
| cli | gen-docs: automatically update markdown file (#18294) | 2025-12-22 19:30:19 +01:00 |
| completion | common: fix return value check for setpriority (#18412) | 2025-12-29 11:07:49 +02:00 |
| cvector-generator | common : refactor common_sampler + grammar logic changes (#17937) | 2025-12-14 10:11:13 +02:00 |
| export-lora | cmake : Do not install tools on iOS targets (#15903) | 2025-09-16 09:54:44 +07:00 |
| fit-params | llama_fit_params: return enum for fail vs. error (#18374) | 2025-12-27 09:59:19 +01:00 |
| gguf-split | cli: new CLI experience (#17824) | 2025-12-10 15:28:59 +01:00 |
| imatrix | common : refactor common_sampler + grammar logic changes (#17937) | 2025-12-14 10:11:13 +02:00 |
| llama-bench | common: fix return value check for setpriority (#18412) | 2025-12-29 11:07:49 +02:00 |
| mtmd | model : add Qwen3-Omni multimodal architecture support | 2025-12-31 20:25:55 +10:00 |
| perplexity | common : refactor common_sampler + grammar logic changes (#17937) | 2025-12-14 10:11:13 +02:00 |
| quantize | cli: new CLI experience (#17824) | 2025-12-10 15:28:59 +01:00 |
| rpc | Install rpc-server when GGML_RPC is ON. (#17149) | 2025-11-11 10:53:59 +00:00 |
| run | Manually link -lbsd to resolve flock symbol on AIX (#16610) | 2025-10-23 19:37:31 +08:00 |
| server | server: fix files built redundantly (#18474) | 2025-12-30 13:11:13 +01:00 |
| tokenize | cmake : Do not install tools on iOS targets (#15903) | 2025-09-16 09:54:44 +07:00 |
| tts | common : refactor common_sampler + grammar logic changes (#17937) | 2025-12-14 10:11:13 +02:00 |
| CMakeLists.txt | llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653) | 2025-12-15 09:24:59 +01:00 |