llama.cpp/gguf-py
EliteGPT AI 7bab4a3065 model : add Qwen3-Omni multimodal architecture support
Adds support for Qwen3-Omni, Alibaba's multimodal LLM that handles
text and vision. This enables the main LLM architecture and vision
encoder support.

Main LLM changes:
- Add LLM_ARCH_QWEN3OMNI enum and architecture registration
- Add hparams loading for MoE-based architecture (48 layers, 128 experts)
- Reuse llm_build_qwen3moe graph builder
- Add IMROPE type for multimodal position encoding

Vision encoder changes (via mtmd):
- Add PROJECTOR_TYPE_QWEN3O with auto-conversion to QWEN3VL for vision
- Support different embedding dimensions (vision=8192, audio=2048)
- Add separate Q/K/V tensor support in qwen3vl graph builder

Tested with Qwen3-Omni-30B-Q8_0.gguf on distributed 5-GPU setup:
- 41-44 tokens/sec inference speed
- Text and vision inference working

Note: Audio encoder support is WIP and will follow in a separate PR.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-31 20:25:55 +10:00
examples Refactor gguf scripts to improve metadata handling (#11909) 2025-02-26 08:04:48 -05:00
gguf model : add Qwen3-Omni multimodal architecture support 2025-12-31 20:25:55 +10:00
tests gguf-py : add Numpy MXFP4 de/quantization support (#15111) 2025-08-08 17:48:26 -04:00
LICENSE gguf : make gguf pip-installable 2023-08-25 09:26:05 +03:00
README.md gguf-py : GGUF Editor GUI - Python + Qt6 (#12930) 2025-04-18 20:30:41 +02:00
pyproject.toml gguf-py : make sentencepiece optional (#14200) 2025-06-19 15:56:12 +02:00

README.md

gguf

This is a Python package for writing binary files in the GGUF (GGML Universal File) format.

See convert_hf_to_gguf.py as an example for its usage.

Installation

pip install gguf

Optionally, you can install gguf with the extra 'gui' to enable the visual GGUF editor.

pip install gguf[gui]

API Examples/Simple Tools

examples/writer.py — Generates example.gguf in the current directory to demonstrate generating a GGUF file. Note that this file cannot be used as a model.

examples/reader.py — Extracts and displays key-value pairs and tensor details from a GGUF file in a readable format.

gguf/scripts/gguf_dump.py — Dumps a GGUF file's metadata to the console.

gguf/scripts/gguf_set_metadata.py — Allows changing simple metadata values in a GGUF file by key.

gguf/scripts/gguf_convert_endian.py — Allows converting the endianness of GGUF files.

gguf/scripts/gguf_new_metadata.py — Copies a GGUF file with added/modified/removed metadata values.

gguf/scripts/gguf_editor_gui.py — Allows for viewing, editing, adding, or removing metadata values within a GGUF file as well as viewing its tensors with a Qt interface.
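To make concrete what the reader, dump, and endianness tools above operate on, here is a minimal standard-library sketch of the start of a GGUF file: a 4-byte magic, a uint32 version, a uint64 tensor count, a uint64 key-value count, then the key-value pairs. It is not how the package's scripts are implemented. The helper names (`build_minimal_gguf`, `read_metadata`, and so on) are hypothetical, only uint32 and string values are handled, and the byte-order check is a simple heuristic based on the version field.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"
TYPE_UINT32 = 4  # GGUF value-type id for uint32
TYPE_STRING = 8  # GGUF value-type id for string


def pack_string(text: str, order: str) -> bytes:
    # A GGUF string is a uint64 byte length followed by UTF-8 bytes.
    raw = text.encode("utf-8")
    return struct.pack(order + "Q", len(raw)) + raw


def build_minimal_gguf(metadata: dict, order: str = "<") -> bytes:
    """Serialize a header plus uint32/string key-value pairs, with no tensors.

    `order` is a struct byte-order prefix: "<" little-endian, ">" big-endian.
    """
    out = io.BytesIO()
    # Header: magic, version (uint32), tensor count (uint64), KV count (uint64).
    out.write(GGUF_MAGIC + struct.pack(order + "IQQ", 3, 0, len(metadata)))
    for key, value in metadata.items():
        out.write(pack_string(key, order))
        if isinstance(value, str):
            out.write(struct.pack(order + "I", TYPE_STRING))
            out.write(pack_string(value, order))
        else:
            out.write(struct.pack(order + "II", TYPE_UINT32, value))
    return out.getvalue()


def read_string(stream, order: str) -> str:
    (length,) = struct.unpack(order + "Q", stream.read(8))
    return stream.read(length).decode("utf-8")


def read_metadata(data: bytes) -> dict:
    stream = io.BytesIO(data)
    if stream.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # Heuristic: a small version number read as little-endian has non-zero
    # low 16 bits; if they are zero, assume the file is big-endian.
    (version,) = struct.unpack("<I", data[4:8])
    order = "<" if version & 0xFFFF else ">"
    version, tensor_count, kv_count = struct.unpack(order + "IQQ", stream.read(20))
    metadata = {}
    for _ in range(kv_count):
        key = read_string(stream, order)
        (value_type,) = struct.unpack(order + "I", stream.read(4))
        if value_type == TYPE_STRING:
            metadata[key] = read_string(stream, order)
        elif value_type == TYPE_UINT32:
            (metadata[key],) = struct.unpack(order + "I", stream.read(4))
        else:
            raise NotImplementedError(f"value type {value_type} is not handled here")
    return {"order": order, "version": version,
            "tensor_count": tensor_count, "metadata": metadata}


values = {"general.architecture": "llama", "answer": 42}
little = build_minimal_gguf(values, "<")
big = build_minimal_gguf(values, ">")
for blob in (little, big):
    info = read_metadata(blob)
    print(info["order"], info["version"], info["metadata"])
```

The same metadata round-trips in both byte orders, which is the property an endianness conversion has to preserve. For real files, prefer the package's own scripts, since they also handle tensor data, alignment, and the remaining value types.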

Development

Maintainers who participate in development of this package are advised to install it in editable mode:

cd /path/to/llama.cpp/gguf-py

pip install --editable .

Note: This may require upgrading your Pip installation; Pip may report that editable installation currently requires setup.py. In this case, upgrade Pip to the latest version:

pip install --upgrade pip

Automatic publishing with CI

There's a GitHub workflow to make a release automatically upon creation of tags in a specified format.

  1. Bump the version in pyproject.toml.
  2. Create a tag named gguf-vx.x.x where x.x.x is the semantic version number.
git tag -a gguf-v1.0.0 -m "Version 1.0 release"
  3. Push the tags.
git push origin --tags
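
Before pushing, it can help to confirm that the tag name agrees with the version in pyproject.toml. The following is a small sketch of such a check. It assumes the tag is exactly `gguf-v` followed by three numeric components; the pattern the workflow actually matches is not stated here, and the function names are hypothetical.

```python
import re

# Assumed tag shape: "gguf-v" followed by MAJOR.MINOR.PATCH (digits only).
TAG_PATTERN = re.compile(r"^gguf-v(\d+)\.(\d+)\.(\d+)$")


def version_from_tag(tag: str) -> str:
    """Return the x.x.x part of a gguf-vx.x.x tag, or raise ValueError."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"tag {tag!r} does not look like gguf-vx.x.x")
    return ".".join(match.groups())


def tag_matches_version(tag: str, pyproject_version: str) -> bool:
    """True when the tag encodes the same version as pyproject.toml."""
    return version_from_tag(tag) == pyproject_version


print(tag_matches_version("gguf-v1.0.0", "1.0.0"))
```

The version string would be read from pyproject.toml by whatever means suits your setup; it is passed in as a plain string here to keep the sketch self-contained.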

Manual publishing

If you want to publish the package manually for any reason, you need to have twine and build installed:

pip install build twine

Then, follow these steps to release a new version:

  1. Bump the version in pyproject.toml.
  2. Build the package:
python -m build
  3. Upload the generated distribution archives:
python -m twine upload dist/*

Run Unit Tests

From the root of this repository, run the following command to run all the unit tests:

python -m unittest discover ./gguf-py -v

TODO

  • Include conversion scripts as command line entry points in this package.