Stephen Cox
547765a93e
mtmd: add Gemma 4 audio conformer encoder support (#21421)
* mtmd: add Gemma 4 audio conformer encoder support
Add audio processing for Gemma 4 E2B/E4B via a USM-style Conformer.
Architecture:
- 12-layer Conformer: FFN → Self-Attention → Causal Conv1D → FFN → Norm
- Subsampling Conv Projection: 2x Conv2D(stride=2) with LayerNorm
- Full self-attention with sinusoidal RPE and sliding window mask (24)
- Logit softcapping at 50.0, ClippableLinear clamping
- Output: 1024 → 1536 → RMSNorm → multimodal embedder
Mel preprocessing (dedicated mtmd_audio_preprocessor_gemma4a):
- HTK mel scale, 128 bins, magnitude STFT, mel_floor=1e-3
- Standard periodic Hann window (320 samples), zero-padded to FFT size
- Semicausal left-padding (frame_length/2 samples)
- Frame count matched to PyTorch (unfold formula)
- No pre-emphasis, no Whisper-style normalization
- Mel cosine similarity vs PyTorch: 0.9998
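The mel parameters above can be sanity-checked with a minimal sketch. The HTK mel formula and the periodic Hann window are standard definitions; the 160-sample hop in the frame-count check is an illustrative assumption (only the 320-sample window length is stated above):

```python
import math

def hz_to_htk_mel(f: float) -> float:
    """HTK mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def periodic_hann(n: int):
    """Periodic Hann window of length n (denominator n, not n - 1),
    the variant that matches STFT conventions."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / n) for k in range(n)]

def num_frames(num_samples: int, frame_length: int, hop: int) -> int:
    """Frame count matching torch.Tensor.unfold: floor((N - size) / step) + 1."""
    return (num_samples - frame_length) // hop + 1
```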
Key fixes:
- Tensor loading dedup: prevent get_tensor() from creating duplicate
entries in ctx_data; fixed with a std::set guard.
- ClippableLinear clamp_info loading moved after per-layer tensors.
- Sliding window mask (24 positions) matching PyTorch context_size.
- Skip Whisper normalization for Gemma4 mel output.
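The 24-position sliding window mask in the fixes above can be sketched as follows; a symmetric window (each frame attends to neighbors within 24 steps on either side) is an assumption here, since the commit only states that the window matches PyTorch's context_size:

```python
def sliding_window_mask(n: int, window: int = 24):
    """Boolean attention mask: position i may attend to j iff |i - j| <= window.

    Sketch only; whether the Gemma 4 audio encoder's window is
    symmetric or one-sided is assumed, not stated in the commit.
    """
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]
```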
Tested on E2B and E4B with CPU and Vulkan backends.
Transcribes: "Glad to see things are going well and business is starting
to pick up" (matching ground truth).
Ref: #21325
2026-04-12 14:15:26 +02:00
uvos
9ef7523ee9
cuda/hip: fix loop unrolling in ssm-conv (#20369)
2026-03-11 13:04:32 +08:00
Aman Gupta
1e38a7a6fa
CUDA: use shared mem for ssm_conv (#20128)
* CUDA: use shared mem for ssm_conv
* fuse silu + ssm_conv
* fuse unary + mul
* enable for fp16
* formatting
---------
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
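The fused silu + ssm_conv from this commit can be illustrated with a scalar reference: ssm_conv is a causal depthwise 1D convolution over the sequence, and the fusion applies SiLU to its output in the same pass. This Python sketch only captures the per-channel semantics; the actual kernel is a batched CUDA implementation using shared memory:

```python
import math

def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def ssm_conv_silu(x, w):
    """Causal depthwise 1D convolution over one channel, fused with SiLU.

    x: input sequence, w: kernel of length d_conv. Output t sums the
    last len(w) inputs (zero-padded on the left), then applies SiLU.
    Scalar reference for what the fused CUDA kernel computes per channel.
    """
    d = len(w)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(d):
            idx = t - (d - 1) + k  # causal: only current and past samples
            if idx >= 0:
                acc += w[k] * x[idx]
        out.append(silu(acc))
    return out
```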
2026-03-06 23:09:59 +08:00
Xuan-Son Nguyen
8ea958d4d9
model : add ASR support for LFM2-Audio-1.5B (conformer) (#18106)
* ASR with LFM2-Audio-1.5B
* Set rope_theta
* Fix comment
* Remove rope_theta setting
* Address PR feedback
* rename functions to conformer
* remove some redundant ggml_cont
* fix missing tensor
* add prefix "a." for conv tensors
* remove redundant reshape
* clean up
* add test model
---------
Co-authored-by: Tarek Dakhran <tarek@liquid.ai>
2025-12-19 00:18:01 +01:00
Tarek Dakhran
f5e96b368f
model : support LiquidAI LFM2 hybrid family (#14620)
**Important**
LFM2 was [merged](https://github.com/huggingface/transformers/pull/39340) into transformers, but has not yet been released.
To convert to GGUF, install transformers from source:
```shell
pip install "transformers @ git+https://github.com/huggingface/transformers.git@main"
```
2025-07-11 20:27:01 +02:00
a3sh
193c3e03a6
fix MUSA compiler warning (#12704)
* fix MUSA compiler warning
* replace (void) with GGML_UNUSED
2025-04-03 09:32:55 +02:00
a3sh
250d7953e8
ggml : faster ssm scan (#10558)
* faster ssm_scan
* delete unused comment
* clang format
* add space
* modify unnecessary calculations
* faster ssm conv implementation
* modify file name with dash
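The ssm_scan op sped up in this commit evaluates the core state-space recurrence. A minimal scalar reference for one state channel (the real op runs this over (d_state, d_inner) tensors and batches; simplification assumed here for illustration):

```python
def ssm_scan(a, b_x):
    """Sequential state-space scan for one scalar state channel:
    h[t] = a[t] * h[t-1] + b_x[t], returning the state trajectory.

    Minimal reference for the recurrence that the ggml ssm_scan op
    computes; the actual kernel vectorizes this over states and channels.
    """
    h, out = 0.0, []
    for at, bt in zip(a, b_x):
        h = at * h + bt
        out.append(h)
    return out
```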
2025-03-31 18:05:13 +02:00