Change is decoupled from https://github.com/ggml-org/llama.cpp/pull/18641.
[LFM2.5-Audio-1.5B](https://huggingface.co/LiquidAI/LFM2.5-Audio-1.5B)
needs streaming istft for generating output audio.
* add streaming ISTFT class (`mtmd_audio_streaming_istft`) with overlap-add for audio reconstruction
* replace global audio cache with per-instance cache, the model requires
two independent caches, for preprocessing (audio input) and for istft
(audio output).
* unified templated FFT/IFFT implementation supporting both forward and inverse transforms