On macOS, AVX-512 context save is lazy: XCR0 bits 5-7 are not set until the process first executes an AVX-512 instruction, even on fully capable hardware. os_saves_zmm() was reading XCR0 directly and returning false on Darwin, causing ggml to fall back to AVX2 on machines that fully support AVX-512.

Fix: on __APPLE__ builds, os_saves_zmm() queries sysctlbyname with hw.optional.avx512f instead of reading XCR0 bits 5-7. This reflects true hardware capability regardless of the lazy-enable state. All other platforms continue to use the XCR0 path unchanged.

os_saves_ymm() is unchanged on Darwin because macOS always eagerly enables YMM state save when OSXSAVE is set. os_saves_amx() is unchanged because Intel AMX hardware does not exist on macOS x86 machines and os_saves_zmm() already returns false on non-AVX512 Darwin systems.

Approach confirmed by google/cpu_features impl_x86_macos.c, which uses the same hw.optional.avx512f sysctl for Darwin AVX-512 detection.
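The Darwin-specific check described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the actual ggml code: the names `os_saves_zmm_sketch`, `darwin_has_avx512f`, `xcr0_has_zmm_state`, `cpu_has_osxsave`, and `read_xcr0` are hypothetical, and the non-Apple path assumes a GCC- or Clang-compatible compiler on x86. Only the `hw.optional.avx512f` sysctl name and the XCR0 bits 5-7 test come from the description above.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_X86_GNU 1
#include <cpuid.h>
#endif

// XCR0 bits 5-7: opmask, ZMM_Hi256, and Hi16_ZMM state components.
constexpr std::uint64_t XCR0_ZMM_MASK = 0xE0;

// True only when all three AVX-512 state bits are enabled in XCR0.
constexpr bool xcr0_has_zmm_state(std::uint64_t xcr0) {
    return (xcr0 & XCR0_ZMM_MASK) == XCR0_ZMM_MASK;
}

#if defined(__APPLE__)
// Hypothetical helper: ask the kernel whether the hardware supports AVX-512F.
// A failed lookup (for example on Apple Silicon) is treated as "not supported".
inline bool darwin_has_avx512f() {
    int value = 0;
    std::size_t size = sizeof(value);
    if (sysctlbyname("hw.optional.avx512f", &value, &size, nullptr, 0) != 0) {
        return false;
    }
    return value != 0;
}
#endif

#if defined(SKETCH_X86_GNU)
// CPUID leaf 1, ECX bit 27 (OSXSAVE): XGETBV is only safe to execute when set.
inline bool cpu_has_osxsave() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return (ecx & (1u << 27)) != 0;
}

// Read XCR0 with XGETBV (ECX = 0 selects XCR0).
inline std::uint64_t read_xcr0() {
    std::uint32_t eax = 0, edx = 0;
    __asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(0));
    return (static_cast<std::uint64_t>(edx) << 32) | eax;
}
#endif

// Sketch of the platform split: sysctl on Darwin, XCR0 everywhere else.
inline bool os_saves_zmm_sketch() {
#if defined(__APPLE__)
    return darwin_has_avx512f();
#elif defined(SKETCH_X86_GNU)
    return cpu_has_osxsave() && xcr0_has_zmm_state(read_xcr0());
#else
    return false;  // non-x86 or unsupported compiler in this sketch
#endif
}
```

The XCR0 test is kept as a pure function so the bit logic can be checked without touching hardware state. A caller would treat the result of `os_saves_zmm_sketch()` as one precondition for selecting an AVX-512 code path, alongside the usual CPUID feature checks.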