Commit Graph

6 Commits

Author SHA1 Message Date
Hasham Vakani 15547a8720 fix: Add NVIDIA Blackwell (RTX 50xx, sm_120) support
- Use bfloat16 dtype for UNet on Blackwell GPUs (compute major >= 12)
  which have native bf16 tensor core support
- Skip manual_cast for bfloat16 weights to avoid unnecessary casting
- Fix numpy TypeError with bfloat16 tensors in patch.py and
  ip_adapter.py by converting to float32 before .numpy() calls

Tested on RTX 5070 (sm_120, CUDA 12.8) with PyTorch nightly (cu128).
Generates images at ~3.2 it/s including Image Prompt (IP-Adapter) mode.

Fixes #3862, #4123, #4141
2026-03-04 05:43:44 +05:00
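The numpy TypeError fix described in this commit comes down to upcasting bfloat16 tensors before handing them to numpy, since numpy has no native bfloat16 dtype. A minimal sketch of the idea; `to_numpy_safe` is a hypothetical helper name, and the actual changes in patch.py and ip_adapter.py may be structured differently:

```python
def to_numpy_safe(tensor):
    """Convert a torch-like tensor to a numpy array, working around the
    fact that numpy has no bfloat16 dtype: calling .numpy() on a
    bfloat16 tensor raises TypeError, so upcast to float32 first."""
    if str(tensor.dtype) == "torch.bfloat16":
        tensor = tensor.float()  # cast to float32, which numpy supports
    return tensor.cpu().numpy()
```

Tensors already in a numpy-compatible dtype pass through unchanged, so the helper can wrap every `.numpy()` call site without special-casing.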
Maxim Saplin 4d34f31a72
feat: allow users to specify the number of threads when running on CPU (#1601)
* CPU_NUM_THREADS

* refactor: optimize code, type is already strict

---------

Co-authored-by: Manuel Schmid <manuel.schmid@odt.net>
2024-02-25 17:14:17 +01:00
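The CPU_NUM_THREADS feature above amounts to reading an environment variable and passing it to PyTorch's thread setting. A minimal sketch under that assumption; the function name and fallback behaviour here are illustrative, not the PR's exact code:

```python
import os

def resolve_num_threads(default: int) -> int:
    """Read CPU_NUM_THREADS from the environment; fall back to `default`
    when the variable is unset, non-numeric, or non-positive."""
    raw = os.environ.get("CPU_NUM_THREADS", "")
    try:
        n = int(raw)
    except ValueError:
        return default
    return n if n > 0 else default

# The resolved value would then be applied with torch.set_num_threads(...)
```

Validating the value before applying it keeps a bad setting (e.g. `CPU_NUM_THREADS=abc` or `0`) from crashing startup on CPU-only runs.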
lllyasviel 1cc40d24d7 backend 2024-01-27 05:12:34 -08:00
lllyasviel 8e62a72a63
(requested) support AMD 8GB GPUs via Windows DirectML
This update was requested by users.
2023-12-30 06:30:59 -08:00
lllyasviel 0e1aa8d084
better caster (#1480)
Related to mps/rocm/cpu casting for fp16 etc. on CLIP.
2023-12-17 17:09:15 -08:00
lllyasviel e8d88d3e25 2.1.826 2023-12-12 11:38:05 -08:00