Optimización CUDA RTX 3060 con PyTorch 12.1

This commit is contained in:
ThePipis 2025-08-07 17:24:49 -07:00
parent 59f183ab9b
commit 2a501db7c6
4 changed files with 129 additions and 45 deletions

55
README_CUDA.md Normal file
View File

@ -0,0 +1,55 @@
# Fooocus CUDA Branch (cuda-rtx3060)
This branch provides a **CUDAonly** build of Fooocus optimized for **RTX 30series GPUs**, particularly the RTX 3060. The primary goal is to simplify the hardware backend logic and force the application to run exclusively on NVIDIA CUDA for maximum stability and performance. The following changes have been implemented:
## Key Changes
- **Force CUDA execution:** All alternative backends such as DirectML, Intel XPU and Apple MPS have been disabled. The `get_torch_device()` function in `ldm_patched/modules/model_management.py` now returns `torch.device("cuda")` directly. If CUDA is not available, the application exits with a clear error message.
- **Low VRAM logic:** LowVRAM mode is only enabled automatically on GPUs with **4 GB or less** of VRAM. GPUs with more memory will run in normal mode by default. Use the `--always-normal-vram` commandline option to force normal mode regardless of VRAM size.
- **Informative banner:** When the application starts, it prints the total VRAM and RAM detected and displays a clear banner indicating that it is running with the CUDAonly configuration (e.g. “Running with RTX 3060 + CUDA”).
- **PyTorch nightly installation:** `entry_with_update.py` installs the latest **PyTorch nightly** builds (`torch` and `torchvision`) with CUDA 12.1 support using:
```bash
python -m pip install --upgrade --pre --no-cache-dir torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu121
```
This command is executed automatically when running `entry_with_update.py`. It ensures that you are using a CUDAcompatible PyTorch version that takes advantage of the latest improvements. If the installation fails, the script prints a warning and continues launching Fooocus.
- **Removal of alternative backend detections:** Code related to DirectML, Intel XPU (ipex) and MPS has been removed or disabled. All device selection logic now funnels through CUDA.
## Usage
To run Fooocus with this branch:
1. **Switch to the branch:**
```bash
git checkout cuda-rtx3060
```
2. **Run the application:**
On Windows with the standalone build, the generated `run.bat` will call `entry_with_update.py`. On other systems, you can execute `python entry_with_update.py` directly:
```bash
python entry_with_update.py
```
3. **Verify CUDA usage:** On launch, you should see a banner similar to:
```
Total VRAM 12288 MB, total RAM 16384 MB
[Fooocus] Running with RTX 3060 + CUDA (cuda-rtx3060 branch)
```
This indicates that the application detected your CUDA device and is configured accordingly.
## Notes
- This branch is **not** compatible with nonNVIDIA hardware or systems without CUDA. If you need CPU, MPS or other backend support, use the default `main` branch.
- The nightly PyTorch builds may change frequently; if you encounter issues, you can uninstall and reinstall specific versions by editing the pip install command in `entry_with_update.py`.
Happy rendering!

View File

@ -76,6 +76,16 @@ prepare_environment()
build_launcher()
args = ini_args()
# For this CUDA-optimized branch, force Fooocus to use the first CUDA device
# exclusively and ignore any additional GPUs. This helps avoid unexpected
# device selection on multiGPU systems.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
# On systems with more than 4 GB of VRAM we disable lowVRAM optimizations to
# maximise performance. The detection logic is handled in
# ldm_patched.modules.model_management, so here we only set a flag for clarity.
os.environ['FOOOCUS_DISABLE_LOW_VRAM'] = '1'
if args.gpu_device_id is not None:
os.environ['CUDA_VISIBLE_DEVICES'] = str(args.gpu_device_id)
print("Set device to:", args.gpu_device_id)

View File

@ -5,6 +5,14 @@ import ldm_patched.modules.utils
import torch
import sys
# NOTE:
# This branch `cuda-rtx3060` is intended for systems with an NVIDIA GPU (e.g. RTX 3060)
# and forces the application to run exclusively on CUDA. Alternative backends such
# as DirectML, Intel XPU, or Apple MPS have been intentionally disabled. If a CUDA
# device is not present, the application will exit with an informative error.
# Additionally, lowVRAM mode is only enabled on GPUs with 4 GB or less VRAM.
# A clear banner is printed during initialization to indicate these constraints.
class VRAMState(Enum):
DISABLED = 0 #No vram present: no need to move models to vram
NO_VRAM = 1 #Very low vram: enable all the options to save vram
@ -14,6 +22,7 @@ class VRAMState(Enum):
SHARED = 5 #No dedicated vram: memory shared between CPU and GPU but models still need to be moved between both.
class CPUState(Enum):
# The only supported device state is GPU when running on CUDA.
GPU = 0
CPU = 1
MPS = 2
@ -23,71 +32,55 @@ vram_state = VRAMState.NORMAL_VRAM
set_vram_to = VRAMState.NORMAL_VRAM
cpu_state = CPUState.GPU
# Track total dedicated VRAM (in bytes). Will be computed later.
total_vram = 0
# LowVRAM support flag; will be disabled on GPUs with more than 4 GB.
lowvram_available = True
xpu_available = False
# Always run deterministic algorithms if requested.
if args.pytorch_deterministic:
print("Using deterministic algorithms for pytorch")
torch.use_deterministic_algorithms(True, warn_only=True)
# Force all alternative backends to be disabled. This branch does not support
# DirectML, Intel XPU or Apple MPS. If the user provides flags requesting
# these backends, they will be ignored.
directml_enabled = False
if args.directml is not None:
import torch_directml
directml_enabled = True
device_index = args.directml
if device_index < 0:
directml_device = torch_directml.device()
else:
directml_device = torch_directml.device(device_index)
print("Using directml with device:", torch_directml.device_name(device_index))
# torch_directml.disable_tiled_resources(True)
lowvram_available = False #TODO: need to find a way to get free memory in directml before this can be enabled by default.
try:
import intel_extension_for_pytorch as ipex
if torch.xpu.is_available():
xpu_available = True
except:
pass
try:
if torch.backends.mps.is_available():
cpu_state = CPUState.MPS
import torch.mps
except:
pass
xpu_available = False
if args.always_cpu:
# Honour explicit CPU request by using CPU threads. Note that this branch
# will still attempt to use CUDA unless --always-cpu is set.
if args.always_cpu > 0:
torch.set_num_threads(args.always_cpu)
print(f"Running on {torch.get_num_threads()} CPU threads")
print(f"Running on {torch.get_num_threads()} CPU threads (forced CPU)")
cpu_state = CPUState.CPU
def is_intel_xpu():
global cpu_state
global xpu_available
if cpu_state == CPUState.GPU:
if xpu_available:
return True
"""Legacy helper retained for compatibility. Always returns False in this branch."""
return False
def get_torch_device():
global directml_enabled
global cpu_state
if directml_enabled:
global directml_device
return directml_device
if cpu_state == CPUState.MPS:
return torch.device("mps")
"""
Return the torch.device that should be used for model execution.
This branch forces the use of CUDA. If CUDA is not available, print a
descriptive error and exit. The CPU fallback (and other backends) are
intentionally unsupported in order to ensure maximum performance on an
NVIDIA GPU such as the RTX 3060.
"""
# If the user explicitly requested CPU via --always-cpu, honour it.
if cpu_state == CPUState.CPU:
return torch.device("cpu")
else:
if is_intel_xpu():
return torch.device("xpu")
else:
return torch.device(torch.cuda.current_device())
# Check CUDA availability.
if not torch.cuda.is_available():
print(
"[Error] CUDA device not available. This branch requires an NVIDIA GPU with CUDA support."
)
sys.exit(1)
# Use the current CUDA device.
return torch.device("cuda")
def get_total_memory(dev=None, torch_total_too=False):
global directml_enabled
@ -121,9 +114,18 @@ def get_total_memory(dev=None, torch_total_too=False):
total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
total_ram = psutil.virtual_memory().total / (1024 * 1024)
print("Total VRAM {:0.0f} MB, total RAM {:0.0f} MB".format(total_vram, total_ram))
# Print an informative banner indicating the runtime configuration.
print("[Fooocus] Running with RTX 3060 + CUDA (cuda-rtx3060 branch)")
# Determine whether to enable low VRAM mode. Only enable it if the GPU has
# 4 GB or less of VRAM and the user did not override via --always-normal-vram or
# --always-cpu.
if not args.always_normal_vram and not args.always_cpu:
if lowvram_available and total_vram <= 4096:
print("Trying to enable lowvram mode because your GPU seems to have 4GB or less. If you don't want this use: --always-normal-vram")
print(
"Enabling low VRAM mode because your GPU has 4 GB or less. Use --always-normal-vram to disable this."
)
set_vram_to = VRAMState.LOW_VRAM
try:

17
run.bat Normal file
View File

@ -0,0 +1,17 @@
@echo off
REM run.bat for the cuda-rtx3060 branch
REM This script installs the PyTorch nightly build with CUDA 12.1 support
REM and then launches Fooocus. It assumes you are running from the root
REM directory of the project and have a Python interpreter available.
echo [Setup] Installing PyTorch nightly with CUDA 12.1...
python.exe -m pip install --upgrade --pre --no-cache-dir torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu121
if %ERRORLEVEL% neq 0 (
echo [Setup] Failed to install PyTorch nightly. Continuing with existing installation.
)
echo [Launch] Starting Fooocus with CUDA optimization...
python.exe Fooocus\entry_with_update.py %*
if %ERRORLEVEL% neq 0 (
echo [Launch] Fooocus exited with an error.
)