ComfyUI is the most powerful node-based interface for Stable Diffusion, and it is not close. The visual workflow system lets you chain checkpoints, LoRAs, ControlNets, IP-Adapters, and custom nodes into generation pipelines that would take hundreds of lines of Python to build manually. If you are serious about AI image generation, you are probably already using it.
The problem is hardware. A basic SDXL workflow with a refiner needs 12-16GB of VRAM. Add ControlNet and an upscaler, and you are at 18-20GB. Flux Dev at FP16? 20-24GB. AnimateDiff video generation? Your local GPU is going to struggle, crash, or take forever.
Running ComfyUI on a cloud GPU eliminates this bottleneck. You get a dedicated RTX 4090 with 24GB of VRAM for $0.40-$0.80/hr, or an A100 with 80GB for batch work and video -- billed by the minute, no upfront hardware cost. Deploy a GPU on io.net in under 2 minutes, install ComfyUI, and start generating. When your session is done, shut it down and stop paying.
This guide walks through the full setup: choosing the right GPU for your workflow, deploying on io.net, installing ComfyUI with custom nodes, and optimizing performance. You will have a working cloud ComfyUI setup in under 15 minutes.
GPU Requirements for ComfyUI Workflows
Different Stable Diffusion models and workflows have very different VRAM needs. Choosing the right GPU saves you money and avoids frustrating out-of-memory crashes mid-generation.
Stable Diffusion 1.5: 8GB+ VRAM
SD 1.5 is the lightest model to run in ComfyUI. At 512x512, a single generation uses under 4GB of VRAM. Even with ControlNet and LoRAs loaded, you rarely exceed 8GB.
An RTX 4090 on io.net is overkill for SD 1.5 -- but that is the point. Generation is near-instant: 2-3 seconds per image at 20 steps. Batch sizes of 8 fit comfortably in 24GB. If you are still using SD 1.5 for speed-critical workflows (architectural visualization, texture generation, rapid prototyping), a cloud RTX 4090 at $0.40-$0.80/hr makes it absurdly fast and cheap.
SDXL: 12GB+ Recommended
SDXL is where most ComfyUI users live today. The base model at 1024x1024 needs 8-10GB. Add the refiner for a two-pass workflow and you are at 14-16GB. Stack ControlNet depth maps and IP-Adapter for style transfer, and 18-20GB is common.
The RTX 4090 (24GB) is the ideal GPU for SDXL work. You get enough headroom to load the base model, refiner, one or two ControlNet models, and a handful of LoRAs simultaneously. Generation runs at 4-6 seconds per image at 1024x1024. On io.net, this costs $0.40-$0.80/hr.
For workflows that stack four or five models at once (base + refiner + ControlNet + IP-Adapter + upscaler), an A100 40GB or 80GB at $1.20-$2.00/hr gives you room to breathe.
SD3 and Flux: 24GB+ Recommended
SD3 Medium and Flux represent the current frontier of image quality. They also represent the current frontier of VRAM consumption.
Flux Dev at FP16 uses 20-24GB for a single 1024x1024 generation. The T5 text encoder alone needs 8-10GB in FP16 (or 4-5GB in FP8). Add ControlNet conditioning and you are pushing past 24GB.
Flux Schnell is more forgiving -- 4 steps instead of 20-50 -- but the peak VRAM draw is similar.
For single images: An RTX 4090 (24GB) handles Flux Dev with FP16 text encoders and careful VRAM management. Use the --force-fp16 flag.
For batch generation: An A100 80GB at $1.20-$2.00/hr on io.net lets you batch 4-8 Flux images at once without worrying about memory. This is the right choice if you are doing production runs of dozens or hundreds of Flux generations.
Video Generation (SVD, AnimateDiff, Wan): 24GB+ Minimum
Video generation is the most demanding ComfyUI workflow. AnimateDiff with motion LoRAs needs 20-24GB for a 16-frame animation. Stable Video Diffusion (SVD) at higher frame counts pushes past 24GB. Wan video models are similarly heavy.
For video work, an A100 80GB is the recommendation. On io.net, that is $1.20-$2.00/hr. The 80GB of VRAM means you can load the video model, conditioning models, and still have headroom for longer frame sequences. An RTX 4090 works for short AnimateDiff clips at lower resolution, but you will hit VRAM limits quickly at 24+ frames or 1024x1024.
GPU Selection Summary
| Workflow | Min VRAM | Recommended GPU | io.net Cost |
|---|---|---|---|
| SD 1.5 (512x512) | 8GB | RTX 4090 (24GB) | $0.40-$0.80/hr |
| SDXL (1024x1024) | 12GB | RTX 4090 (24GB) | $0.40-$0.80/hr |
| SDXL + Refiner + ControlNet | 16GB | RTX 4090 (24GB) | $0.40-$0.80/hr |
| Flux Schnell (single image) | 16GB | RTX 4090 (24GB) | $0.40-$0.80/hr |
| Flux Dev (batch generation) | 24GB+ | A100 80GB | $1.20-$2.00/hr |
| AnimateDiff / SVD video | 24GB+ | A100 80GB | $1.20-$2.00/hr |
| High-res upscaling (4K+) | 16GB+ | RTX 4090 or A100 | $0.40-$2.00/hr |
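
The table above can be read as a simple lookup: estimate peak VRAM, then take the smallest card that still leaves some headroom. The sketch below encodes only the two GPUs and price ranges quoted in this guide. The function name `pick_gpu` and the 10% headroom margin are illustrative assumptions, not io.net recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    vram_gb: int
    usd_per_hr: tuple[float, float]  # (low, high) price range quoted in this guide


GPUS = [
    Gpu("RTX 4090", 24, (0.40, 0.80)),
    Gpu("A100 80GB", 80, (1.20, 2.00)),
]


def pick_gpu(peak_vram_gb: float, headroom: float = 0.10) -> Gpu:
    """Return the smallest listed GPU whose VRAM covers the estimate plus headroom."""
    needed = peak_vram_gb * (1 + headroom)
    for gpu in sorted(GPUS, key=lambda g: g.vram_gb):
        if gpu.vram_gb >= needed:
            return gpu
    raise ValueError(f"No listed GPU has {needed:.1f} GB of VRAM")


if __name__ == "__main__":
    # VRAM estimates here are example inputs, not measurements.
    for workflow, vram in [("SDXL + refiner", 16), ("Flux Dev batch", 40)]:
        print(workflow, "->", pick_gpu(vram).name)
```

The headroom margin is there because peak usage varies with resolution, batch size, and loaded LoRAs; adjust it to match what you observe in practice.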

Step-by-Step: Run ComfyUI on io.net
Here is the full process from signup to your first generation. Total time: about 10-15 minutes.
Step 1: Sign Up for io.cloud
Go to cloud.io.net and create an account. You can sign up with an email address or connect a Web3 wallet. Add a payment method -- io.net bills per minute, so you only pay for the time your GPU is running.
io.net has 320,000+ GPUs available across 130+ countries, so GPU availability is almost never an issue. You will not be sitting in a queue waiting for capacity.
Step 2: Deploy an RTX 4090 Container
From the io.cloud dashboard:
- Click Deploy and choose Container as the deployment type.
- Select RTX 4090 as your GPU (or A100 80GB if you need more VRAM for Flux batch work or video generation).
- Choose the PyTorch template. This comes pre-configured with CUDA, cuDNN, and Python -- everything ComfyUI requires as a base.
- Set storage to at least 50GB. Model checkpoints are large: SDXL base is 6.5GB, Flux Dev is 23GB, and LoRAs add up quickly.
- Click Deploy.
Your container is ready in under 2 minutes. The dashboard shows connection details as soon as the instance is running.
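
To size the storage volume before deploying, it can help to add up what you plan to download. This is a minimal sketch: the SDXL base and Flux Dev sizes come from this guide, while the other entries and the 10GB overhead are placeholder assumptions to replace with your own numbers.

```python
# Sizes in GB. Only the first two figures come from this guide; the rest are
# placeholder assumptions to replace with the real file sizes you plan to use.
PLANNED_MODELS_GB = {
    "sd_xl_base_1.0": 6.5,
    "flux1-dev": 23.0,
    "text encoders (assumed)": 10.0,
    "loras + controlnets (assumed)": 5.0,
}


def storage_needed_gb(models: dict[str, float], overhead_gb: float = 10.0) -> float:
    """Sum the planned downloads and add space for ComfyUI, pip packages, and outputs."""
    return sum(models.values()) + overhead_gb


if __name__ == "__main__":
    total = storage_needed_gb(PLANNED_MODELS_GB)
    print(f"Plan for at least {total:.1f} GB of storage")
```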
Step 3: SSH Into the Instance
Copy the SSH connection command from your io.cloud dashboard and connect:
ssh -i ~/.ssh/your_key user@<instance-ip> -p <port>
If you prefer not to set up SSH keys, use the web-based terminal available directly in the io.cloud dashboard.
Step 4: Clone ComfyUI and Install Dependencies
# Install system dependencies
apt update && apt install -y git wget curl libgl1-mesa-glx libglib2.0-0
# Clone ComfyUI
cd /workspace
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# Install Python requirements
pip install -r requirements.txt
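# Optional: install xformers for faster attention (ComfyUI falls back to PyTorch's built-in attention without it; check afterwards that pip did not swap out the template's torch build)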
pip install xformers
Since the PyTorch template includes CUDA and torch, this takes about 2-3 minutes. Verify your GPU is detected:
python -c "import torch; print(torch.cuda.get_device_name(0)); print(f'VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB')"
Expected output for RTX 4090:
NVIDIA GeForce RTX 4090
VRAM: 24.0 GB
Step 5: Download Models (Checkpoints, LoRAs, VAE)
ComfyUI stores models in specific subdirectories under models/. Download what you need for your workflow.
SDXL (recommended starting point):
cd /workspace/ComfyUI
# SDXL Base 1.0 (6.5GB)
wget -P models/checkpoints/ \
https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
# SDXL Refiner (optional, for two-pass workflows)
wget -P models/checkpoints/ \
https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/resolve/main/sd_xl_refiner_1.0.safetensors
# SDXL VAE (better color accuracy)
wget -P models/vae/ \
https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors
Flux Schnell (fast, 4-step generation):
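# Flux Schnell diffusion model (weights only; load it with the Load Diffusion Model node.
# Older ComfyUI versions use models/unet/ instead of models/diffusion_models/)
wget -P models/diffusion_models/ \
https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors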
# Required text encoders
wget -P models/clip/ \
https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors
wget -P models/clip/ \
https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp16.safetensors
# Flux VAE
wget -P models/vae/ \
https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/ae.safetensors
Flux Dev (higher quality, requires Hugging Face token):
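# flux1-dev.safetensors holds only the diffusion model, so it belongs in
# models/diffusion_models/ (models/unet/ on older ComfyUI versions)
huggingface-cli download black-forest-labs/FLUX.1-dev \
flux1-dev.safetensors \
--local-dir models/diffusion_models/ \
--token YOUR_HF_TOKEN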
LoRAs and ControlNet models: Download .safetensors LoRA files into models/loras/ and ControlNet models into models/controlnet/. CivitAI direct download links generally work with wget, though some models are only available to logged-in users and need a CivitAI API token.
Cloud download speeds on io.net instances are typically 200-500+ MB/s from Hugging Face, so even the 23GB Flux Dev checkpoint downloads in under 2 minutes.
Step 6: Start ComfyUI with --listen Flag
cd /workspace/ComfyUI
python main.py --listen 0.0.0.0 --port 8188
The --listen 0.0.0.0 flag is required for remote access. Without it, ComfyUI only binds to localhost and you cannot reach it from your browser.
You should see:
Starting server
To see the GUI go to: http://0.0.0.0:8188
Useful startup flags:
| Flag | What It Does |
|---|---|
| --listen 0.0.0.0 | Accept connections from any IP (required) |
| --port 8188 | Default port; change if needed |
| --force-fp16 | Run all models in half precision -- faster, uses less VRAM |
| --highvram | Keep models in VRAM between runs (faster generation) |
| --lowvram | Offload to CPU when VRAM is tight (slower but prevents OOM) |
| --preview-method auto | Show live previews as images generate |
Step 7: Access ComfyUI Via Browser
Option A: SSH Port Forwarding (recommended)
Open a new terminal on your local machine:
ssh -L 8188:localhost:8188 -i ~/.ssh/your_key user@<instance-ip> -p <port>
Then open http://localhost:8188 in your browser. This is the most secure approach -- ComfyUI is only accessible through your SSH tunnel.
Option B: Public URL
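If your io.net container exposes port 8188 (configurable in the dashboard), access ComfyUI directly at http://<instance-ip>:8188. This is faster to set up but less secure: ComfyUI has no built-in login, so anyone who can reach that address can queue jobs on your GPU.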
Once connected, you will see the full ComfyUI node editor. Load a workflow JSON, build nodes, or start with the default text-to-image workflow. Everything runs on your cloud GPU.
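
If the page does not load, a quick script can confirm whether the server is reachable through the tunnel. The sketch below queries the `/system_stats` endpoint that recent ComfyUI versions expose. The response field names (`devices`, `name`, `vram_total`, `vram_free`) are assumptions based on those versions and may differ in yours.

```python
import json
import urllib.request


def summarize_devices(stats: dict) -> list[str]:
    """Turn a /system_stats payload into one human-readable line per device."""
    lines = []
    for dev in stats.get("devices", []):
        total_gb = dev.get("vram_total", 0) / 1024**3
        free_gb = dev.get("vram_free", 0) / 1024**3
        lines.append(f"{dev.get('name', 'unknown')}: {free_gb:.1f} / {total_gb:.1f} GB free")
    return lines


def check_server(base_url: str = "http://localhost:8188", timeout: float = 5.0) -> list[str]:
    """Fetch /system_stats from the server and summarize the GPUs it reports."""
    with urllib.request.urlopen(f"{base_url}/system_stats", timeout=timeout) as resp:
        return summarize_devices(json.load(resp))


if __name__ == "__main__":
    # Assumes the SSH tunnel from Option A is open on local port 8188.
    for line in check_server():
        print(line)
```

A connection error here usually means the tunnel is down or ComfyUI was started without --listen, which narrows the problem before you start debugging the browser.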
[IMAGE: Screenshot of ComfyUI node editor connected to a cloud RTX 4090, showing SDXL workflow with generation time overlay]
Installing Custom Nodes and Models
ComfyUI's power comes from its ecosystem of custom nodes. Here is how to set up the essentials on your cloud instance.
ComfyUI Manager: Install This First
ComfyUI Manager is the package manager for custom nodes. Once installed, you can search for, install, and update any node pack directly from the ComfyUI web interface.
cd /workspace/ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
Restart ComfyUI (Ctrl+C, then re-run the start command). You will see a Manager button in the top menu bar. From there, one-click install for any custom node.
Popular Workflows and Their Required Nodes
| Workflow | Required Node Packs | Install Via |
|---|---|---|
| ControlNet (depth, canny, pose) | ComfyUI-Controlnet-Aux | Manager |
| IP-Adapter (face/style transfer) | ComfyUI-IPAdapter-Plus | Manager |
| AnimateDiff (video generation) | ComfyUI-AnimateDiff-Evolved | Manager |
| Face fix / detailing | ComfyUI-Impact-Pack | Manager |
| Upscaling | Built into ComfyUI | N/A |
| Batch processing / A/B testing | Efficiency Nodes | Manager |
| 200+ utility nodes | WAS Node Suite | Manager |
All of these install with one click through ComfyUI Manager. No terminal commands required after the initial Manager setup.
Model Storage Best Practices
Cloud GPU time costs money. Do not waste it re-downloading models.
1. Use persistent storage. Attach a storage volume on io.net and mount it as your models directory. Models survive container restarts, so you only download once.
2. Create a setup script. Keep a shell script with all your model download commands. On a new instance, run it and walk away.
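#!/bin/bash
# setup-models.sh
# Stop on the first failed download, and run from the ComfyUI directory so the relative paths resolve
set -euo pipefail
cd /workspace/ComfyUI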
echo "Downloading SDXL base..."
wget -q -P models/checkpoints/ https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
echo "Downloading SDXL VAE..."
wget -q -P models/vae/ https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors
echo "Downloading ControlNet depth..."
wget -q -P models/controlnet/ https://huggingface.co/lllyasviel/sd_control_collection/resolve/main/diffusers_xl_depth_full.safetensors
echo "All models downloaded."
3. Keep LoRAs organized. Create subdirectories inside models/loras/ by category (e.g., styles/, characters/, lighting/) so you can find them in the ComfyUI dropdown.
Performance Tips for Cloud GPUs
You are paying by the minute. These optimizations help you generate faster and get more from every dollar.
FP16 vs FP32: Always Use Half Precision
Launch ComfyUI with --force-fp16 unless you have a specific reason for full precision. FP16 halves VRAM usage and speeds up generation by 20-40% on modern GPUs. The quality difference is imperceptible for image generation.
python main.py --listen 0.0.0.0 --port 8188 --force-fp16
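For Flux models, use the t5xxl_fp16.safetensors text encoder rather than an FP32 one. This alone saves 8-10GB of VRAM. If memory is still tight, the FP8 variant (t5xxl_fp8_e4m3fn.safetensors in the same repository) cuts the encoder to the 4-5GB noted earlier.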
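
The precision savings follow from simple arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below covers weights only (activations and latents add more), and the parameter counts are approximate public figures used here as assumptions.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "fp8": 1}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for model weights alone (no activations, no latents)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Parameter counts are approximate and labeled as assumptions.
    models = [("Flux Dev transformer (~12B)", 12e9), ("T5-XXL encoder (~4.7B)", 4.7e9)]
    for name, params in models:
        sizes = ", ".join(f"{p}: {weight_memory_gb(params, p):.1f} GB" for p in BYTES_PER_PARAM)
        print(f"{name} -> {sizes}")
```

Each step down in precision halves the weight footprint, which is why the choice of text encoder file matters so much on a 24GB card.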
Batch Size Optimization
More VRAM means larger batches. Instead of queuing one image at a time, set the batch size in the Empty Latent Image node to generate multiple images in parallel.
| Model | RTX 4090 (24GB, FP16) | A100 80GB |
|---|---|---|
| SD 1.5 (512x512) | Batch 8: ~6 sec | Batch 16+: ~8 sec |
| SDXL (1024x1024) | Batch 4: ~12 sec | Batch 8+: ~14 sec |
| Flux Dev (1024x1024) | Batch 1-2: ~10 sec | Batch 4+: ~18 sec |
A batch of 4 takes roughly 2.5x the time of a single image, not 4x, because the GPU parallelizes the computation. This means batching is almost always more cost-efficient than single-image generation.
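
To see what batching does to cost, divide the batch time by the batch size and multiply by the hourly rate. The sketch below uses the approximate SDXL timings and the upper RTX 4090 rate from this guide as example inputs; the helper name `cost_per_image` is illustrative.

```python
def cost_per_image(batch_seconds: float, batch_size: int, usd_per_hr: float) -> float:
    """GPU cost of one image when a batch of batch_size finishes in batch_seconds."""
    return batch_seconds / 3600 * usd_per_hr / batch_size


if __name__ == "__main__":
    rate = 0.80  # upper end of the RTX 4090 range quoted in this guide
    single = cost_per_image(5, 1, rate)    # SDXL single image, ~4-6 sec
    batched = cost_per_image(12, 4, rate)  # SDXL batch of 4, ~12 sec
    print(f"single: ${single:.5f}/image, batch of 4: ${batched:.5f}/image")
```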
Tiled VAE for High-Resolution Output
Generating or upscaling images above 2048x2048 can run out of VRAM during VAE decode, even on a 24GB card. The solution: use the VAE Decode (Tiled) node instead of the standard VAE Decode.
- Set tile size to 512 on RTX 4090
- Set tile size to 1024 on A100/L40S
Tiled VAE processes the image in patches, trading a small amount of speed for dramatically lower VRAM usage. This lets you generate 4K+ images on a 24GB card that would otherwise crash.
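
The idea behind tiling can be illustrated with a generic sketch that splits an image into overlapping boxes. This is not ComfyUI's implementation: the function name, the 64-pixel overlap, and the box layout are assumptions chosen only to show why peak memory depends on tile size rather than image size.

```python
def tile_boxes(width: int, height: int, tile: int, overlap: int = 64):
    """Yield (left, top, right, bottom) boxes that cover the image with overlapping tiles."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than tile")
    stride = tile - overlap
    for top in range(0, max(height - overlap, 1), stride):
        for left in range(0, max(width - overlap, 1), stride):
            # Clamp the last row and column of tiles to the image edge.
            yield (left, top, min(left + tile, width), min(top + tile, height))


if __name__ == "__main__":
    boxes = list(tile_boxes(4096, 4096, tile=512))
    print(f"{len(boxes)} tiles, each at most 512x512")
```

Only one tile needs to be decoded at a time, so a larger image means more tiles and more time rather than more memory per step. The overlap exists so that seams between neighboring tiles can be blended.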
Using Multiple GPUs
io.net supports multi-GPU containers and deploying multiple instances. For ComfyUI, this works best as parallel instances rather than a single multi-GPU process:
- Deploy 2-4 RTX 4090 instances ($0.40-$0.80/hr each)
- Run a separate ComfyUI server on each
- Use ComfyUI's built-in API to distribute prompts across instances programmatically
This is ideal for production batch generation: 100+ product shots, style variations, or A/B testing across LoRA combinations. Four RTX 4090 instances on io.net cost $1.60-$3.20/hr total -- still cheaper than a single A100 on AWS.
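
Distributing prompts across instances can be sketched as a round-robin loop over the `/prompt` endpoint that ComfyUI's HTTP API exposes. The server addresses and the `workflow_api.json` file name are hypothetical, and the workflow is assumed to have been exported in API format.

```python
import itertools
import json
import urllib.request


def assign_round_robin(jobs: list, servers: list[str]) -> list[tuple[str, object]]:
    """Pair each job with a server, cycling through the servers in order."""
    return list(zip(itertools.cycle(servers), jobs))


def queue_prompt(server: str, workflow: dict, timeout: float = 10.0) -> dict:
    """POST an API-format workflow to one ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        f"{server}/prompt", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical addresses, e.g. one SSH tunnel per instance on different local ports.
    servers = ["http://localhost:8188", "http://localhost:8189"]
    # workflow_api.json is assumed to be a workflow exported in API format.
    with open("workflow_api.json") as f:
        base = json.load(f)
    jobs = [base] * 8  # in practice, vary the seed or prompt text per job
    for server, workflow in assign_round_robin(jobs, servers):
        print(server, queue_prompt(server, workflow))
```

Round-robin is the simplest policy and works well when every job takes about the same time. For mixed workloads, a queue that hands the next job to whichever server finishes first would balance better.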
Cost Comparison: ComfyUI on io.net vs RunPod vs Local RTX 4090
Here is what it actually costs to run ComfyUI across different platforms, assuming 4 hours of active generation per day (a typical creative or production workload).
| Setup | Hourly Cost | Monthly (4h/day) | VRAM | Notes |
|---|---|---|---|---|
| io.net RTX 4090 | $0.40-$0.80 | $48-$96 | 24GB | Per-minute billing, deploy in < 2 min |
| io.net A100 80GB | $1.20-$2.00 | $144-$240 | 80GB | For Flux batches, video, multi-model |
| RunPod Community | $0.44 | $53 | 24GB | Spot-like availability, variable quality |
| RunPod Secure | ~$0.70 | $84 | 24GB | Data center SLA, more reliable |
| Vast.ai | $0.25-$0.40 | $30-$48 | 24GB | Marketplace pricing, variable hosts |
| AWS (g5.xlarge) | $1.01 | $121 | 24GB (A10G) | No RTX 4090 available; A10G is slower |
| Local RTX 4090 | ~$0.15* | $1,600 upfront | 24GB | *Electricity only; GPU costs $1,600 |
Key takeaways:
- io.net is roughly 20-60% cheaper than AWS at the rates above ($0.40-$0.80/hr vs $1.01/hr) for equivalent or better GPU hardware. AWS does not even offer RTX 4090 instances -- their cheapest 24GB option (A10G at $1.01/hr) is slower and costs more.
- io.net vs RunPod: Comparable at the RTX 4090 tier. io.net offers a clear advantage on A100 80GB pricing ($1.20-$2.00 vs $1.39-$2.20 on RunPod). io.net also offers per-minute billing with no minimums.
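- Local vs cloud break-even: Using the table's figures, a $1,600 RTX 4090 pays for itself once the cloud bill, minus ~$0.15/hr of electricity, adds up to the purchase price: roughly 2,500 GPU-hours at $0.80/hr and 6,400 at $0.40/hr. At 4 hours a day that is about 20 months at best, so cloud is cheaper for light and moderate use. At 8 hours a day, local hardware pays for itself in roughly 10-27 months (but you are locked to one GPU, one location, and you eat the depreciation).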
- Deploy speed matters: io.net clusters deploy in under 2 minutes. You can spin up a GPU for a 30-minute generation session and shut it down. No paying for idle time.
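
The monthly and break-even figures can be recomputed for your own usage with a few lines of arithmetic. The sketch below uses the table's numbers ($1,600 card, ~$0.15/hr electricity, 30-day month) as inputs; the function names are illustrative, and depreciation and resale value are ignored.

```python
def monthly_cost(usd_per_hr: float, hours_per_day: float = 4, days: int = 30) -> float:
    """Cloud spend for a month at a fixed daily usage."""
    return usd_per_hr * hours_per_day * days


def break_even_hours(gpu_price: float, cloud_usd_per_hr: float, local_usd_per_hr: float) -> float:
    """GPU-hours after which buying the card costs less than renting."""
    saving = cloud_usd_per_hr - local_usd_per_hr
    if saving <= 0:
        raise ValueError("cloud is not more expensive per hour; no break-even point")
    return gpu_price / saving


if __name__ == "__main__":
    for rate in (0.40, 0.80):  # RTX 4090 range from the table
        hours = break_even_hours(1600, rate, 0.15)
        print(f"${rate:.2f}/hr: ${monthly_cost(rate):.0f}/month at 4h/day, "
              f"break-even after {hours:,.0f} GPU-hours")
```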
Frequently Asked Questions
Can I save my ComfyUI workspace between sessions?
Yes. Use persistent storage on io.net to keep your models/, custom_nodes/, output/, and workflow JSON files across container restarts. You can also export workflows as JSON (they are small, portable files) and re-import them on any ComfyUI instance.
How fast is image generation on a cloud RTX 4090?
A cloud RTX 4090 performs identically to a local one -- same chip, same VRAM, same CUDA cores. Expect SDXL 1024x1024 at 20 steps in 4-6 seconds, SD 1.5 512x512 in 2-3 seconds, and Flux Schnell in under 2 seconds. The only added latency is displaying the result in your browser, which is typically under 1 second.
Do I need fast internet to use ComfyUI remotely?
Not for generation -- all compute happens on the cloud GPU. You need enough bandwidth to load the web interface (a few MB) and download generated images (1-5MB each). A stable 10 Mbps connection works fine. Low latency matters more than bandwidth for making the node editor feel responsive.
Can I install ComfyUI Manager and custom nodes on a cloud instance?
Absolutely. ComfyUI Manager works identically on cloud and local instances. Clone it into custom_nodes/, restart ComfyUI, and manage all your nodes from the browser. Use persistent storage so custom nodes survive container restarts.
What do I do if I run out of VRAM mid-workflow?
You will see a CUDA out of memory error. Five fixes, in order of preference: (1) add --force-fp16 to halve memory usage, (2) reduce batch size to 1, (3) use Tiled VAE Decode for high-res images, (4) drop unused models from the workflow, or (5) upgrade to a higher-VRAM GPU on io.net (switch from RTX 4090 to A100 80GB). You can also try the --lowvram flag, which offloads to CPU at the cost of speed.
Is there lag when using ComfyUI through SSH port forwarding?
The ComfyUI web interface is lightweight JavaScript and HTML. Node dragging, parameter changes, and workflow editing happen locally in your browser -- they do not require round-trips to the server. The only network-dependent actions are queuing a generation (tiny request) and receiving the result image. In practice, using ComfyUI over SSH port forwarding feels identical to running it locally.
Conclusion
ComfyUI on a cloud GPU gives you the power to run any workflow -- SDXL, Flux, ControlNet, AnimateDiff, multi-model pipelines -- without buying or maintaining hardware. An RTX 4090 on io.net costs $0.40-$0.80/hr, deploys in under 2 minutes, and handles everything most users throw at it. For video generation and heavy batch work, an A100 80GB at $1.20-$2.00/hr gives you VRAM headroom that no consumer card can match.
The setup takes about 15 minutes: deploy a container, clone the repo, download your models, install ComfyUI Manager, and connect through your browser. From there, it is the same ComfyUI you know -- just faster, with more VRAM, and without tying up your local machine.
Whether you are a professional artist running production batches, an AI hobbyist experimenting with Flux, or a developer building ComfyUI pipelines for clients -- cloud GPUs let you scale your image generation without scaling your hardware budget.
Deploy a ComfyUI GPU on io.net -- generation-ready in under 15 minutes.