
Saganaki22/HiDream_O1-ComfyUI


HiDream_O1-ComfyUI

HiDream O1 Image nodes for ComfyUI — local HiDream O1 generation with text prompts, optional reference images, BF16/FP16/FP32/FP8 model loading, FlashAttention, SageAttention, preview updates, and ComfyUI DynamicVRAM/Aimdo integration.


Chinese documentation (中文文档)


Features

  • HiDream O1 Image generation directly inside ComfyUI
  • Text-only and reference-image workflows
  • Dynamic image_1 to image_12 inputs on the sampler node
  • Optional Dev layout conditioning via JSON bbox input
  • keep_image1_aspect toggle for reference-driven output aspect ratio
  • BF16, FP16, FP32, FP8 E4M3FN, and FP8 E5M2 loader options
  • FP8 mixed-weight loading using ComfyUI manual-cast style compute
  • FlashAttention, SageAttention, and PyTorch SDPA attention backends
  • Progress previews through ComfyUI's sampler progress bar
  • Dev/Dev-2604 patch-grid smoothing node for reducing visible tile seams
  • AI Toolkit-aligned HiDream O1 LoRA training nodes
  • ComfyUI model management, unload, DynamicVRAM, and Aimdo/VBAR support

Installation

Method 1: ComfyUI Manager

Search for HiDream O1 or HiDream_O1-ComfyUI in ComfyUI Manager and install it.

Method 2: Manual Install

cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/HiDream_O1-ComfyUI.git
cd HiDream_O1-ComfyUI
python -m pip install -r requirements.txt

Restart ComfyUI after installing or updating.

Suggested transformers version: 4.57.1 – 5.3 (newer versions may break compatibility).

HiDream's May 13, 2026 upstream update notes that PyTorch 2.9.x is not recommended because of a Qwen3-VL issue. This node logs a warning when it detects 2.9.x.
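You can check the version advice above yourself before launching ComfyUI. This is a minimal sketch, not the node's own check. The helper names are hypothetical, and the sketch treats any 5.3.x release as inside the suggested transformers range.

```python
import re


def parse_version(version: str) -> tuple:
    """Parse the leading numeric part of a version string, e.g. '2.9.1+cu128' -> (2, 9, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def environment_warnings(torch_version: str, transformers_version: str) -> list:
    """Return warnings for the PyTorch and transformers advice in this README."""
    warnings = []
    if parse_version(torch_version)[:2] == (2, 9):
        warnings.append(f"PyTorch {torch_version}: 2.9.x is not recommended upstream (Qwen3-VL issue)")
    tf = parse_version(transformers_version)
    # Suggested range is 4.57.1 to 5.3; any 5.3.x patch release is treated as inside it.
    if not ((4, 57, 1) <= tf[:3] and tf[:2] <= (5, 3)):
        warnings.append(f"transformers {transformers_version} is outside the suggested 4.57.1 to 5.3 range")
    return warnings
```

In a ComfyUI Python environment, you would call it as `environment_warnings(torch.__version__, transformers.__version__)`.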

Model Setup

Download the complete model folder from one of the links below and place it inside ComfyUI/models/diffusion_models/:

| Model | Approx. VRAM | Download |
|---|---|---|
| Full BF16 | ~18–20 GB | drbaph/HiDream-O1-Image-BF16 |
| Full FP16 | ~18–20 GB | drbaph/HiDream-O1-Image-FP16 |
| Full FP8 | ~10–11 GB | drbaph/HiDream-O1-Image-FP8 |
| Dev 2604 BF16 | ~18–20 GB | drbaph/HiDream-O1-Image-Dev-2604-BF16 |
| Dev 2604 FP16 | ~18–20 GB | drbaph/HiDream-O1-Image-Dev-2604-FP16 |
| Dev 2604 FP8 | ~10–11 GB | drbaph/HiDream-O1-Image-Dev-2604-FP8 |
| Dev BF16 | ~18–20 GB | drbaph/HiDream-O1-Image-Dev-BF16 |
| Dev FP16 | ~18–20 GB | drbaph/HiDream-O1-Image-Dev-FP16 |
| Dev FP8 | ~10–11 GB | drbaph/HiDream-O1-Image-Dev-FP8 |

Example — FP8 (lowest VRAM):

  1. Go to drbaph/HiDream-O1-Image-FP8
  2. Download the entire model folder (all files, not just the safetensors)
  3. Place it at ComfyUI/models/diffusion_models/HiDream-O1-Image-fp8/

The folder must contain the full Hugging Face support files:

config.json
chat_template.json
generation_config.json
preprocessor_config.json
tokenizer.json
tokenizer_config.json
vocab.json
merges.txt
model.safetensors

The original sharded format also works if the folder contains model.safetensors.index.json and all shard files.
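To confirm that a downloaded folder matches one of these two layouts, you can check its file list before loading. The helper below is hypothetical and not part of the node. It assumes the standard Hugging Face sharded index format, in which `weight_map` maps tensor names to shard file names.

```python
import json
from pathlib import Path

SUPPORT_FILES = [
    "config.json",
    "chat_template.json",
    "generation_config.json",
    "preprocessor_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
]


def missing_model_files(folder: Path) -> list:
    """List required files that are absent from a HiDream O1 model folder."""
    missing = [name for name in SUPPORT_FILES if not (folder / name).is_file()]
    if (folder / "model.safetensors").is_file():
        return missing  # single-file layout
    index_path = folder / "model.safetensors.index.json"
    if not index_path.is_file():
        return missing + ["model.safetensors (or model.safetensors.index.json)"]
    # Sharded layout: every shard named in the index's weight_map must be present.
    weight_map = json.loads(index_path.read_text(encoding="utf-8"))["weight_map"]
    shards = sorted(set(weight_map.values()))
    return missing + [shard for shard in shards if not (folder / shard).is_file()]
```

An empty return value means the folder has everything this README lists.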

The model loader always shows the built-in converted model choices: Full/Dev BF16, FP16, FP8, plus Dev-2604 BF16, FP16, and FP8. If the selected model already exists locally, it is used. If it is missing, enable download_if_missing and the selected model will be downloaded into ComfyUI/models/diffusion_models.

Local folder matching is case-insensitive, so HiDream-O1-Image-Dev-FP8, hidream-o1-image-dev-fp8, and the default target folder casing all resolve to the same built-in choice. The loader dropdown only shows the built-in HiDream O1 model choices.
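One way to picture the case-insensitive lookup is as a comparison of casefolded folder names. This is a sketch under that assumption. `find_model_folder` is a hypothetical name, and the node's real matching code may differ, for example in how it chooses between two folders whose names differ only in case.

```python
from pathlib import Path
from typing import Optional


def find_model_folder(models_dir: Path, choice: str) -> Optional[Path]:
    """Return the folder in models_dir whose name equals choice, ignoring case."""
    if not models_dir.is_dir():
        return None
    wanted = choice.casefold()
    for entry in sorted(models_dir.iterdir()):
        if entry.is_dir() and entry.name.casefold() == wanted:
            return entry
    return None
```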

Upstream Artifact Note

The original/full HiDream O1 model can show grid artifacts or other reference-image artifacts. In the upstream issue tracker, a HiDream developer recommends trying the Dev model because it should have fewer grid artifacts, and notes that reference-image generation is still being improved: HiDream-ai/HiDream-O1-Image issue #1.

In general, the Full model is the better choice for realism and photographic detail. The Dev model is faster and often better for illustration, digital design, and cleaner grid/artifact behavior, but it can be more sensitive to scheduler and resolution choices.

| Variant | Precision | Hugging Face repo | Target folder |
|---|---|---|---|
| Full | auto, bf16, fp32 | drbaph/HiDream-O1-Image-BF16 | HiDream-O1-Image-bf16 |
| Full | fp16 | drbaph/HiDream-O1-Image-FP16 | HiDream-O1-Image-fp16 |
| Full | fp8_e4m3fn, fp8_e5m2 | drbaph/HiDream-O1-Image-FP8 | HiDream-O1-Image-fp8 |
| Dev 2604 | auto, bf16, fp32 | drbaph/HiDream-O1-Image-Dev-2604-BF16 | HiDream-O1-Image-Dev-2604-bf16 |
| Dev 2604 | fp16 | drbaph/HiDream-O1-Image-Dev-2604-FP16 | HiDream-O1-Image-Dev-2604-fp16 |
| Dev 2604 | fp8_e4m3fn, fp8_e5m2 | drbaph/HiDream-O1-Image-Dev-2604-FP8 | HiDream-O1-Image-Dev-2604-fp8 |
| Dev | auto, bf16, fp32 | drbaph/HiDream-O1-Image-Dev-BF16 | HiDream-O1-Image-Dev-bf16 |
| Dev | fp16 | drbaph/HiDream-O1-Image-Dev-FP16 | HiDream-O1-Image-Dev-fp16 |
| Dev | fp8_e4m3fn, fp8_e5m2 | drbaph/HiDream-O1-Image-Dev-FP8 | HiDream-O1-Image-Dev-fp8 |
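The table follows a regular pattern, so it can be written as a lookup. The sketch below only illustrates the table and is not the loader's code. `resolve_download` and `download` are hypothetical helpers. The manual download uses `huggingface_hub`'s `snapshot_download`, which fetches every file in the repo, as the folder requirements above demand.

```python
from pathlib import Path

# Repo and folder name prefixes per variant, taken from the table above.
REPO_PREFIX = {
    "full": "HiDream-O1-Image",
    "dev": "HiDream-O1-Image-Dev",
    "dev-2604": "HiDream-O1-Image-Dev-2604",
}
# fp32 and auto have no separate repo; per the table they use the BF16 weights.
PRECISION_GROUP = {
    "auto": "BF16", "bf16": "BF16", "fp32": "BF16",
    "fp16": "FP16",
    "fp8_e4m3fn": "FP8", "fp8_e5m2": "FP8",
}


def resolve_download(variant: str, precision: str) -> tuple:
    """Map a variant and precision to (Hugging Face repo id, target folder name)."""
    prefix = REPO_PREFIX[variant.lower()]
    group = PRECISION_GROUP[precision.lower()]
    return f"drbaph/{prefix}-{group}", f"{prefix}-{group.lower()}"


def download(variant: str, precision: str, models_dir: Path) -> Path:
    """Download the full model folder (not just the weights) into models_dir."""
    from huggingface_hub import snapshot_download  # requires the huggingface_hub package

    repo_id, folder = resolve_download(variant, precision)
    target = models_dir / folder
    snapshot_download(repo_id=repo_id, local_dir=target)
    return target
```

For example, `download("full", "fp8_e4m3fn", Path("ComfyUI/models/diffusion_models"))` would fetch drbaph/HiDream-O1-Image-FP8 into HiDream-O1-Image-fp8/. That is the same folder used in the manual FP8 example.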

Nodes

HiDream O1 Model Loader

Loads a local HiDream O1 model folder and returns a Comfy-managed model handle.

| Parameter | Default | Description |
|---|---|---|
| model_name | HiDream-O1-Image-BF16 | Built-in HiDream O1 model choice |
| precision | auto | Detects safetensors dtype, or forces bf16, fp16, fp32, fp8_e4m3fn, fp8_e5m2 |
| attention | auto | auto, flash, sdpa, or sage |
| download_if_missing | false | Downloads the selected built-in model if it is not installed locally |

HiDream O1 Conditioning

Creates prompt conditioning for the sampler.

| Parameter | Default | Description |
|---|---|---|
| prompt | cinematic portrait prompt | Text instruction for generation |
| enhanced_prompt | optional input | Optional STRING input from ComfyUI's bundled Prompt Enhance subgraph or any prompt-enhancer output; when connected and non-empty, it replaces the prompt textbox |
| negative_prompt | empty | Negative prompt used as the unconditional CFG branch in full mode when guidance_scale is above 1.0; dev mode ignores CFG |
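The prompt rules in the table reduce to a few conditions. This is a hypothetical sketch, not the node's code. It assumes that a whitespace-only enhanced_prompt counts as empty, and it uses the standard classifier-free guidance formula for the point where the negative branch enters. The node's exact combination step is not documented here.

```python
from typing import List, Optional, Sequence


def resolve_prompt(prompt: str, enhanced_prompt: Optional[str] = None) -> str:
    """A connected, non-empty enhanced_prompt replaces the prompt textbox."""
    if enhanced_prompt is not None and enhanced_prompt.strip():
        return enhanced_prompt
    return prompt


def uses_negative_branch(mode: str, guidance_scale: float) -> bool:
    """The negative prompt only runs as the unconditional branch in full mode with CFG above 1.0."""
    return mode == "full" and guidance_scale > 1.0


def cfg_combine(cond: Sequence[float], uncond: Sequence[float], guidance_scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]
```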

Optional bundled ComfyUI prompt-enhancement flow (generic, not HiDream-O1-specific):

Prompt Enhance -> HiDream O1 Conditioning enhanced_prompt

ComfyUI's bundled Prompt Enhance blueprint is a generic subgraph built around the Google Gemini node. It is not part of the native HiDream-O1 model/conditioning path, and it is not the local Gemma 4 Generate Text node. You can still use the generic Generate Text node if you supply your own instruction prompt, but that is a different workflow from Prompt Enhance.

HiDream O1 LoRA

Applies a LoRA between the model loader and sampler:

HiDream O1 Model Loader -> HiDream O1 LoRA -> HiDream O1 Sampler

The LoRA dropdown reads from ComfyUI/models/loras/, including supported LoRA files inside symlinked folders.
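In Python, descending into symlinked folders is what `os.walk(..., followlinks=True)` does. The sketch below shows a scan of that kind. It is not the node's code: `list_loras` is hypothetical, and `.safetensors` is assumed to be the only LoRA extension.

```python
import os
from pathlib import Path

LORA_EXTENSIONS = {".safetensors"}  # assumption: the node may accept other extensions too


def list_loras(lora_root: Path) -> list:
    """Return LoRA paths relative to lora_root, including files inside symlinked folders."""
    found = []
    # followlinks=True makes os.walk descend into directory symlinks.
    for dirpath, _dirnames, filenames in os.walk(lora_root, followlinks=True):
        for name in filenames:
            if Path(name).suffix.lower() in LORA_EXTENSIONS:
                found.append(Path(dirpath, name).relative_to(lora_root).as_posix())
    return sorted(found)
```

Following symlinks can loop forever if a link points back to one of its own parent folders, so keep linked LoRA folders outside ComfyUI/models/loras/.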

| Parameter | Default | Description |
|---|---|---|
| lora_name | None when no LoRAs are found | LoRA file |
| strength | 1.0 | Model strength from -10.0 to 10.0; 0 disables the LoRA |
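The strength value scales the LoRA update. The sketch below assumes the usual formulation W + strength * (alpha / rank) * (up @ down). Under that formula, AI Toolkit-style rank 32 / alpha 32 LoRAs apply at a scale of 1.0, and strength 0 leaves the weights unchanged. `apply_lora` is a hypothetical name, and this README does not say whether the node merges weights or patches them at runtime.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight: Matrix, down: Matrix, up: Matrix, alpha: float, strength: float) -> Matrix:
    """Return weight + strength * (alpha / rank) * (up @ down)."""
    rank = len(down)  # down: (rank, in_features); up: (out_features, rank)
    scale = strength * alpha / rank
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(weight, delta)]
```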

HiDream O1 Dev Smoothing

Applies patch-grid smoothing between the model loader or LoRA node and the sampler:

HiDream O1 Model Loader -> HiDream O1 Dev Smoothing -> HiDream O1 Sampler
HiDream O1 Model Loader -> HiDream O1 LoRA -> HiDream O1 Dev Smoothing -> HiDream O1 Sampler

This node is gated to Dev and Dev-2604 model folders. It runs extra shifted patch predictions during the last denoise steps and blends them back into the latent patch grid to reduce visible seams.

| Parameter | Default | Description |
|---|---|---|
| steps | 4 | Final denoise steps to smooth; 0 disables smoothing |
| strength | 0.5 | Blend strength for shifted patch prediction |
| schedule | constant | Strength schedule over smoothing steps |
| shift_mode | rotate | Patch-grid shift pattern |
| adaptive_threshold | 0.0 | Skip smoothing when estimated seam intensity is below this value; 0 disables skipping |
| multiscale | false | Adds a smaller patch-grid offset |
| cfg_aware | false | Also smooths the unconditional branch when CFG is active; costs extra forwards |
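The core idea can be shown on a toy 2D grid: shift the grid, predict again, shift back, and blend. This is a hypothetical sketch, not the node's implementation. It covers only `steps`, `strength`, and a single constant-strength diagonal shift. It leaves out `schedule`, `shift_mode`, `adaptive_threshold`, `multiscale`, and `cfg_aware`. `predict` stands in for one model forward pass.

```python
from typing import Callable, List

Grid = List[List[float]]


def roll(grid: Grid, dy: int, dx: int) -> Grid:
    """Circularly shift a 2D grid so the element at (0, 0) moves to (dy, dx)."""
    h, w = len(grid), len(grid[0])
    return [[grid[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def smoothed_prediction(
    latent: Grid,
    predict: Callable[[Grid], Grid],
    step: int,
    total_steps: int,
    smooth_steps: int = 4,
    strength: float = 0.5,
    offset: int = 1,
) -> Grid:
    """Blend a shifted-grid prediction into the normal one during the final smooth_steps steps."""
    base = predict(latent)
    if smooth_steps <= 0 or step < total_steps - smooth_steps:
        return base
    # Shift so patch boundaries fall elsewhere, predict, then undo the shift.
    shifted = roll(predict(roll(latent, offset, offset)), -offset, -offset)
    return [
        [(1.0 - strength) * b + strength * s for b, s in zip(b_row, s_row)]
        for b_row, s_row in zip(base, shifted)
    ]
```

In the toy run, a predictor that adds a column-dependent offset creates a step between columns, and averaging with the shifted prediction evens that step out. The same effect is what the node aims for on real patch grids.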

HiDream O1 LoRA Training

Experimental text-to-image LoRA training is available directly inside ComfyUI:

HiDream O1 Dataset Maker -> HiDream O1 Train Config -> HiDream O1 LoRA Trainer

The trainer is for image/caption datasets only. Reference-image, edit, and subject-personalization training are not wired yet.

Dataset folder layout:

my_dataset/
  image_001.png
  image_001.txt
  image_002.jpg
  image_002.txt

Each .txt file should contain the caption for the image with the same basename. The Dataset Maker writes a train.jsonl manifest that the trainer consumes.
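The pairing rule above is simple enough to sketch. This is not the Dataset Maker's code. The JSON field names (`image`, `caption`) and the accepted image extensions are assumptions, and the real train.jsonl schema may differ.

```python
import json
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}  # assumption


def build_manifest(dataset_dir: Path, manifest_path: Path) -> int:
    """Pair each image with its same-basename .txt caption and write one JSON object per line."""
    count = 0
    with manifest_path.open("w", encoding="utf-8") as out:
        for image in sorted(dataset_dir.iterdir()):
            caption_file = image.with_suffix(".txt")
            # Skip non-images and images without a caption file.
            if image.suffix.lower() not in IMAGE_EXTENSIONS or not caption_file.is_file():
                continue
            caption = caption_file.read_text(encoding="utf-8").strip()
            out.write(json.dumps({"image": str(image), "caption": caption}) + "\n")
            count += 1
    return count
```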

Training notes:

| Parameter | Default | Description |
|---|---|---|
| base_model_name | HiDream-O1-Image-BF16 | Full O1 BF16 weights |
| resolution | 1024 | Images are resized/cropped to a patch-aligned training size |
| target_preset | aitoolkit | Trains linear-like layers except lm_head, patch_embed, and visual, matching AI Toolkit's O1 ignore list |
| loss_target | velocity | Converts the model's x0 prediction into flow velocity before loss |
| noise_scale | 8.0 | Scales training noise the same way as AI Toolkit's HiDream O1 flow scheduler |
| timestep_type | linear | AI Toolkit's O1 default |
| max_loss | 1.0 | Caps extreme loss spikes like AI Toolkit's O1 default |
| lora_rank / lora_alpha | 32 / 32 | AI Toolkit-style linear LoRA defaults |
| weight_decay | 0.0001 | AdamW weight decay default from AI Toolkit's job config |
| save_dtype | bf16 | LoRA checkpoint tensor dtype |
| max_steps | 3000 | Total training steps |
| save_every_steps | 250 | Checkpoint interval |

Outputs are saved under ComfyUI/models/loras/<output_name>/ as .safetensors files plus hidream_o1_lora_config.json. After training, select the saved .safetensors in the normal HiDream O1 LoRA node.

The trainer follows AI Toolkit's May 2026 HiDream O1 recipe: it adds scaled noise with noise_scale=8.0, feeds the noisy image patches through the Qwen-VL model, converts the x0 prediction into a velocity-equivalent prediction, and trains against noise * noise_scale - image. The trainer runs in-process and blocks the ComfyUI queue while it is active. Use the Full model for training; Dev is intentionally not exposed in the trainer because it is distilled and may train unpredictably.
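The objective in the paragraph above can be written out element by element. This sketch assumes the rectified-flow interpolation x_t = (1 - t) * x0 + t * noise_scale * noise. The time derivative of that interpolation is exactly the stated target, noise * noise_scale - image. Under the same assumption, an x0 prediction converts to a velocity as (x_t - x0_pred) / t. Treating max_loss as a plain clamp is also an assumption, and none of these function names come from the trainer.

```python
from typing import List, Sequence


def make_noisy(x0: Sequence[float], noise: Sequence[float], t: float, noise_scale: float = 8.0) -> List[float]:
    """Interpolate toward scaled noise: x_t = (1 - t) * x0 + t * noise_scale * noise."""
    return [(1.0 - t) * x + t * noise_scale * n for x, n in zip(x0, noise)]


def velocity_target(x0: Sequence[float], noise: Sequence[float], noise_scale: float = 8.0) -> List[float]:
    """Training target from the README: noise * noise_scale - image."""
    return [noise_scale * n - x for x, n in zip(x0, noise)]


def x0_to_velocity(x_t: Sequence[float], x0_pred: Sequence[float], t: float) -> List[float]:
    """Convert an x0 prediction into the equivalent velocity; t must be > 0."""
    return [(xt - xp) / t for xt, xp in zip(x_t, x0_pred)]


def velocity_loss(x_t, x0_pred, x0, noise, t, noise_scale=8.0, max_loss=1.0) -> float:
    """Mean squared velocity error, clamped at max_loss (one reading of 'caps spikes')."""
    pred_v = x0_to_velocity(x_t, x0_pred, t)
    target_v = velocity_target(x0, noise, noise_scale)
    mse = sum((p - q) ** 2 for p, q in zip(pred_v, target_v)) / len(pred_v)
    return min(mse, max_loss)
```

A perfect x0 prediction gives a velocity equal to the target, so the loss is zero up to rounding.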

For a deeper setup and tuning guide, see HiDream O1 training notes.

HiDream O1 Sampler

Runs the model and outputs a ComfyUI IMAGE.

| Parameter | Default | Description |
|---|---|---|
| model_type | auto | Uses dev settings if the model folder name contains dev, otherwise full settings |
| width | 2048 | Requested output width; internally snapped to a supported patch-aligned resolution |
| height | 2048 | Requested output height; internally snapped to a supported patch-aligned resolution |
| steps | 0 | 0 means auto: 50 for full; dev always uses the upstream fixed 28-step schedule |
| seed | 42 | Random seed |
| guidance_scale | 5.0 | CFG scale for full mode; dev mode ignores CFG |
| shift | -1.0 | -1 means auto: 3.0 for full, 1.0 for dev |
| noise_scale_start | 7.5 | Initial noise scale |
| noise_scale_end | 7.5 | Final noise scale |
| noise_clip_std | 2.5 | Noise clipping standard deviation |
| dev_editing_scheduler | flow_match | Dev edit mode scheduler when exactly one reference image is connected; flash remains available |
| layout_bboxes | empty | Optional JSON string or JSON file path for layout conditioning with reference images |
| preview_every | 4 | Sends a decoded preview every N steps; 0 disables previews |
| keep_image1_aspect | false | Only applies when image_1 is connected |
| force_offload | false | Unloads the model immediately after generation |
| image | 0 | Dynamic reference image count, from 0 to 12 |
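The auto rules in the table can be summarized as a small resolver. This is a sketch, not the sampler's code, and `SamplerSettings` and `resolve_settings` are hypothetical. Two points are assumptions: the dev check is case-insensitive, so that built-in folder names such as HiDream-O1-Image-Dev-fp8 match, and an explicit shift overrides the auto value in both modes.

```python
from dataclasses import dataclass


@dataclass
class SamplerSettings:
    model_type: str
    steps: int
    guidance_scale: float
    shift: float


def resolve_settings(folder_name: str, model_type: str = "auto", steps: int = 0,
                     guidance_scale: float = 5.0, shift: float = -1.0) -> SamplerSettings:
    """Apply the auto rules from the sampler table."""
    if model_type == "auto":
        # Assumed case-insensitive so "...-Dev-..." folder names are detected.
        model_type = "dev" if "dev" in folder_name.lower() else "full"
    if model_type == "dev":
        # Dev uses the fixed 28-step upstream schedule and ignores CFG (guidance 0.0).
        return SamplerSettings("dev", 28, 0.0, 1.0 if shift == -1.0 else shift)
    return SamplerSettings("full", 50 if steps == 0 else steps, guidance_scale,
                           3.0 if shift == -1.0 else shift)
```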

Reference image inputs are optional. Set image to 0 for text-only generation, or increase it to show image_1, image_2, and so on up to image_12.
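The layout_bboxes input accepts either a JSON string or a JSON file path. The sketch below shows one way to accept both forms. It assumes that inline JSON starts with { or [ and that any other non-empty value is a file path. This README does not describe the bbox schema, so the sketch only loads the JSON and does not validate its contents. `load_layout` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any


def load_layout(value: str) -> Any:
    """Return parsed layout JSON from an inline string or a file path; empty means no layout."""
    value = value.strip()
    if not value:
        return None
    if value.startswith(("{", "[")):
        return json.loads(value)  # inline JSON string
    return json.loads(Path(value).read_text(encoding="utf-8"))  # JSON file path
```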

Precision Notes

auto detects the model storage dtype from the safetensors file. For native mixed FP8 folders, the large matrix weights should be float8_e4m3fn while small tensors such as norms and biases stay BF16/FP16.

Do not set config.json to float8_e4m3fn. Transformers may try to use FP8 as PyTorch's global default dtype, which fails. Keep config dtype as bfloat16; this node detects FP8 from the safetensors tensors themselves.
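Detecting the storage dtype does not require loading any weights. A safetensors file starts with an 8-byte little-endian header length, followed by a JSON header that records each tensor's dtype and shape. The sketch below reads only that header and picks the dtype that covers the most elements, so large FP8 matrices outweigh small BF16 norms. It illustrates the idea only. `storage_dtype` and `TO_PRECISION` are hypothetical names, and the node's own detection may use different rules. For sharded folders, you would run it on each shard.

```python
import json
import struct
from collections import Counter
from math import prod
from pathlib import Path

# safetensors dtype strings mapped to the loader's precision names.
TO_PRECISION = {"BF16": "bf16", "F16": "fp16", "F32": "fp32",
                "F8_E4M3": "fp8_e4m3fn", "F8_E5M2": "fp8_e5m2"}


def storage_dtype(path: Path) -> str:
    """Return the safetensors dtype that holds the most elements, e.g. 'F8_E4M3' or 'BF16'."""
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64 header size
        header = json.loads(f.read(header_len))
    counts = Counter()
    for name, info in header.items():
        if name == "__metadata__":
            continue
        counts[info["dtype"]] += prod(info["shape"])  # element count, weighted by tensor size
    return counts.most_common(1)[0][0]
```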

The loader exposes only the standard FP8 formats: fp8_e4m3fn and fp8_e5m2.

Scheduler

The sampler automatically picks the scheduler based on model type:

| Model type | Scheduler | Notes |
|---|---|---|
| Full (auto) | FlowUniPCMultistepScheduler | Higher-order solver, generates more detail |
| Dev text / subject | FlashFlowMatchEulerDiscreteScheduler | Custom Euler with built-in noise scaling, tuned for fewer steps |
| Dev edit with one reference | FlowMatchEulerDiscreteScheduler by default | Matches the May 13, 2026 upstream Dev editing scheduler update; flash is still selectable |

When model_type is auto, the folder name is checked for dev — if not found, the full model path is used with UniPC.
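The selection logic comes down to a few branches. The sketch returns class names as strings so that it runs without diffusers. `pick_scheduler` is hypothetical, and treating two or more reference images as the "subject" path is an inference from the table.

```python
def pick_scheduler(model_type: str, num_reference_images: int,
                   dev_editing_scheduler: str = "flow_match") -> str:
    """Return the scheduler class name for a run, following the table above."""
    if model_type == "full":
        return "FlowUniPCMultistepScheduler"
    if num_reference_images == 1 and dev_editing_scheduler == "flow_match":
        return "FlowMatchEulerDiscreteScheduler"  # Dev edit default
    # Dev text/subject runs, and Dev edits with dev_editing_scheduler set to flash.
    return "FlashFlowMatchEulerDiscreteScheduler"
```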

Dev follows the upstream recipe: a fixed 28-step timetable, guidance 0.0, shift 1.0, and, when using the flash scheduler, noise defaults of 7.5 / 7.5 / 2.5 for noise_scale_start / noise_scale_end / noise_clip_std. If Dev images look noisy, oddly colored, or washed out near the last few steps:

  • reset noise_scale_start, noise_scale_end, and noise_clip_std to those defaults
  • use the flash or auto attention backend
  • pin the output to one of the internal supported resolutions: 2048x2048, 2304x1728, 1728x2304, 2560x1440, 1440x2560, 2496x1664, 1664x2496, 3104x1312, 1312x3104, 2304x1792, or 1792x2304

Upstream recommends the Full model for editing tasks.
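This README does not say exactly how a requested size maps onto the supported list. One plausible rule is nearest aspect ratio. The sketch below assumes that rule and compares log aspect ratios, so that 2:1 and 1:2 are equally far from square. `snap_resolution` is hypothetical, and the sampler's actual snapping may differ.

```python
import math

SUPPORTED = [
    (2048, 2048), (2304, 1728), (1728, 2304), (2560, 1440), (1440, 2560),
    (2496, 1664), (1664, 2496), (3104, 1312), (1312, 3104), (2304, 1792), (1792, 2304),
]


def snap_resolution(width: int, height: int) -> tuple:
    """Pick the supported (width, height) whose aspect ratio is closest to the request."""
    target = math.log(width / height)
    return min(SUPPORTED, key=lambda wh: abs(math.log(wh[0] / wh[1]) - target))
```

Under this rule, a 1920x1080 request would map to 2560x1440, because both are 16:9.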

Attention Backends

| Option | Description |
|---|---|
| auto | Uses FlashAttention when available, otherwise SDPA |
| flash | Requires FlashAttention; the optimal backend |
| sage | Requires the sageattention package; supported but not optimal |
| sdpa | Uses PyTorch scaled dot-product attention |
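The auto rule can be sketched with `importlib.util.find_spec`, which checks whether a package is installed without importing it. This is an illustration, not the node's code. It assumes the import names `flash_attn` and `sageattention` and assumes that SDPA is always available with PyTorch 2.x. Raising an error for a missing explicit backend, rather than falling back, is also an assumption.

```python
import importlib.util
from typing import Optional, Set


def available_backends() -> Set[str]:
    """Detect optional attention packages without importing them."""
    found = {"sdpa"}  # assumes PyTorch 2.x, which ships scaled_dot_product_attention
    if importlib.util.find_spec("flash_attn") is not None:
        found.add("flash")
    if importlib.util.find_spec("sageattention") is not None:
        found.add("sage")
    return found


def resolve_attention(choice: str, available: Optional[Set[str]] = None) -> str:
    """auto prefers FlashAttention, then SDPA; explicit choices must be installed."""
    available = available_backends() if available is None else available
    if choice == "auto":
        return "flash" if "flash" in available else "sdpa"
    if choice not in available:
        raise RuntimeError(f"attention backend {choice!r} is not installed")
    return choice
```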


License

This custom node is released under the MIT License. The HiDream O1 model has its own license and usage terms; check the upstream Hugging Face model page before redistribution or commercial use.