HiDream O1 Image nodes for ComfyUI — local HiDream O1 generation with text prompts, optional reference images, BF16/FP16/FP32/FP8 model loading, FlashAttention, SageAttention, preview updates, and ComfyUI DynamicVRAM/Aimdo integration.
- HiDream O1 Image generation directly inside ComfyUI
- Text-only and reference-image workflows
- Dynamic `image_1` to `image_12` inputs on the sampler node
- Optional Dev layout conditioning via JSON bbox input
- `keep_image1_aspect` toggle for reference-driven output aspect ratio
- BF16, FP16, FP32, FP8 E4M3FN, and FP8 E5M2 loader options
- FP8 mixed-weight loading using ComfyUI manual-cast style compute
- FlashAttention, SageAttention, and PyTorch SDPA attention backends
- Progress previews through ComfyUI's sampler progress bar
- Dev/Dev-2604 patch-grid smoothing node for reducing visible tile seams
- AI Toolkit-aligned HiDream O1 LoRA training nodes
- ComfyUI model management, unload, DynamicVRAM, and Aimdo/VBAR support
Search for HiDream O1 or HiDream_O1-ComfyUI in ComfyUI Manager and install it.
cd ComfyUI/custom_nodes
git clone https://github.com/Saganaki22/HiDream_O1-ComfyUI.git
cd HiDream_O1-ComfyUI
python -m pip install -r requirements.txt

Restart ComfyUI after installing or updating.
Suggested transformers version: 4.57.1 – 5.3 (newer versions may break compatibility).
HiDream's May 13, 2026 upstream update notes that PyTorch 2.9.x is not recommended because of a Qwen3-VL issue. This node logs a warning when it detects 2.9.x.
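A version check like the one described above could be sketched roughly as follows. This is an illustration only: `is_discouraged_torch` is a hypothetical helper, not the node's actual function, and it takes a version string (such as the value of `torch.__version__`) instead of importing PyTorch.

```python
import re


def is_discouraged_torch(version: str) -> bool:
    """Return True for any 2.9.x version string, e.g. '2.9.1+cu128'.

    Hypothetical helper for illustration; not part of the node's API.
    """
    match = re.match(r"^(\d+)\.(\d+)", version)
    if match is None:
        return False  # unparseable version string: do not warn
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) == (2, 9)


if __name__ == "__main__":
    for candidate in ("2.8.0", "2.9.1+cu128", "2.10.0"):
        status = "warn" if is_discouraged_torch(candidate) else "ok"
        print(candidate, "->", status)
```

Comparing parsed integers rather than string prefixes keeps `2.10.0` from being mistaken for a 2.9-style version.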
Download the complete model folder from one of the links below and place it inside ComfyUI/models/diffusion_models/:
| Precision | VRAM | Download |
|---|---|---|
| Full BF16 | ~18–20 GB | drbaph/HiDream-O1-Image-BF16 |
| Full FP16 | ~18–20 GB | drbaph/HiDream-O1-Image-FP16 |
| Full FP8 | ~10–11 GB | drbaph/HiDream-O1-Image-FP8 |
| Dev 2604 BF16 | ~18–20 GB | drbaph/HiDream-O1-Image-Dev-2604-BF16 |
| Dev 2604 FP16 | ~18–20 GB | drbaph/HiDream-O1-Image-Dev-2604-FP16 |
| Dev 2604 FP8 | ~10–11 GB | drbaph/HiDream-O1-Image-Dev-2604-FP8 |
| Dev BF16 | ~18–20 GB | drbaph/HiDream-O1-Image-Dev-BF16 |
| Dev FP16 | ~18–20 GB | drbaph/HiDream-O1-Image-Dev-FP16 |
| Dev FP8 | ~10–11 GB | drbaph/HiDream-O1-Image-Dev-FP8 |
Example — FP8 (lowest VRAM):
- Go to drbaph/HiDream-O1-Image-FP8
- Download the entire model folder (all files, not just the safetensors)
- Place it at `ComfyUI/models/diffusion_models/HiDream-O1-Image-fp8/`
The folder must contain the full Hugging Face support files:
config.json
chat_template.json
generation_config.json
preprocessor_config.json
tokenizer.json
tokenizer_config.json
vocab.json
merges.txt
model.safetensors
The original sharded format also works if the folder contains model.safetensors.index.json and all shard files.
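To verify a downloaded folder before loading it, a small standalone check along these lines may help. It is a sketch, not the loader's own validation: `missing_model_files` is a hypothetical name, and the shard lookup assumes the usual Hugging Face index layout, where `weight_map` maps tensor names to shard file names.

```python
import json
from pathlib import Path

# Support files listed above; the weights are checked separately.
SUPPORT_FILES = [
    "config.json", "chat_template.json", "generation_config.json",
    "preprocessor_config.json", "tokenizer.json", "tokenizer_config.json",
    "vocab.json", "merges.txt",
]


def missing_model_files(folder):
    """Return the names of required files that are absent from `folder`."""
    folder = Path(folder)
    missing = [name for name in SUPPORT_FILES if not (folder / name).is_file()]
    if (folder / "model.safetensors").is_file():
        return missing  # single-file weights are present
    index_path = folder / "model.safetensors.index.json"
    if not index_path.is_file():
        return missing + ["model.safetensors"]
    # Sharded layout: every shard named in the index must exist.
    index = json.loads(index_path.read_text(encoding="utf-8"))
    shards = sorted(set(index.get("weight_map", {}).values()))
    return missing + [shard for shard in shards if not (folder / shard).is_file()]
```

An empty list means the folder has everything named in this section; it does not confirm that the files themselves are valid.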
The model loader always shows the built-in converted model choices: Full/Dev BF16, FP16, FP8, plus Dev-2604 BF16, FP16, and FP8. If the selected model already exists locally, it is used. If it is missing, enable download_if_missing and the selected model will be downloaded into ComfyUI/models/diffusion_models.
Local folder matching is case-insensitive, so HiDream-O1-Image-Dev-FP8, hidream-o1-image-dev-fp8, and the default target folder casing all resolve to the same built-in choice. The loader dropdown only shows the built-in HiDream O1 model choices.
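Case-insensitive matching of this kind can be illustrated with a short lookup. The function name `find_model_folder` and the exact comparison rule (`str.casefold`) are assumptions for the example; the loader's real implementation may differ.

```python
from pathlib import Path
from typing import Optional


def find_model_folder(models_root, target_name: str) -> Optional[Path]:
    """Return the first subfolder whose name equals target_name, ignoring case.

    Hypothetical helper for illustration; not the loader's actual code.
    """
    root = Path(models_root)
    if not root.is_dir():
        return None
    wanted = target_name.casefold()
    for entry in sorted(root.iterdir()):
        if entry.is_dir() and entry.name.casefold() == wanted:
            return entry
    return None
```

For example, `find_model_folder("ComfyUI/models/diffusion_models", "HiDream-O1-Image-Dev-fp8")` would also return a folder named `hidream-o1-image-dev-fp8`.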
The original/full HiDream O1 model can show grid artifacts or other reference-image artifacts. In the upstream issue tracker, a HiDream developer recommends trying the Dev model because it should have fewer grid artifacts, and notes that reference-image generation is still being improved: HiDream-ai/HiDream-O1-Image issue #1.
In general, the Full model is the better choice for realism and photographic detail. The Dev model is faster and often better for illustration, digital design, and cleaner grid/artifact behavior, but it can be more sensitive to scheduler and resolution choices.
| Variant | Precision | Hugging Face repo | Target folder |
|---|---|---|---|
| Full | `auto`, `bf16`, `fp32` | drbaph/HiDream-O1-Image-BF16 | `HiDream-O1-Image-bf16` |
| Full | `fp16` | drbaph/HiDream-O1-Image-FP16 | `HiDream-O1-Image-fp16` |
| Full | `fp8_e4m3fn`, `fp8_e5m2` | drbaph/HiDream-O1-Image-FP8 | `HiDream-O1-Image-fp8` |
| Dev 2604 | `auto`, `bf16`, `fp32` | drbaph/HiDream-O1-Image-Dev-2604-BF16 | `HiDream-O1-Image-Dev-2604-bf16` |
| Dev 2604 | `fp16` | drbaph/HiDream-O1-Image-Dev-2604-FP16 | `HiDream-O1-Image-Dev-2604-fp16` |
| Dev 2604 | `fp8_e4m3fn`, `fp8_e5m2` | drbaph/HiDream-O1-Image-Dev-2604-FP8 | `HiDream-O1-Image-Dev-2604-fp8` |
| Dev | `auto`, `bf16`, `fp32` | drbaph/HiDream-O1-Image-Dev-BF16 | `HiDream-O1-Image-Dev-bf16` |
| Dev | `fp16` | drbaph/HiDream-O1-Image-Dev-FP16 | `HiDream-O1-Image-Dev-fp16` |
| Dev | `fp8_e4m3fn`, `fp8_e5m2` | drbaph/HiDream-O1-Image-Dev-FP8 | `HiDream-O1-Image-Dev-fp8` |
Loads a local HiDream O1 model folder and returns a Comfy-managed model handle.
| Parameter | Default | Description |
|---|---|---|
| `model_name` | `HiDream-O1-Image-BF16` | Built-in HiDream O1 model choice |
| `precision` | `auto` | Detects safetensors dtype, or forces `bf16`, `fp16`, `fp32`, `fp8_e4m3fn`, `fp8_e5m2` |
| `attention` | `auto` | `auto`, `flash`, `sdpa`, or `sage` |
| `download_if_missing` | `false` | Downloads the selected built-in model if it is not installed locally |
Creates prompt conditioning for the sampler.
| Parameter | Default | Description |
|---|---|---|
| `prompt` | cinematic portrait prompt | Text instruction for generation |
| `enhanced_prompt` | optional input | Optional STRING input from ComfyUI's bundled Prompt Enhance subgraph or any prompt-enhancer output; when connected and non-empty, it replaces the prompt textbox |
| `negative_prompt` | empty | Negative prompt used as the unconditional CFG branch in full mode when `guidance_scale` is above 1.0; dev mode ignores CFG |
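
The precedence rules in the table can be summarized in a few lines. This is a sketch with hypothetical function names; treating a whitespace-only `enhanced_prompt` as empty is an assumption, since the table only says "non-empty".

```python
from typing import Optional


def effective_prompt(prompt: str, enhanced_prompt: Optional[str] = None) -> str:
    """Pick the text that is actually used for conditioning.

    Assumption: a whitespace-only enhanced prompt counts as empty.
    """
    if enhanced_prompt is not None and enhanced_prompt.strip():
        return enhanced_prompt
    return prompt


def uses_negative_prompt(mode: str, guidance_scale: float) -> bool:
    """The negative prompt only matters in full mode with CFG above 1.0."""
    return mode == "full" and guidance_scale > 1.0
```

With these rules, `effective_prompt("a cat", "")` keeps the textbox prompt, and `uses_negative_prompt("dev", 5.0)` is false because dev mode ignores CFG.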
Optional bundled ComfyUI prompt-enhancement flow (generic, not HiDream-O1-specific):
Prompt Enhance -> HiDream O1 Conditioning enhanced_prompt
ComfyUI's bundled Prompt Enhance blueprint is a generic subgraph around the Google Gemini node. It is not part of the native HiDream-O1 model/conditioning path, and it is not the local Gemma 4 Generate Text node. The generic Generate Text node can still be used if you provide your own instruction prompt, but it is not the same prompt-enhancement workflow.
Applies a LoRA between the model loader and sampler:
HiDream O1 Model Loader -> HiDream O1 LoRA -> HiDream O1 Sampler
The LoRA dropdown reads from ComfyUI/models/loras/, including supported LoRA files inside symlinked folders.
| Parameter | Default | Description |
|---|---|---|
| `lora_name` | `None` when no LoRAs are found | LoRA file |
| `strength` | `1.0` | Model strength from -10.0 to 10.0; 0 disables the LoRA |
Applies patch-grid smoothing between the model loader or LoRA node and the sampler:
HiDream O1 Model Loader -> HiDream O1 Dev Smoothing -> HiDream O1 Sampler
HiDream O1 Model Loader -> HiDream O1 LoRA -> HiDream O1 Dev Smoothing -> HiDream O1 Sampler
This node is gated to Dev and Dev-2604 model folders. It runs extra shifted patch predictions during the last denoise steps and blends them back into the latent patch grid to reduce visible seams.
| Parameter | Default | Description |
|---|---|---|
| `steps` | `4` | Final denoise steps to smooth; 0 disables smoothing |
| `strength` | `0.5` | Blend strength for shifted patch prediction |
| `schedule` | `constant` | Strength schedule over smoothing steps |
| `shift_mode` | `rotate` | Patch-grid shift pattern |
| `adaptive_threshold` | `0.0` | Skip smoothing when estimated seam intensity is below this value; 0 disables skipping |
| `multiscale` | `false` | Adds a smaller patch-grid offset |
| `cfg_aware` | `false` | Also smooths the unconditional branch when CFG is active; costs extra forwards |
Experimental text-to-image LoRA training is available directly inside ComfyUI:
HiDream O1 Dataset Maker -> HiDream O1 Train Config -> HiDream O1 LoRA Trainer
The trainer is for image/caption datasets only. Reference-image, edit, and subject-personalization training are not wired yet.
Dataset folder layout:
my_dataset/
image_001.png
image_001.txt
image_002.jpg
image_002.txt
Each .txt file should contain the caption for the image with the same basename. The Dataset Maker writes a train.jsonl manifest that the trainer consumes.
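The pairing rule can be sketched as a small script. The manifest schema is not documented here, so the `image` and `caption` field names, the list of accepted image extensions, and the choice to skip images without a caption are all assumptions made for illustration; the Dataset Maker node may write a different format.

```python
import json
from pathlib import Path

# Assumed set of image extensions; the node's real list may differ.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def build_manifest(dataset_dir, manifest_name="train.jsonl"):
    """Pair each image with its same-basename .txt caption and write JSONL."""
    dataset = Path(dataset_dir)
    records = []
    for image in sorted(dataset.iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        caption_file = image.with_suffix(".txt")
        if not caption_file.is_file():
            continue  # assumption: images without a caption are skipped
        caption = caption_file.read_text(encoding="utf-8").strip()
        # Hypothetical field names, chosen only for this example.
        records.append({"image": image.name, "caption": caption})
    with open(dataset / manifest_name, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return records
```

Running `build_manifest("my_dataset")` on the layout above would write one JSON object per line for `image_001.png` and `image_002.jpg`.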
Training notes:
| Parameter | Default | Description |
|---|---|---|
| `base_model_name` | `HiDream-O1-Image-BF16` | Full O1 BF16 weights |
| `resolution` | `1024` | Images are resized/cropped to a patch-aligned training size |
| `target_preset` | `aitoolkit` | Trains linear-like layers except `lm_head`, `patch_embed`, and `visual`, matching AI Toolkit's O1 ignore list |
| `loss_target` | `velocity` | Converts the model's x0 prediction into flow velocity before loss |
| `noise_scale` | `8.0` | Scales training noise the same way as AI Toolkit's HiDream O1 flow scheduler |
| `timestep_type` | `linear` | AI Toolkit's O1 default |
| `max_loss` | `1.0` | Caps extreme loss spikes like AI Toolkit's O1 default |
| `lora_rank` / `lora_alpha` | `32` / `32` | AI Toolkit-style linear LoRA defaults |
| `weight_decay` | `0.0001` | AdamW weight decay default from AI Toolkit's job config |
| `save_dtype` | `bf16` | LoRA checkpoint tensor dtype |
| `max_steps` | `3000` | Total training steps |
| `save_every_steps` | `250` | Checkpoint interval |
Outputs are saved under ComfyUI/models/loras/<output_name>/ as .safetensors files plus hidream_o1_lora_config.json. After training, select the saved .safetensors in the normal HiDream O1 LoRA node.
The trainer follows AI Toolkit's May 2026 HiDream O1 recipe: it adds scaled noise with noise_scale=8.0, feeds the noisy image patches through the Qwen-VL model, converts the x0 prediction into a velocity-equivalent prediction, and trains against noise * noise_scale - image. The trainer runs in-process and blocks the ComfyUI queue while it is active. Use the Full model for training; Dev is intentionally not exposed in the trainer because it is distilled and may train unpredictably.
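The velocity target can be made concrete with a toy example on plain Python lists. This is a sketch under a stated assumption, not the trainer's code: it assumes the noisy sample lies on the straight line `x_t = (1 - t) * image + t * noise * noise_scale`, which is the interpolation for which `noise * noise_scale - image` is the velocity. The function names are hypothetical.

```python
def make_training_pair(image, noise, t, noise_scale=8.0):
    """Build a noisy sample and its velocity target.

    Assumed linear path: x_t = (1 - t) * image + t * noise * noise_scale.
    """
    scaled = [n * noise_scale for n in noise]
    x_t = [(1.0 - t) * x + t * s for x, s in zip(image, scaled)]
    target = [s - x for x, s in zip(image, scaled)]  # noise * noise_scale - image
    return x_t, target


def x0_to_velocity(x_t, x0_pred, t, eps=1e-6):
    """Convert an x0 prediction into a velocity-equivalent prediction.

    On the assumed path, x_t - x0 = t * (noise * noise_scale - x0).
    """
    t = max(t, eps)  # avoid division by zero at t = 0
    return [(xt - x0) / t for xt, x0 in zip(x_t, x0_pred)]


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)
```

If the model predicted the clean image exactly, `x0_to_velocity` would reproduce the target and the loss would be zero, which is a quick sanity check for the conversion.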
For a deeper setup and tuning guide, see HiDream O1 training notes.
Runs the model and outputs a ComfyUI IMAGE.
| Parameter | Default | Description |
|---|---|---|
| `model_type` | `auto` | Uses dev settings if the model folder name contains `dev`, otherwise full settings |
| `width` | `2048` | Requested output width; internally snapped to a supported patch-aligned resolution |
| `height` | `2048` | Requested output height; internally snapped to a supported patch-aligned resolution |
| `steps` | `0` | 0 means auto: 50 for full; dev always uses the upstream fixed 28-step schedule |
| `seed` | `42` | Random seed |
| `guidance_scale` | `5.0` | CFG scale for full mode; dev mode ignores CFG |
| `shift` | `-1.0` | -1 means auto: 3.0 for full, 1.0 for dev |
| `noise_scale_start` | `7.5` | Initial noise scale |
| `noise_scale_end` | `7.5` | Final noise scale |
| `noise_clip_std` | `2.5` | Noise clipping standard deviation |
| `dev_editing_scheduler` | `flow_match` | Dev edit mode scheduler when exactly one reference image is connected; `flash` remains available |
| `layout_bboxes` | empty | Optional JSON string or JSON file path for layout conditioning with reference images |
| `preview_every` | `4` | Sends a decoded preview every N steps; 0 disables previews |
| `keep_image1_aspect` | `false` | Only applies when `image_1` is connected |
| `force_offload` | `false` | Unloads the model immediately after generation |
| `image` | `0` | Dynamic reference image count, from 0 to 12 |
Reference image inputs are optional. Set image to 0 for text-only generation, or increase it to show image_1, image_2, and so on up to image_12.
auto detects the model storage dtype from the safetensors file. For native mixed FP8 folders, the large matrix weights should be float8_e4m3fn while small tensors such as norms and biases stay BF16/FP16.
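To see which dtypes a safetensors file actually stores, you can read its header with the standard library alone. A safetensors file starts with an 8-byte little-endian length followed by a JSON header that records each tensor's `dtype`. The sketch below is a standalone inspection aid, not the node's detection code, and the "mixed FP8" test at the end is a simple heuristic assumed for illustration.

```python
import json
import struct
from collections import Counter


def count_tensor_dtypes(path) -> Counter:
    """Count tensors per dtype by reading only the safetensors JSON header."""
    with open(path, "rb") as handle:
        (header_len,) = struct.unpack("<Q", handle.read(8))
        header = json.loads(handle.read(header_len).decode("utf-8"))
    return Counter(
        info["dtype"] for name, info in header.items() if name != "__metadata__"
    )


def looks_like_mixed_fp8(counts: Counter) -> bool:
    """Heuristic (assumption): some, but not all, tensors are stored as FP8."""
    fp8 = counts.get("F8_E4M3", 0) + counts.get("F8_E5M2", 0)
    return 0 < fp8 < sum(counts.values())
```

In the header, `float8_e4m3fn` tensors appear as `F8_E4M3`, while BF16 and FP16 tensors appear as `BF16` and `F16`.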
Do not set config.json to float8_e4m3fn. Transformers may try to use FP8 as PyTorch's global default dtype, which fails. Keep config dtype as bfloat16; this node detects FP8 from the safetensors tensors themselves.
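A quick way to catch this mistake is to inspect `config.json` before loading. The helper below is hypothetical and only looks at the top level of the file; checking both `torch_dtype` and `dtype` is an assumption about which key a given Transformers version writes.

```python
import json
from pathlib import Path
from typing import Optional


def config_dtype_problem(model_folder) -> Optional[str]:
    """Return a message if config.json declares an FP8 dtype, else None."""
    config_path = Path(model_folder) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Assumed key names; only top-level entries are inspected.
    for key in ("torch_dtype", "dtype"):
        value = config.get(key)
        if isinstance(value, str) and value.startswith("float8"):
            return f"{key} is {value}; set it back to bfloat16"
    return None
```

A return value of `None` means no top-level FP8 dtype was found in the config.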
The loader exposes the normal FP8 options only.
The sampler automatically picks the scheduler based on model type:
| Model type | Scheduler | Notes |
|---|---|---|
| Full (auto) | FlowUniPCMultistepScheduler | Higher-order solver, generates more detail |
| Dev text / subject | FlashFlowMatchEulerDiscreteScheduler | Custom Euler with built-in noise scaling, tuned for fewer steps |
| Dev edit with one reference | FlowMatchEulerDiscreteScheduler by default | Matches the May 13, 2026 upstream Dev editing scheduler update; `flash` is still selectable |
When model_type is auto, the folder name is checked for dev. If it is not found, the full model path is used with UniPC.
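Put together, the auto rules for model type, steps, shift, guidance, and scheduler can be sketched as one function. `resolve_sampler_settings` is a hypothetical name, the case-insensitive `dev` check is an assumption, and the returned dictionary is only an illustration of the documented defaults, not the sampler's internal structure.

```python
def resolve_sampler_settings(folder_name, model_type="auto", steps=0, shift=-1.0,
                             guidance_scale=5.0, num_reference_images=0,
                             dev_editing_scheduler="flow_match"):
    """Resolve auto values following the documented defaults (sketch)."""
    if model_type == "auto":
        # Assumption: the folder-name check ignores case.
        model_type = "dev" if "dev" in folder_name.lower() else "full"

    if model_type == "dev":
        # Dev edit mode applies only with exactly one reference image.
        edit_mode = num_reference_images == 1
        if edit_mode and dev_editing_scheduler == "flow_match":
            scheduler = "FlowMatchEulerDiscreteScheduler"
        else:
            scheduler = "FlashFlowMatchEulerDiscreteScheduler"
        return {
            "model_type": "dev",
            "steps": 28,  # fixed upstream timetable
            "shift": 1.0 if shift == -1.0 else shift,
            "guidance_scale": 0.0,  # dev ignores CFG
            "scheduler": scheduler,
        }

    return {
        "model_type": "full",
        "steps": 50 if steps == 0 else steps,
        "shift": 3.0 if shift == -1.0 else shift,
        "guidance_scale": guidance_scale,
        "scheduler": "FlowUniPCMultistepScheduler",
    }
```

For example, a folder named `HiDream-O1-Image-Dev-fp8` with all inputs left at their defaults resolves to 28 steps, shift 1.0, guidance 0.0, and the flash scheduler.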
Dev follows the upstream recipe: fixed 28-step timetable, guidance 0.0, shift 1.0, and noise defaults 7.5 / 7.5 / 2.5 when using flash. If dev images look noisy, oddly colored, or washed out near the last few steps, reset noise_scale_start, noise_scale_end, and noise_clip_std to those defaults, use the flash or auto attention backend, and pin the output to one of the internal supported resolutions: 2048x2048, 2304x1728, 1728x2304, 2560x1440, 1440x2560, 2496x1664, 1664x2496, 3104x1312, 1312x3104, 2304x1792, or 1792x2304. Upstream recommends the Full model for editing tasks.
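If you want to pick one of the listed resolutions from an arbitrary width and height, choosing the closest aspect ratio is one simple approach. The sampler's actual snapping rule is not described here, so the selection logic below is an assumption; only the resolution list comes from this document.

```python
import math

# Supported resolutions listed above, as (width, height).
SUPPORTED_RESOLUTIONS = [
    (2048, 2048), (2304, 1728), (1728, 2304), (2560, 1440), (1440, 2560),
    (2496, 1664), (1664, 2496), (3104, 1312), (1312, 3104), (2304, 1792),
    (1792, 2304),
]


def nearest_supported_resolution(width: int, height: int):
    """Return the listed resolution whose aspect ratio is closest (log scale)."""
    target = math.log(width / height)
    return min(
        SUPPORTED_RESOLUTIONS,
        key=lambda wh: abs(math.log(wh[0] / wh[1]) - target),
    )
```

Comparing ratios on a log scale treats landscape and portrait deviations symmetrically, so a 16:9 request maps to 2560x1440 and a 9:16 request maps to 1440x2560.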
| Option | Description |
|---|---|
| `auto` | Uses FlashAttention when available, otherwise SDPA |
| `flash` | Requires FlashAttention [Optimal] |
| `sage` | Requires the sageattention package [Not Optimal] |
| `sdpa` | Uses PyTorch scaled dot-product attention |
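
The selection rule in the table can be sketched with a package-availability probe. This is an illustration, not the node's code: `pick_attention_backend` is a hypothetical name, `flash_attn` is assumed to be the import name that provides FlashAttention, and raising an error when an explicitly requested backend is missing is an assumption about behavior.

```python
import importlib.util


def package_available(name: str) -> bool:
    """True if a top-level package can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def pick_attention_backend(requested="auto", available=package_available) -> str:
    """Resolve the attention option to 'flash', 'sage', or 'sdpa' (sketch)."""
    if requested not in ("auto", "flash", "sage", "sdpa"):
        raise ValueError(f"unknown attention option: {requested!r}")
    if requested == "auto":
        return "flash" if available("flash_attn") else "sdpa"
    if requested == "flash" and not available("flash_attn"):
        raise RuntimeError("attention='flash' requires FlashAttention")
    if requested == "sage" and not available("sageattention"):
        raise RuntimeError("attention='sage' requires the sageattention package")
    return requested
```

Passing the probe as a parameter keeps the function easy to test without installing either package.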
- Demo: HiDream-O1-Image
- Dev 2604 BF16 model: drbaph/HiDream-O1-Image-Dev-2604-BF16
- Dev 2604 FP16 model: drbaph/HiDream-O1-Image-Dev-2604-FP16
- Dev 2604 FP8 model: drbaph/HiDream-O1-Image-Dev-2604-FP8
- BF16 model: drbaph/HiDream-O1-Image-BF16
- FP16 model: drbaph/HiDream-O1-Image-FP16
- FP8 model: drbaph/HiDream-O1-Image-FP8
- Dev BF16 model: drbaph/HiDream-O1-Image-Dev-BF16
- Dev FP16 model: drbaph/HiDream-O1-Image-Dev-FP16
- Dev FP8 model: drbaph/HiDream-O1-Image-Dev-FP8
- Upstream project: HiDream-ai/HiDream-O1-Image
- Node repository: Saganaki22/HiDream_O1-ComfyUI
This custom node is released under the MIT License. The HiDream O1 model has its own license and usage terms; check the upstream Hugging Face model page before redistribution or commercial use.