Tutorial | New Model

GPT Image 2 in ComfyUI: Reasoning-Driven Generation, Real Cost Data, and Why Every Output Needs Provenance

Jesse Blum, CTO, Numonic | 11 min
Numonic Volume 01 magazine cover — generated with GPT Image 2 in Comfy Cloud at 2K, ~40 credits, ~2 minutes; masthead, cover lines, and barcode rendered legibly in a single reasoning pass

OpenAI shipped GPT Image 2 on April 21, 2026. I tried it this morning in ChatGPT and was disappointed by the lack of metadata on the outputs. Then this evening, Comfy Cloud announced it as a partner node, so I immediately generated the Numonic Volume 01 cover above. It took three re-rolls before I was happy, but the version you see came out of a single generation pass: masthead, cover lines, barcode, logo, and typeface, all in place. No image model I have used has managed that so well or so quickly. The reason it works now is that GPT Image 2 runs a reasoning step before it touches a pixel: it plans the composition, checks its work, and iterates. This guide walks through what the model actually is, how to reach it from ComfyUI, real credit and timing data from Comfy Cloud, the three capabilities that will reshape production pipelines this month, and the metadata problem every studio using it needs to solve in the next 102 days.

First-day data notice: All credits, generation times, and quality observations in this article are from my own testing on Comfy Cloud on April 22, 2026 — immediately after launch. Numbers will change as OpenAI tunes pricing and Comfy Cloud load balances under broader traffic. Published here as a directional reference, not a benchmark.

What Is GPT Image 2?

GPT Image 2 is OpenAI’s first reasoning-powered image model, announced April 21, 2026. The architectural distinction isn’t incremental. Every prior image model—Stable Diffusion variants, FLUX, Midjourney, Nano Banana 2—treats image generation as a sampling problem: the model reaches for a distribution of plausible pixels and commits. GPT Image 2 reframes generation as a planning problem. Before the pixels, there is a thinking pass: the model plans the composition, checks its work against the prompt, and iterates.

That matters because the things image models have historically broken on—dense text, small UI elements, iconography, infographics, maps, slides, comic and manga panels—are all cases where a sampled guess is worse than a planned layout. You cannot sample your way into a seven-item bulleted list in 11pt Helvetica, centred. You have to plan it. The Volume 01 cover at the top of this article has a masthead wordmark, two cover lines, a volume identifier, and a barcode. Every character renders as written. The previous generation of models would have produced glyph soup at least once per element.

The three capabilities OpenAI is leading with: native 2K output, targeted image edits that preserve everything outside the edit zone, and up to eight consistent images from a single prompt. I’ll show each of those with real output from this morning’s testing. None of the three is just a marketing claim; each one changes what a production pipeline can assume.

Setting Up GPT Image 2 in ComfyUI

GPT Image 2 is available immediately as a ComfyUI Partner Node. There are two pathways depending on whether you run your own ComfyUI instance or use Comfy Cloud.

Pathway A: Comfy Cloud (recommended)

The lowest-friction option. Comfy Cloud has an official OpenAI partnership exposing GPT Image 2 as a first-class node with no installation, no API key management, and billing handled through your Comfy Cloud account. Open a new workflow, search the Node Library for OpenAI GPT Image 1.5 (the node is named after the family; select the gpt-image-2 model inside it), drop it on the graph, wire it to a Save Image node, and generate. That’s the entire setup.

Pathway B: Self-hosted ComfyUI

Update ComfyUI to v0.19.4 or later. The OpenAI Partner Node ships in the stock node library; no custom node installation is required. Supply your OpenAI API key via .env or the ComfyUI credentials manager. Select the gpt-image-2 model in the node’s model dropdown. Billing runs through your OpenAI account at OpenAI’s list API rates; Comfy Cloud’s credit pricing is separate.

A note on the node name. The Partner Node umbrella is labelled “OpenAI GPT Image 1.5” in the Node Library for backward compatibility with the earlier release. The gpt-image-2 model is selected inside that node. Don’t let the label throw you off.

Museum conservation label rendered by GPT Image 2 — 12 specification rows in Poppins Medium 11pt, legible QR code, barcode, and JetBrains Mono caption
Dense-text spec sheet at 2K — 12 specification rows rendered cleanly in a single pass. Historically the feature every image model has failed at.

The Three Capabilities That Change Pipelines

1. Dense text and layout — the reasoning payoff

The spec sheet above is the clearest test I could think of: twelve rows of small-body typography, a scannable QR code, a barcode, and a caption in a second typeface. In a single generation pass, every character resolves legibly. That’s the reasoning step doing work. The model plans the typographic grid before committing pixels to it.

For production, this is the unlock agencies have been waiting on: poster layouts, social carousel cards with real copy, UI mockups, packaging comps, infographic hero frames, slide templates. Work that previously required a handoff to a designer for typesetting can now start inside the model. “Start” is the operative word—you’ll still hand off to a designer for final polish. But the zero draft is usable.

2. Edit fidelity — pixel-stable targeted changes

Edit-based workflows have been a structural weak spot in every prior image model. Small changes ripple outward: faces warp, composition drifts, the edit zone spreads into pixels the user never asked to touch. GPT Image 2 keeps everything outside the edit zone stable while applying the requested change cleanly at up to 2K.

The noon/dusk diptych below is the same portrait, same pose, same wardrobe, same composition—only the light shifts. Colorising a black-and-white photo or aging a product shot works the same way: targeted intent, no collateral damage to faces, geometry, or fine detail.

Noon/Dusk diptych of the same curator-engineer portrait — only lighting and sky colour change between panels; facial geometry, pose, and composition are pixel-stable
Targeted edit at 2K — "time of day" changed cleanly. Face, pose, wardrobe, and composition are pixel-stable between panels.

For retouching, archival restoration, and iterative creative direction—where a client says “same image, warmer light”—this is a meaningful workflow change. The loop tightens from regenerate-and-hope to edit-and-ship.

3. Eight consistent images from one prompt

GPT Image 2 can return up to eight distinct images from a single prompt while preserving character and object continuity across the series. Storyboards, reference sheets, character turnarounds, and product variants that used to require careful seed-locking, prompt gymnastics, and multiple re-rolls now come out of one node.

Eight-panel grid of the same curator-engineer in eight different contexts — identical facial geometry, pose register, and wardrobe base layer across all frames
Eight images, one prompt. Character identity held across eight different contexts from a single GPT Image 2 generation.

You can feed the batch straight into a Save Image loop, chain it into a video workflow, or pipe it into an upscaler node. For storyboard work, the consistency hold across eight contexts is what makes the feature usable. With earlier models, small drift accumulates from scene to scene; in my first-day tests, GPT Image 2 held character identity across all eight frames.

Real Cost and Timing Data (Comfy Cloud, Day One)

The first question every studio is going to ask me: what does this cost per image? Here’s what I observed on Comfy Cloud the morning of April 22, across five generations at 2K:

| Workload | Credits | Time | Notes |
|---|---|---|---|
| Magazine cover (hero, dense typography) | ~40 | ~2 min | 2K, single pass, wordmark preserved |
| Spec sheet (dense text) | ~40 | ~2 min | 12 text rows, QR code, barcode all legible |
| Edit diptych (noon → dusk) | ~40 | ~2 min | Composition pixel-stable between panels |
| 8-image consistency grid (single prompt) | ~40 | ~2 min | One generation, eight frames |

Two things stand out. First, the per-generation cost is flat across workload types on Comfy Cloud’s current pricing. My reading is that the reasoning step dominates, so the complexity of what you’re asking for doesn’t change the bill much. Second, the eight-image consistency case is a bargain: one generation, eight usable frames. For storyboard work, the effective per-frame cost drops to ~5 credits (~40 credits ÷ 8 frames).
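The per-frame arithmetic is easy to rerun as pricing changes. A minimal sketch, assuming the approximate first-day figure of ~40 credits per generation from the table above; the function name is mine, not part of any Comfy Cloud API:

```python
def credits_per_frame(credits_per_generation: float, frames: int) -> float:
    """Effective credit cost of one frame when a single generation returns several."""
    if frames < 1:
        raise ValueError("frames must be at least 1")
    return credits_per_generation / frames


# Approximate first-day observation on Comfy Cloud; expect this to change.
OBSERVED_CREDITS_PER_GENERATION = 40

single = credits_per_frame(OBSERVED_CREDITS_PER_GENERATION, 1)
grid = credits_per_frame(OBSERVED_CREDITS_PER_GENERATION, 8)
print(f"single image: ~{single:g} credits; 8-image grid: ~{grid:g} credits per frame")
# prints: single image: ~40 credits; 8-image grid: ~5 credits per frame
```

Swap in your own observed credit figure once pricing settles; the ratio is the point, not the constant.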

Compared to alternatives: Nano Banana 2 sits at ~14 credits for a comparable 2K image in ~40 seconds, with no reasoning pass. FLUX.2 [schnell] is faster still but can’t render dense typography. The right mental model: GPT Image 2 is the node you reach for when typography, targeted edits, or multi-frame consistency matter. It is not the node you reach for when you need 355 images per minute of mood exploration. That’s still a Nano Banana 2 or FLUX job.
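That mental model can be written down as a routing rule. This is a sketch of the heuristic only: the `Job` fields and the `"fast-sampler"` label are hypothetical names I am using for illustration, not identifiers from ComfyUI or any model API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """What a single generation request needs (hypothetical fields)."""
    needs_dense_text: bool = False
    needs_targeted_edit: bool = False
    needs_multi_frame_consistency: bool = False


def pick_model(job: Job) -> str:
    """Route to the reasoning model only when one of its three strengths matters."""
    if (job.needs_dense_text
            or job.needs_targeted_edit
            or job.needs_multi_frame_consistency):
        return "gpt-image-2"
    # Placeholder label for a cheaper, faster sampler such as Nano Banana 2 or FLUX.
    return "fast-sampler"


print(pick_model(Job(needs_dense_text=True)))  # gpt-image-2
print(pick_model(Job()))                       # fast-sampler
```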

Hybrid Pipelines: The Real Pattern

GPT Image 2 slots naturally into hybrid graphs. The pattern that’s going to emerge in the next few weeks: use it for the text-heavy hero frame, then hand off to local models for upscaling, stylisation, or video extension. That’s the point of Partner Nodes—the best model for each step, in one graph.

Four-node pipeline diagram: gpt-image-2 hero frame → SDXL stylise → Topaz 4K upscale → Numonic provenance capture
The hybrid pattern: GPT Image 2 for the text-heavy hero, local models for stylisation and upscaling, Numonic for provenance capture across the chain.

Three pipeline recipes worth trying immediately:

Typography hero → stylise → deliver. GPT Image 2 produces the text-heavy hero frame. Feed it into an SDXL img2img node for grain, texture, or painterly stylisation. Then a Topaz or similar upscaler for 4K print delivery. The reasoning pass gives you the typography you can’t get anywhere else; local models give you the stylistic fingerprint that differentiates your brand.

Storyboard → animate. GPT Image 2 generates eight consistent keyframes from a single prompt. Hand them to Kling 3.0 or a Seedance pipeline to interpolate motion. The character and style consistency that GPT Image 2 holds across the grid carries through into the animation.

Edit loop → client review. GPT Image 2 produces a base frame. Edit-node variations (warmer light, cooler tone, swap background element) are generated with the edit-fidelity feature. Client reviews with a single parent-child relationship intact across all variants. The “which version did we approve” problem becomes a lineage query, not a forensics exercise.
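To make the "lineage query" in the third recipe concrete, here is a minimal sketch of parent-child tracking for edit variants. The `Variant` record and its field names are assumptions for illustration; they do not describe Numonic's or ComfyUI's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    asset_id: str
    parent_id: Optional[str]   # None marks the base frame
    edit_instruction: str
    approved: bool = False


def lineage(asset_id: str, variants: dict[str, Variant]) -> list[str]:
    """Walk parent links from a variant back to its base frame."""
    chain: list[str] = []
    current: Optional[str] = asset_id
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in lineage at {current!r}")
        chain.append(current)
        current = variants[current].parent_id
    return chain


records = [
    Variant("base", None, "initial generation"),
    Variant("warm", "base", "warmer light"),
    Variant("cool", "base", "cooler tone"),
    Variant("warm-bg", "warm", "swap background element", approved=True),
]
variants = {v.asset_id: v for v in records}

# "Which version did we approve, and how did we get there?"
for v in variants.values():
    if v.approved:
        print(" <- ".join(lineage(v.asset_id, variants)))
# prints: warm-bg <- warm <- base
```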

The Provenance Problem GPT Image 2 Just Made Louder

Every image in this article was generated with GPT Image 2 on April 22, 2026. Every one of them went directly into Numonic via Connected Folders. What was captured alongside each PNG: the full ComfyUI node graph, the prompt text, the model version, the credit cost, the generation timestamp, and the tenant workspace it belongs to. I didn’t do anything to make that happen. The capture is the default, which was not the case when prompting in ChatGPT this morning.

Here’s why that matters in the next 102 days.

GPT Image 2 has just made it economically rational to generate text-heavy marketing assets, client-facing UI comps, poster layouts, and brand collateral with AI. The volume is about to go up. Every one of those outputs is, from a regulatory perspective, a generated artefact that needs a disclosure chain. EU AI Act Article 50 obligations take effect August 2, 2026. Studios serving European clients will need to demonstrate which model, which prompt, which version, and which user produced each piece of content—retrievable, auditable, per asset.
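The 102-day figure is the gap between the April 22, 2026 test date and the August 2, 2026 date. A short sketch for recomputing the remaining runway from any day; the constant and function names are mine:

```python
from datetime import date

# Date on which the Article 50 obligations take effect, as stated in this article.
ARTICLE_50_EFFECTIVE = date(2026, 8, 2)


def days_until_deadline(today: date) -> int:
    """Whole days remaining before the obligations take effect (negative once passed)."""
    return (ARTICLE_50_EFFECTIVE - today).days


print(days_until_deadline(date(2026, 4, 22)))  # 102
```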

ComfyUI embeds workflow metadata in PNG files. That’s necessary but not sufficient. Metadata locked inside individual files has no search, no organisation, no ability to answer “show me every asset generated with gpt-image-2 between 2026-04-22 and 2026-04-29 for client X, grouped by prompt family.” That’s the query an auditor will run. Folders of PNGs don’t answer it.
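To see what is actually locked inside each file, you can read the PNG text chunks directly. The sketch below uses only the standard library and handles uncompressed `tEXt` chunks. I am assuming the metadata sits under the `prompt` and `workflow` keywords, which is what the stock Save Image node writes; outputs from other nodes or services may differ, and compressed or international text chunks are ignored here.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict[str, str]:
    """Return keyword -> text for every uncompressed tEXt chunk in a PNG."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    found: dict[str, str] = {}
    offset = 8
    while offset + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[offset:offset + 8])
        body = data[offset + 8:offset + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        offset += 12 + length
    return found


def load_workflow_metadata(path: str) -> dict:
    """Parse the assumed 'prompt' and 'workflow' keys as JSON, if present."""
    with open(path, "rb") as handle:
        text = read_png_text_chunks(handle.read())
    return {key: json.loads(value)
            for key, value in text.items()
            if key in ("prompt", "workflow")}
```

This works one file at a time, which is exactly the limitation: answering a question across thousands of assets means opening and parsing every PNG unless the extracted fields are indexed somewhere.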

The capture needs to happen at the point of generation and land in a system where it’s indexed, searchable, and tenant-scoped from day one. Retrofitting provenance to 100,000 PNGs already scattered across three cloud drives in July 2026 is a project no studio should be planning.
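As an illustration of "indexed, searchable, and tenant-scoped", here is a minimal sketch using SQLite from the Python standard library. The schema, column names, and the `prompt_family` tag are hypothetical choices of mine and do not describe how Numonic stores anything; the point is only that the auditor's question becomes a single query once records are written at generation time.

```python
import sqlite3

SCHEMA = """
CREATE TABLE generations (
    asset_path    TEXT PRIMARY KEY,
    tenant        TEXT NOT NULL,
    client        TEXT NOT NULL,
    model         TEXT NOT NULL,
    prompt_family TEXT NOT NULL,
    prompt        TEXT NOT NULL,
    credits       REAL,
    generated_at  TEXT NOT NULL   -- ISO 8601 timestamp
);
CREATE INDEX idx_audit ON generations (tenant, client, model, generated_at);
"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_generation(conn: sqlite3.Connection, **row) -> None:
    """Insert one provenance record; call this when the image is generated."""
    conn.execute(
        "INSERT INTO generations VALUES (:asset_path, :tenant, :client, :model,"
        " :prompt_family, :prompt, :credits, :generated_at)",
        row,
    )


def audit(conn, tenant, client, model, start_date, end_date) -> dict[str, list[str]]:
    """Assets for one tenant, client, and model in a date range, grouped by prompt family."""
    rows = conn.execute(
        "SELECT prompt_family, asset_path FROM generations"
        " WHERE tenant = ? AND client = ? AND model = ?"
        " AND date(generated_at) BETWEEN ? AND ?"
        " ORDER BY prompt_family, generated_at",
        (tenant, client, model, start_date, end_date),
    )
    grouped: dict[str, list[str]] = {}
    for family, asset_path in rows:
        grouped.setdefault(family, []).append(asset_path)
    return grouped
```

A call such as `audit(conn, "studio-a", "client-x", "gpt-image-2", "2026-04-22", "2026-04-29")` then returns a mapping from each prompt family to the asset paths it covers, with the tenant filter applied on every query rather than left to the caller's discipline.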

Browse the Published Collection

The published collection contains every GPT Image 2 output in this article, with the complete ComfyUI workflow embedded in each PNG. Drag any image into ComfyUI to load the exact node graph that created it, including prompt, model version, and reference inputs.

Key Takeaways

  • GPT Image 2 is a reasoning-driven image model, not an incremental diffusion update—the planning pass before generation is what finally makes dense typography, targeted edits, and multi-frame consistency reliable.
  • First-day Comfy Cloud cost is ~40 credits and ~2 min per 2K generation—flat across workload types. The eight-image consistency case is effectively ~5 credits per frame for storyboard work.
  • The winning pattern is hybrid, not single-model—GPT Image 2 for the text-heavy hero, local models (SDXL, FLUX, Topaz, Kling) for stylisation, upscaling, and animation. Partner Nodes make this one graph.
  • Every output now needs provenance capture at generation time—EU AI Act Article 50 obligations take effect August 2, 2026. PNG-embedded metadata is necessary but not sufficient. You need it indexed, searchable, and tenant-scoped from day one.
  • Every image in this article is downloadable with full workflow metadata—browse the published collection and drag any image into ComfyUI to reproduce the exact graph.

Capture Every GPT Image 2 Generation With Full Provenance

Numonic captures every output from GPT Image 2, Nano Banana 2, FLUX, SDXL, and every other ComfyUI model with full workflow metadata — searchable, reproducible, tenant-scoped, and ready for EU AI Act disclosure before the August 2 deadline.