Technical Architecture

The Two JSON Blobs Inside Every ComfyUI PNG

Every ComfyUI output embeds two distinct JSON structures in its PNG metadata — the workflow graph and the prompt execution data. Understanding the difference between them is essential for lineage tracking, reproducibility, and search.

February 25, 2026 · 10 min read · Numonic Team

Open any ComfyUI-generated PNG in a metadata viewer and you will find two large JSON structures embedded in the file's text chunks. They look similar at first glance — both contain node identifiers, class types, and parameter values. But they serve fundamentally different purposes, and confusing them leads to broken reproducibility, incomplete search, and incorrect lineage tracking.

The two blobs are the workflow and the prompt. The workflow is the graph as the user designed it — the visual node layout with UI positions, widget values, connections, and groups. The prompt is the execution plan — the resolved set of inputs that the ComfyUI engine actually processed to produce the output. They overlap but diverge in ways that matter for any system trying to understand, index, or reproduce the generation.
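
To make the divergence concrete, here is a heavily trimmed sketch of the two shapes as Python literals. The field names (`nodes`, `type`, `pos`, `widgets_values`, `links` in the workflow; `class_type` and `inputs` in the prompt) follow ComfyUI's actual formats, but the specific values are invented for illustration; real blobs contain dozens of nodes.

```python
# "workflow": the graph as designed -- UI layout, links, positional widget values.
workflow = {
    "nodes": [
        {"id": 3, "type": "KSampler", "pos": [860, 180],
         # Widget values are positional and UI-ordered, not keyed by name.
         "widgets_values": [42, "fixed", 20, 8.0, "euler", "normal", 1.0]},
    ],
    # Each link: [link_id, from_node, from_slot, to_node, to_slot, type].
    "links": [[1, 4, 0, 3, 0, "MODEL"]],
}

# "prompt": the execution plan -- a flat map of node id -> resolved inputs.
prompt = {
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 42, "steps": 20, "cfg": 8.0,
                     "model": ["4", 0]}},  # ["4", 0] = output 0 of node "4"
}
```

Note what each side lacks: the prompt has no positions or link list (it encodes references inline), and the workflow's widget values are an ordered array with no parameter names.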

The Forces at Work

  • Reproducibility requires the prompt: To re-generate an image with identical output, you need the exact inputs that the engine processed — resolved references, concrete values, no UI abstractions. The prompt blob contains this execution state. Loading only the workflow and re-queuing may produce different results if default values, random seeds, or node configurations have changed since the original generation.
  • Editability requires the workflow: To modify a generation — change the prompt, swap a model, adjust a parameter — you need the graph as the user designed it, with UI positions, groups, and human-readable widget values. The prompt blob lacks this information; it is a flat execution plan without spatial layout or user organization.
  • Search needs both: Some queries target the user's intent (what workflow pattern did they use?), while others target the execution reality (which exact model checkpoint produced this output?). The two metadata problem extends inside a single tool — ComfyUI itself produces two metadata formats that serve different retrieval needs.

The Problem

Most tools that claim to extract ComfyUI metadata read only one of the two blobs, or treat them interchangeably. This creates several failure modes:

Indexing the wrong blob: If a search system indexes the workflow but not the prompt, it captures the user's design intent but misses the actual execution parameters. Searching for “images generated with checkpoint X” fails if checkpoint X was resolved at execution time from a widget value that the workflow blob stores as a display name rather than a file path.

Reproducibility from the wrong source: Loading the workflow blob into ComfyUI restores the visual graph — but the widget values may not match what was executed if nodes had randomized seeds, dynamic defaults, or values that changed between save and execution. The prompt blob captures the state at execution time, not design time.

Size and parsing failures: Both blobs can be large — 50 to 200 kilobytes each for complex workflows with 40 or more nodes. Systems that parse PNG metadata with fixed buffers or that expect small, simple text chunks fail silently on ComfyUI outputs, extracting partial or corrupted JSON.

The Solution: Dual Extraction and Cross-Referencing

An AI-native DAM handles the two JSON blobs as distinct but related data structures:

Extraction

The PNG specification allows arbitrary text chunks with keyword-value pairs. ComfyUI uses two specific keywords: workflow for the graph structure and prompt for the execution data. A robust extractor reads both chunks, validates each as well-formed JSON, and stores them as separate metadata records associated with the same asset.
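
A minimal extractor can walk the PNG chunk stream directly with the standard library. The sketch below (the function name `extract_comfy_blobs` is ours) handles `tEXt` and `iTXt` chunks, including the compressed-`iTXt` case; a production version would also cover `zTXt` and verify chunk CRCs.

```python
import json
import struct
import zlib

def extract_comfy_blobs(path):
    """Read the `workflow` and `prompt` text chunks from a ComfyUI PNG.

    Returns a dict mapping keyword -> parsed JSON; keys are absent if the
    chunk is missing or malformed. Sketch only: zTXt chunks and CRC
    validation are omitted.
    """
    blobs = {}
    with open(path, "rb") as f:
        if f.read(8) != b"\x89PNG\r\n\x1a\n":
            raise ValueError("not a PNG file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            length, ctype = struct.unpack(">I4s", header)
            data = f.read(length)
            f.read(4)  # skip CRC
            if ctype == b"tEXt":
                keyword, _, text = data.partition(b"\x00")
            elif ctype == b"iTXt":
                keyword, _, rest = data.partition(b"\x00")
                comp_flag = rest[0]  # rest[1] is the compression method
                _, _, rest = rest[2:].partition(b"\x00")  # skip language tag
                _, _, text = rest.partition(b"\x00")      # skip translated keyword
                if comp_flag == 1:
                    text = zlib.decompress(text)
            else:
                if ctype == b"IEND":
                    break
                continue
            key = keyword.decode("latin-1")
            if key in ("workflow", "prompt"):
                try:
                    blobs[key] = json.loads(text.decode("utf-8"))
                except (UnicodeDecodeError, json.JSONDecodeError):
                    pass  # flag rather than crash on malformed custom-node output
    return blobs
```

Because the loop streams chunk by chunk, a 200 KB payload is no different from a 2 KB one, which sidesteps the fixed-buffer failure mode described above.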

Indexing Strategy

Both blobs are indexed, but for different query types. The workflow blob provides the graph topology — which node types were used, how they connect, what patterns the artist employed. This enables queries like “find all workflows that use ControlNet with depth preprocessing.” The prompt blob provides the execution reality — which specific models, seeds, and resolved parameters produced the output. This enables queries like “find all images generated with this exact checkpoint file.”
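
As a sketch of the two index axes: topology terms come from the workflow's `nodes` list, while execution terms come from the prompt's resolved `inputs`. The example below pulls checkpoint filenames via ComfyUI's `CheckpointLoaderSimple` node; a real indexer would cover the other loader node types as well.

```python
def workflow_node_types(workflow):
    """Node types as the artist placed them (design-time view)."""
    return {node["type"] for node in workflow.get("nodes", [])}

def prompt_checkpoints(prompt):
    """Checkpoint files actually resolved at execution time."""
    return {
        node["inputs"]["ckpt_name"]
        for node in prompt.values()
        if node.get("class_type") == "CheckpointLoaderSimple"
        and "ckpt_name" in node.get("inputs", {})
    }
```

The first function answers "what pattern did they use?"; the second answers "which exact file produced this?" -- the two query types the section describes.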

Lineage Linking

The relationship between the two blobs is itself valuable metadata. When an artist modifies a workflow and re-generates, the new output has a different prompt blob (new execution parameters) but a similar workflow blob (same graph structure with modifications). Tracking these relationships enables lineage queries: “show me how this workflow evolved across generations” or “what changed between this image and its predecessor?”
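
A "what changed between this image and its predecessor?" query reduces to diffing the resolved inputs of two prompt blobs. A minimal sketch, assuming the prompt format shown earlier (node id -> `class_type` and `inputs`):

```python
def prompt_diff(old_prompt, new_prompt):
    """Summarize which resolved inputs changed between two generations.

    Returns {node_id: {param: (old_value, new_value)}} for nodes present
    in both prompts; added or removed nodes would need separate handling.
    """
    changes = {}
    for node_id in old_prompt.keys() & new_prompt.keys():
        old_in = old_prompt[node_id].get("inputs", {})
        new_in = new_prompt[node_id].get("inputs", {})
        diff = {k: (old_in.get(k), new_in.get(k))
                for k in old_in.keys() | new_in.keys()
                if old_in.get(k) != new_in.get(k)}
        if diff:
            changes[node_id] = diff
    return changes
```

Two generations that differ only in a sampler seed would diff to a single changed parameter, which is exactly the granularity lineage queries need.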

Cross-referencing also catches errors. If the workflow blob references a model that the prompt blob shows was never executed, something went wrong between design and execution. If the prompt blob contains a seed that the workflow blob shows was supposed to be randomized, the system can flag this as a potential reproducibility concern.
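
One such consistency check can be sketched by comparing node ids across the two blobs. Nodes that exist in the design graph but never appear in the execution plan are either UI-only (notes, reroutes), muted branches, or a genuine design/execution mismatch worth flagging; the function name `unexecuted_nodes` is ours.

```python
def unexecuted_nodes(workflow, prompt):
    """Node ids present in the design graph but absent from the execution plan.

    Workflow ids are integers; prompt keys are strings, so normalize before
    comparing. Callers should filter known UI-only node types before flagging.
    """
    designed = {str(node["id"]) for node in workflow.get("nodes", [])}
    executed = set(prompt.keys())
    return designed - executed
```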

Consequences

  • Storage overhead: Storing both blobs per asset roughly doubles the metadata storage compared to extracting only one. For large libraries, this is significant — 100,000 assets with 200 KB each equals 20 GB of metadata alone. But the alternative — losing half the lineage information — is worse.
  • Extraction complexity: The extractor must handle ComfyUI's specific PNG text chunk format, large JSON payloads, and the occasional malformed output from custom nodes that inject non-standard data. Robust error handling that extracts what it can and flags what it cannot is essential.
  • Richer search: Dual indexing enables a wider range of queries than either blob alone supports. Users can search by intent (workflow patterns) and by outcome (execution parameters) depending on what they need to find.
  • True reproducibility: By preserving the prompt blob, the system enables exact replay of generations. By preserving the workflow blob, it enables iterative refinement. Both are essential for workflow reproducibility.

Related Patterns

  • The Two Metadata Problem describes the cross-tool dimension of metadata divergence; this article addresses the within-tool dimension.
  • Lineage Harder Than Git explains why tracking creative provenance across generations requires more than version control.
  • Cross-Tool Provenance extends the lineage challenge beyond ComfyUI to multi-tool creative workflows.
  • Midjourney Metadata contrasts ComfyUI's rich embedded metadata with Midjourney's opaque approach.

Every Node. Every Parameter. Automatically Indexed.

Numonic extracts both JSON structures from every ComfyUI output, making your workflow history searchable and your generations reproducible.

Try Numonic Free