What It Does
CLIPTextEncode takes raw text input and a CLIP model, then produces conditioning vectors that the sampler uses to steer image generation toward the described content. The text prompt is the most human-readable piece of provenance data.
In practice, most workflows use two CLIPTextEncode nodes: one for the positive prompt (what to generate) and one for the negative prompt (what to avoid). The prompt text, including any emphasis syntax (parentheses, brackets, attention weights), is fully captured in the workflow JSON.
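As an illustration of how that captured text can be read back, here is a minimal sketch assuming ComfyUI's API-format workflow JSON, where each node is keyed by id and carries a `class_type` and an `inputs` mapping (the node ids and prompts below are made up):

```python
import json

def extract_prompts(workflow_json: str) -> list[str]:
    """Return the text input of every CLIPTextEncode node in an
    API-format ComfyUI workflow (node id -> {class_type, inputs})."""
    workflow = json.loads(workflow_json)
    prompts = []
    for node in workflow.values():
        if node.get("class_type") == "CLIPTextEncode":
            text = node.get("inputs", {}).get("text")
            # The value may instead be a link like ["node_id", slot]
            # when the text is wired in from another node.
            if isinstance(text, str):
                prompts.append(text)
    return prompts

example = '''{
  "6": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "a watercolor fox, (detailed:1.2)", "clip": ["4", 1]}},
  "7": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "blurry, low quality", "clip": ["4", 1]}}
}'''
print(extract_prompts(example))
# → ['a watercolor fox, (detailed:1.2)', 'blurry, low quality']
```

Note that the JSON alone does not distinguish positive from negative prompts; that requires following the CONDITIONING links to the sampler node.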
Prompt text is often the first thing users search for when looking for specific generations, making it a high-value metadata field for asset management.
Inputs
- clip (CLIP): CLIP text encoder model.
- text (STRING): The prompt text to encode.
Outputs
- CONDITIONING (CONDITIONING): Encoded conditioning vectors.
What Numonic Captures
- Full prompt text (positive and negative)
- Emphasis/weighting syntax preserved verbatim
- LoRA trigger words when present in prompt
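Because the emphasis syntax is preserved verbatim, downstream tools can inspect it directly. A sketch of one way to do that, handling only the explicit `(phrase:weight)` form (one of several weighting conventions; plain parentheses and other syntaxes are not covered):

```python
import re

# Matches explicit attention weights such as "(detailed:1.2)".
WEIGHT_RE = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def explicit_weights(prompt: str) -> list[tuple[str, float]]:
    """Return (phrase, weight) pairs written as (phrase:weight)."""
    return [(m.group(1), float(m.group(2)))
            for m in WEIGHT_RE.finditer(prompt)]

print(explicit_weights("a watercolor fox, (detailed:1.2), (soft light:0.8)"))
# → [('detailed', 1.2), ('soft light', 0.8)]
```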
Known Gaps
- Effective token count after truncation
- CLIP skip level (set elsewhere in the pipeline)