What It Does
CLIPTextEncode takes raw text input and a CLIP model, then produces conditioning vectors that the sampler uses to steer image generation toward the described content. The text prompt is the most human-readable piece of provenance data.
In practice, most workflows use two CLIPTextEncode nodes: one for the positive prompt (what to generate) and one for the negative prompt (what to avoid). The prompt text, including any emphasis syntax (parentheses, brackets, attention weights), is fully captured in the workflow JSON.
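As an illustration of how that captured text can be read back, here is a minimal sketch assuming ComfyUI's API-format workflow JSON, where each node is keyed by id and carries a `class_type` and an `inputs` mapping (the node ids and prompts below are made up):

```python
import json

def extract_prompts(workflow_json: str) -> list[str]:
    """Return the text input of every CLIPTextEncode node in an
    API-format ComfyUI workflow (node id -> {class_type, inputs})."""
    workflow = json.loads(workflow_json)
    prompts = []
    for node in workflow.values():
        if node.get("class_type") == "CLIPTextEncode":
            text = node.get("inputs", {}).get("text")
            # The value may instead be a link like ["node_id", slot]
            # when the text is wired in from another node.
            if isinstance(text, str):
                prompts.append(text)
    return prompts

example = '''{
  "6": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "a watercolor fox, (detailed:1.2)", "clip": ["4", 1]}},
  "7": {"class_type": "CLIPTextEncode",
        "inputs": {"text": "blurry, low quality", "clip": ["4", 1]}}
}'''
print(extract_prompts(example))
# → ['a watercolor fox, (detailed:1.2)', 'blurry, low quality']
```

Note that the JSON alone does not distinguish positive from negative prompts; that requires following the CONDITIONING links to the sampler node.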
Prompt text is often the first thing users search for when looking for specific generations, making it a high-value metadata field for asset management.
Inputs
- clip (CLIP): CLIP text encoder model.
- text (STRING): The prompt text to encode.
Outputs
- CONDITIONING (CONDITIONING): Encoded conditioning vectors.
What Numonic Captures
- Full prompt text (positive and negative)
- Emphasis/weighting syntax preserved verbatim
- LoRA trigger words when present in prompt
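Because the emphasis syntax is preserved verbatim, downstream tools can inspect it directly. A sketch of one way to do that, handling only the explicit `(phrase:weight)` form (one of several weighting conventions; plain parentheses and other syntaxes are not covered):

```python
import re

# Matches explicit attention weights such as "(detailed:1.2)".
WEIGHT_RE = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def explicit_weights(prompt: str) -> list[tuple[str, float]]:
    """Return (phrase, weight) pairs written as (phrase:weight)."""
    return [(m.group(1), float(m.group(2)))
            for m in WEIGHT_RE.finditer(prompt)]

print(explicit_weights("a watercolor fox, (detailed:1.2), (soft light:0.8)"))
# → [('detailed', 1.2), ('soft light', 0.8)]
```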
Known Gaps
- Effective token count after truncation
- CLIP skip level (set elsewhere in the pipeline)