Approach 2: Using AutoPipelineForText2Image
For task-oriented workflows, Diffusers also provides AutoPipeline, which gives more flexibility when running inference, for example by enabling use_safetensors to load the weights directly. By automatically identifying the appropriate pipeline class, AutoPipeline eliminates the need to know the exact class name, simplifying the process of loading a checkpoint for a given task.
Import Required Libraries
Python3

import torch
from diffusers import AutoPipelineForText2Image
Create Auto Pipeline for Text to Image
The syntax is similar to Approach 1, but here we also set use_safetensors to True and variant to "fp16" to run at 16-bit floating-point precision. Notice one change: here we use the Stable Diffusion XL pre-trained model, the most advanced Stable Diffusion model available at the time of writing.
Python3

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe = pipe.to("cuda")
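Note that the hard-coded "cuda" call fails on machines without an NVIDIA GPU. As a minimal sketch (the fallback logic here is our own addition, not part of the tutorial's code), you can pick the device and a matching precision first, then load the pipeline with torch_dtype=dtype and move it with pipe.to(device):

```python
import torch

# Sketch: choose the best available device and a matching dtype.
# fp16 is only reliable on GPU; fall back to fp32 on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

print(device, dtype)
```

With this in place, the from_pretrained call above would use torch_dtype=dtype, and the fp16 variant would only be requested when a GPU is present.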
Define prompt and run Pipeline
Use the same prompt and compare the output quality between the base model (v1.5) and the advanced model (XL).
Python3

prompt = "a horse racing near beach, 8k, realistic photography"
image = pipe(prompt=prompt).images[0]
image
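Diffusion sampling starts from random noise, so each run produces a different image. To make runs reproducible, you can pass a seeded torch.Generator to the pipeline call, e.g. pipe(prompt=prompt, generator=generator). The sketch below only demonstrates the underlying idea, that a fixed seed yields identical starting noise, and does not require downloading the model:

```python
import torch

# Illustration: the pipeline's initial latents are drawn with torch.randn
# driven by the generator, so an identical seed reproduces them exactly.
gen_a = torch.Generator("cpu").manual_seed(42)
noise_a = torch.randn(1, 4, 64, 64, generator=gen_a)

gen_b = torch.Generator("cpu").manual_seed(42)
noise_b = torch.randn(1, 4, 64, 64, generator=gen_b)

print(torch.equal(noise_a, noise_b))  # True: same seed, same starting noise
```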
Output:
Stable Diffusion XL gives a more accurate result than Stable Diffusion v1.5: the prompt mentions a beach, but the v1.5 image does not include one. With this, we conclude.
Build Text To Image with HuggingFace Diffusers
This article implements a Text-to-Image application using the Hugging Face Diffusers library. We will demonstrate two different pipelines with two different pre-trained Stable Diffusion models. Before we dive into the code implementation, let us understand Stable Diffusion.