How to Generate Images from Text?
Running Stable Diffusion directly means wiring together several components (a text encoder, a denoising network, a scheduler, and an image decoder), which takes a lot of code. Hugging Face's Diffusers library removes that burden: it wraps the whole model behind a single pipeline object, so we can generate images with a few lines of Python without dealing with the underlying architecture. In this article we use the StableDiffusionPipeline class from Diffusers to generate an image from a text prompt.
Requirements
- Diffusers: This is the main package we require to run the inference on the model.
pip install diffusers
- transformers: This package provides the tokenizer and text encoder that turn the prompt into embeddings for the model.
pip install transformers
- Pillow: This package is used for image processing; the pipeline returns its results as Pillow images.
pip install Pillow
- accelerate, scipy, and safetensors: These are supporting packages. accelerate speeds up model loading, safetensors reads the model weight files, and scipy is used by some of the schedulers.
pip install accelerate scipy safetensors
Note: If you are running this project on your local machine, use a virtual environment to avoid conflicts with other installed packages. Skip this step on Google Colab. Colab is the easier option for this model, because inference needs a lot of memory and is only practical on a GPU.
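Before loading the model, it can help to confirm that the packages above are actually installed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for this example, and `torch` is added to the list as an assumption because the generation code imports it even though it is not in the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names as used with pip; "torch" is an assumption (see the note above).
REQUIRED = ["diffusers", "transformers", "Pillow", "accelerate",
            "scipy", "safetensors", "torch"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

Any package reported as missing can be installed with the matching pip command from the list above.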
Versions of Stable Diffusion
Some of the popular Stable Diffusion text-to-image model versions are:
- Stable Diffusion v1 – The original public release, trained to generate 512×512 images.
- Stable Diffusion v1.5 – A further fine-tuned v1 checkpoint with the same architecture and the same 512×512 native resolution, generally giving better image quality.
- Stable Diffusion v2 – Uses a new text encoder (OpenCLIP) and adds checkpoints for 768×768 output alongside 512×512.
- Stable Diffusion 2.1 – Fine-tuned from v2 with the same architecture, available in 512×512 and 768×768 variants.
- Stable Diffusion XL 1.0 – A much larger latent diffusion model (the base model has roughly 3.5B parameters) with a native resolution of 1024×1024 and improved image quality.
In general, newer and larger versions produce better images that follow the prompt more closely, at the cost of slower inference.
In this article, we will use the stabilityai/stable-diffusion-2-1 model to generate images. It is fine-tuned from stable-diffusion-2, which in turn generally produces more realistic, higher-quality images than Stable Diffusion 1.
Generating Image
Here is the Python code that runs the model and produces the image. It requires a CUDA-capable GPU. If you are using Google Colab, switch the runtime type to T4 GPU, which has enough GPU memory to run this model in half precision.
import torch
from diffusers import StableDiffusionPipeline
from PIL import Image
# Replace the model version with your required version if needed
pipeline = StableDiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
# Running the inference on GPU with cuda enabled
pipeline = pipeline.to('cuda')
prompt = "Your prompt here"
image = pipeline(prompt=prompt).images[0]
This code generates a Pillow Image and stores it in the “image” variable, which we use in the next step.
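The pipeline call accepts more than the prompt. As a rough sketch, the helpers below collect a few commonly used options (`num_inference_steps`, `guidance_scale`, `height`, `width`, `negative_prompt`) and pass a seeded `generator` so that a run can be repeated. The function names and the default values chosen here (30 steps, 768 pixels) are assumptions for illustration, not recommendations from the library.

```python
def build_generation_kwargs(prompt, negative_prompt=None, steps=30,
                            guidance_scale=7.5, size=768):
    """Collect keyword arguments for a StableDiffusionPipeline call."""
    if size % 8 != 0:
        # The pipeline rejects heights and widths that are not multiples of 8.
        raise ValueError("size must be a multiple of 8")
    kwargs = {
        "prompt": prompt,
        "num_inference_steps": steps,      # more steps: slower, often finer detail
        "guidance_scale": guidance_scale,  # higher: follows the prompt more strictly
        "height": size,
        "width": size,
    }
    if negative_prompt:
        kwargs["negative_prompt"] = negative_prompt  # things to steer away from
    return kwargs


def generate(pipeline, seed=0, **options):
    """Run the pipeline with a fixed seed so the same inputs can be re-run."""
    import torch  # imported here so the helper above works without PyTorch

    generator = torch.Generator(device="cpu").manual_seed(seed)
    kwargs = build_generation_kwargs(**options)
    return pipeline(generator=generator, **kwargs).images[0]
```

With the `pipeline` object created earlier, a call might look like `image = generate(pipeline, seed=42, prompt="Your prompt here", negative_prompt="blurry, low quality")`.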
Note: The first time this code runs, the model files (PyTorch and safetensors weights, up to about 5 GB) are downloaded to your computer and cached. Later runs reuse the cached files.
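The generation code assumes a CUDA GPU is present and fails on a machine without one. One possible way to handle that, sketched below under the assumption that falling back to the CPU in full precision is acceptable (it will be much slower), is to choose the device and data type at load time. The helper names `pick_device_and_dtype` and `load_pipeline` are hypothetical.

```python
def pick_device_and_dtype(cuda_available):
    """Return (device, dtype name): half precision on GPU, full precision on CPU."""
    return ("cuda", "float16") if cuda_available else ("cpu", "float32")


def load_pipeline(model_id="stabilityai/stable-diffusion-2-1"):
    """Load the pipeline on the GPU if there is one, otherwise on the CPU."""
    import torch
    from diffusers import StableDiffusionPipeline

    device, dtype_name = pick_device_and_dtype(torch.cuda.is_available())
    pipeline = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=getattr(torch, dtype_name)
    )
    return pipeline.to(device)
```

Calling `pipeline = load_pipeline()` would then replace the `from_pretrained` and `.to('cuda')` lines in the code above.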
Displaying the Image
If you are running locally, use the following code to display the image.
image.show()
If you are running on Google Colab, use the following code to display the image.
from IPython.display import display
display(image)
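To keep the result rather than only display it, the Pillow image can be written to disk with its `save` method. The sketch below derives a file name from the prompt; the helper names and the `outputs` directory are assumptions made for this example.

```python
import re
from pathlib import Path


def output_path_for(prompt, directory="outputs"):
    """Build a file path such as outputs/a-horse-at-sunset.png from a prompt."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower())[:60].strip("-")
    return Path(directory) / f"{slug or 'image'}.png"


def save_image(image, prompt, directory="outputs"):
    """Save a Pillow image under a prompt-based name and return the path."""
    path = output_path_for(prompt, directory)
    path.parent.mkdir(parents=True, exist_ok=True)
    image.save(path)  # Pillow picks the PNG format from the file extension
    return path
```

For example, `save_image(image, prompt)` writes the generated picture into the `outputs` folder next to the script or notebook.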
Prompt: Photograph of a horse on a highway road at sunset.
Output:
Generate Images from Text in Python – Stable Diffusion
Searching for the right image can be quite a hassle. AI makes it much easier: describe the picture you want, and the computer generates it for you. That is what Stable Diffusion does, and it can be driven from Python to turn words into visuals. In this article, we explore how to use Stable Diffusion in Python to create images from text prompts. It is like having an artist at your fingertips.