I Tried to Build an ML Text-to-Image App with Stable Diffusion in 15 Minutes

Introduction

In this article, I'll share my experience of building a text-to-image generation app with Stable Diffusion and Tkinter under a tight 15-minute time limit. The goal was an application where you type in a prompt and get back an image generated by a machine learning model.

Getting Started

To start the project, I created a new Python file named app.py. The app relies on the following libraries:

tkinter for creating the GUI
customtkinter for modern, themeable widgets built on top of Tkinter
Pillow to handle image rendering
Hugging Face's transformers library, which diffusers relies on for the model's text encoder
torch for using PyTorch's machine learning capabilities
diffusers for the Stable Diffusion model
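Before writing any app code, it can help to confirm that these libraries are importable in your environment. The sketch below is a small hypothetical helper (missing_packages is not part of any of these libraries) built only on the standard library's importlib.util.find_spec. It assumes the usual import names, where Pillow is imported as PIL.

```python
import importlib.util

# Import names for the libraries listed above (Pillow is imported as "PIL")
REQUIRED = ["tkinter", "customtkinter", "PIL", "transformers", "torch", "diffusers"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Anything reported as missing can be installed with pip; tkinter usually ships with Python itself, although some Linux distributions package it separately.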

The following code snippet showcases the initial setup:

import tkinter as tk
import customtkinter as ctk
from PIL import ImageTk
import torch
from diffusers import StableDiffusionPipeline

Next, I initialized the application window:

ctk.set_appearance_mode("dark")
# CTk is CustomTkinter's themed root window, so dark mode also applies to the window background
app = ctk.CTk()
app.geometry("532x622")
app.title("Stable Bud")

After setting up the window, I created an entry field for user prompts, a button to trigger image generation, and a placeholder label to display the generated images.

Setting Up the User Interface

The entry field allows users to input their desired prompts to generate images. The button is labeled "Generate" and, when clicked, will trigger the image generation process. The label acts as a placeholder for displaying the resulting image generated by the Stable Diffusion model.

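Here is how I set up these components. In app.py, this code has to come after the generate function described in the next section, because the button's command=generate refers to that function at the moment the button is created: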

prompt = ctk.CTkEntry(app, height=40, width=512, font=("Arial", 20), text_color="black", fg_color="white")
prompt.place(x=10, y=10)

trigger = ctk.CTkButton(app, height=40, width=120, text="Generate", fg_color="blue", text_color="white", command=generate)
trigger.place(x=206, y=60)

lMain = ctk.CTkLabel(app, height=512, width=512, text="")  # empty text so only the image is shown
lMain.place(x=10, y=110)
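The pixel values in this layout fit together: the 120-pixel button is horizontally centered in the 532-pixel-wide window, the 512-pixel entry and label at x=10 are centered as well, and the label starting at y=110 ends exactly at the window's 622-pixel height. The hypothetical helpers below (not part of Tkinter or CustomTkinter) make that arithmetic explicit, which may be handy if you change the window or image size.

```python
def centered_x(container_width, widget_width):
    """x offset that horizontally centers a widget inside its container."""
    return (container_width - widget_width) // 2


def bottom_edge(y, height):
    """y coordinate of a widget's bottom edge when it is placed at y."""
    return y + height


WINDOW_W, WINDOW_H = 532, 622

print(centered_x(WINDOW_W, 120))  # 206, the x used for the "Generate" button
print(centered_x(WINDOW_W, 512))  # 10, the x used for the entry and the image label
print(bottom_edge(110, 512))      # 622, the window height
```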

Generating Images with Stable Diffusion

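The next step was to load the Stable Diffusion model and define the generate function that the button calls. The model is identified by its Hugging Face Hub ID, and StableDiffusionPipeline.from_pretrained downloads the weights and builds the image generation pipeline.

Here's the image generation code. The pipeline is created once when the app starts rather than inside generate, so the multi-gigabyte weights are not reloaded on every click:

model_id = "CompVis/stable-diffusion-v1-4"
# Load the half-precision (fp16) weights once and move the pipeline to the GPU
pipe = StableDiffusionPipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16).to("cuda")

def generate():
    # Generate one image from the text currently in the entry field
    image = pipe(prompt.get(), guidance_scale=7.5).images[0]
    image.save("generated_image.png")
    # Wrap the PIL image in a CTkImage so the label can display it at 512x512
    lMain.configure(image=ctk.CTkImage(light_image=image, size=(512, 512)))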
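The guidance_scale=7.5 argument controls classifier-free guidance. In the diffusers Stable Diffusion pipeline, each denoising step predicts the noise twice, once with the prompt and once with an empty prompt, and combines the two as uncond + guidance_scale * (cond - uncond). The sketch below shows only that combination step, using plain Python lists as stand-ins for the real latent tensors; apply_guidance and the toy values are illustrative, not the library's code.

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine unconditional and text-conditioned noise predictions element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


# Toy 3-element "noise predictions" (the real ones are latent tensors)
uncond = [0.0, 1.0, -1.0]
cond = [0.5, 1.0, -0.5]

print(apply_guidance(uncond, cond, 1.0))  # scale 1 reproduces the conditional prediction
print(apply_guidance(uncond, cond, 7.5))  # larger scales push further toward the prompt
```

Higher values make the image follow the prompt more literally, usually at some cost in variety; 7.5 is also the pipeline's default value.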

Testing and Final Touches

After implementing the core functionality and ending app.py with app.mainloop(), which starts the Tkinter event loop that keeps the window open, I tested the app by entering various prompts. Despite some initial GPU memory issues, I managed to generate images successfully within the time limit. The finished app lets users type a prompt and get back an image generated by the Stable Diffusion model.
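Memory is one reason to load the fp16 weights. A rough estimate of weight memory is parameter count times bytes per parameter. The sketch below assumes the commonly cited approximate sizes for Stable Diffusion v1 (about 860M parameters for the UNet and 123M for the text encoder, with the VAE left out), and it counts only the weights, not the activations that also occupy VRAM during generation. The names are hypothetical.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}

# Approximate parameter counts (assumed; VAE omitted)
UNET_PARAMS = 860_000_000
TEXT_ENCODER_PARAMS = 123_000_000


def weight_memory_gib(num_params, dtype):
    """Memory needed just to hold the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


total = UNET_PARAMS + TEXT_ENCODER_PARAMS
for dtype in ("float32", "float16"):
    print(f"{dtype}: {weight_memory_gib(total, dtype):.2f} GiB")
```

Half precision halves the weight memory. If memory is still tight, diffusers pipelines also provide enable_attention_slicing(), which lowers peak usage during generation at some cost in speed.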

With less than two minutes to spare, I finished the app within the 15-minute limit. It was an exhilarating challenge, and it showed how modern machine learning can be put behind an easy-to-use interface.

Conclusion

Building a text-to-image generation application using Stable Diffusion in such a short time frame was both challenging and rewarding. The power of machine learning and the ease of creating GUIs with Tkinter make this a fantastic project for anyone interested in artificial intelligence and software development. I encourage you to give it a try, explore what you can create, and innovate on top of existing technologies.

Keywords

Stable Diffusion
Text-to-image generation
Tkinter
CustomTkinter
Machine learning
Hugging Face
PyTorch
Image rendering

FAQ

What is Stable Diffusion?
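Stable Diffusion is a latent diffusion model that generates images from text descriptions. Starting from random noise, it repeatedly denoises a compressed latent representation under the guidance of a text encoder, then decodes the result into an image.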

What libraries are required to build this app?
You need tkinter (usually bundled with Python), customtkinter, Pillow, transformers, torch, and diffusers.

How can I run the app?
You can run the app by executing the python app.py command in your terminal.

What types of prompts can I use?
You can use any textual description, such as "a spaceship landing on Mars" or "a 3D Charizard in a forest."

Is this application resource-intensive?
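Yes. As written, the app loads the model in half precision onto a CUDA GPU, so it needs a CUDA-capable (typically NVIDIA) GPU with enough free VRAM for the weights plus the working memory used during generation.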