I Tried to Build an ML Text-to-Image App with Stable Diffusion in 15 Minutes
Introduction
In this article, I'll share my experience of building a text-to-image generation app using Stable Diffusion and Tkinter within a tight 15-minute time limit. This ambitious project involves creating an application where you can type in a prompt and receive an image generated through machine learning techniques.
Getting Started
To start the project, I created a new Python file named app.py. I then imported the necessary libraries:
- tkinter for creating the GUI
- customtkinter for enhanced Tkinter functionality
- Pillow to handle image rendering
- Hugging Face's transformers library for interacting with pre-trained models
- torch for using PyTorch's machine learning capabilities
- diffusers for the Stable Diffusion model
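Before running anything, it can help to confirm that these libraries are actually installed. The snippet below is a minimal sketch using only the standard library; the helper name missing_modules and the REQUIRED_MODULES list are my own illustration, not part of the original app. Note that Pillow is imported under the name PIL.

```python
from importlib.util import find_spec

# Import names for the libraries listed above; Pillow is imported as "PIL".
REQUIRED_MODULES = ["tkinter", "customtkinter", "PIL", "transformers", "torch", "diffusers"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the import names from `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If the list of missing modules is not empty, install those packages before launching the app.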
The following code snippet showcases the initial setup:
import tkinter as tk
import customtkinter as ctk
from PIL import ImageTk
from huggingface_hub import hf_hub_download
import torch
from diffusers import StableDiffusionPipeline
Next, I initialized the application window:
app = tk.Tk()
app.geometry("532x622")
app.title("Stable Bud")
ctk.set_appearance_mode("dark")
After setting up the window, I created an entry field for user prompts, a button to trigger image generation, and a placeholder label to display the generated images.
Setting Up the User Interface
The entry field allows users to input their desired prompts to generate images. The button is labeled "Generate" and, when clicked, will trigger the image generation process. The label acts as a placeholder for displaying the resulting image generated by the Stable Diffusion model.
Here is a quick overview of how I set up these components:
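# Entry field for the prompt (customtkinter 5.x takes font= in place of text_font=)
prompt = ctk.CTkEntry(app, height=40, width=512, text_font=("Arial", 20), placeholder_text_color="black", fg_color="white")
prompt.place(x=10, y=10)
# Button that triggers generation; generate() must be defined above this line in app.py
trigger = ctk.CTkButton(app, height=40, width=120, text="Generate", fg_color="blue", text_color="white", command=generate)
trigger.place(x=206, y=60)
# Placeholder label that will display the generated image
lMain = ctk.CTkLabel(app, height=512, width=512)
lMain.place(x=10, y=110)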
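The pixel values in the layout above are not arbitrary: they follow from the 532x622 window and the 512-pixel image. The short sketch below reproduces that arithmetic; the helper centered_x and the constant names are hypothetical and only there to show where the numbers come from.

```python
def centered_x(container_width: int, widget_width: int) -> int:
    """Left x offset that horizontally centers a widget inside its container."""
    return (container_width - widget_width) // 2


WINDOW_WIDTH, WINDOW_HEIGHT = 532, 622  # from app.geometry("532x622")
MARGIN = 10                             # x offset of the entry field and the image label
IMAGE_SIZE = 512                        # width and height of the image label

button_x = centered_x(WINDOW_WIDTH, 120)  # 120 is the button width
image_bottom = 110 + IMAGE_SIZE           # the image label is placed at y=110

print(button_x)      # 206
print(image_bottom)  # 622
```

In other words, the window is the image width plus a 10-pixel margin on each side, and the image label ends exactly at the bottom edge of the window.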
Generating Images with Stable Diffusion
The next step involved defining the generate function to handle the image generation process using the Stable Diffusion model. This was done by specifying the model's ID and creating a pipeline for the image generation.
Here's how I defined the image generation function:
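def generate():
    model_id = "CompVis/stable-diffusion-v1-4"
    # Load the half-precision weights and move the pipeline to the GPU
    pipe = StableDiffusionPipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16).to("cuda")
    with torch.autocast("cuda"):
        image = pipe(prompt.get(), guidance_scale=7.5).images[0]
    image.save("generated_image.png")
    # Keep a reference to the PhotoImage, otherwise Tkinter may garbage-collect it
    img = ImageTk.PhotoImage(image)
    lMain.image = img
    lMain.configure(image=img)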
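One thing worth noting is that the function above builds the pipeline on every button click, which repeats the expensive model load each time. A possible refinement, sketched here under my own assumptions, is to load the pipeline on the first click and reuse it afterwards. The factory make_generate and its three hook arguments are hypothetical names, and the stand-in loader exists only so the sketch runs without a GPU or a model download.

```python
from types import SimpleNamespace


def make_generate(load_pipeline, get_prompt, show_image):
    """Build a button callback that loads the pipeline once and reuses it.

    All three arguments are hypothetical hooks, not part of the original app:
      load_pipeline() -> a callable pipeline (e.g. a StableDiffusionPipeline on the GPU)
      get_prompt()    -> the prompt string (e.g. prompt.get)
      show_image(img) -> displays the result (e.g. updates lMain)
    """
    pipe = None

    def generate():
        nonlocal pipe
        if pipe is None:  # first click: pay the loading cost once
            pipe = load_pipeline()
        image = pipe(get_prompt(), guidance_scale=7.5).images[0]
        show_image(image)
        return image

    return generate


# Stand-in loader so the sketch runs without torch or diffusers installed.
load_calls = []


def fake_loader():
    load_calls.append(1)
    return lambda text, guidance_scale: SimpleNamespace(images=[f"image for {text!r}"])


shown = []
generate = make_generate(fake_loader, lambda: "a spaceship landing on Mars", shown.append)
generate()
generate()
print(len(load_calls), len(shown))  # 1 2
```

In the real app, load_pipeline would wrap the StableDiffusionPipeline.from_pretrained(...).to("cuda") call, get_prompt would be prompt.get, and show_image would update lMain.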
Testing and Final Touches
After implementing the core functionality, I tested the app by running it and entering various prompts. Despite some initial hiccups with GPU memory issues, I managed to generate images successfully within the time limit. The final product allows users to input prompts and receive visually appealing images generated by the Stable Diffusion model.
With less than two minutes left, I achieved my goal of building the app within 15 minutes. It was an exhilarating challenge that showcased modern machine learning capabilities through an easy-to-use interface.
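This write-up does not record exactly how the GPU memory issues were resolved. One commonly used option in diffusers is attention slicing, which lowers peak VRAM use at some cost in speed. The sketch below assumes that approach; the helper load_low_memory_pipeline is a hypothetical name, and it takes the pipeline class as an argument so the idea can be read (and tried with a stand-in class) without a GPU.

```python
def load_low_memory_pipeline(pipeline_cls, model_id, dtype):
    """Load a pipeline and enable attention slicing before moving it to the GPU.

    pipeline_cls is expected to behave like diffusers' StableDiffusionPipeline;
    this helper is an illustration, not part of the original app.
    """
    pipe = pipeline_cls.from_pretrained(model_id, torch_dtype=dtype)
    pipe.enable_attention_slicing()  # diffusers option: lower peak VRAM, somewhat slower
    return pipe.to("cuda")


# Intended use in app.py (requires torch, diffusers, and a CUDA GPU):
#   pipe = load_low_memory_pipeline(
#       StableDiffusionPipeline, "CompVis/stable-diffusion-v1-4", torch.float16
#   )
```

Whether this is enough depends on the GPU; it is one knob to try, not a guaranteed fix.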
Conclusion
Building a text-to-image generation application using Stable Diffusion in such a short time frame was both challenging and rewarding. The power of machine learning and the ease of creating GUIs with Tkinter make this a fantastic project for anyone interested in artificial intelligence and software development. I encourage you to give it a try, explore what you can create, and innovate on top of existing technologies.
Keywords
- Stable Diffusion
- Text-to-image generation
- Tkinter
- CustomTkinter
- Machine learning
- Hugging Face
- PyTorch
- Image rendering
FAQ
What is Stable Diffusion?
Stable Diffusion is a deep learning model that generates images from textual descriptions.
What libraries are required to build this app?
You need tkinter, customtkinter, Pillow, transformers, torch, and diffusers.
How can I run the app?
You can run the app by executing the python app.py command in your terminal.
What types of prompts can I use?
You can use any textual description, such as "a spaceship landing on Mars" or "a 3D Charizard in a forest."
Is this application resource-intensive?
Yes, using Stable Diffusion can be resource-intensive, requiring a GPU with sufficient VRAM.