I Tried to Build an ML Text-to-Image App with Stable Diffusion in 15 Minutes


Introduction

In this article, I'll share my experience of building a text-to-image generation app using Stable Diffusion and Tkinter within a tight 15-minute time limit. This ambitious project involves creating an application where you can type in a prompt and receive an image generated through machine learning techniques.

Getting Started

To start the project, I created a new Python file named app.py and imported the necessary libraries:

  • tkinter for creating the GUI
  • customtkinter for enhanced Tkinter functionality
  • Pillow to handle image rendering
  • huggingface_hub, Hugging Face's client library for downloading pre-trained model files
  • torch for using PyTorch's machine learning capabilities
  • diffusers for the Stable Diffusion model

The following code snippet showcases the initial setup:

import tkinter as tk
import customtkinter as ctk
from PIL import ImageTk
from huggingface_hub import hf_hub_download
import torch
from diffusers import StableDiffusionPipeline

Next, we initialized our application window:

app = tk.Tk()
app.geometry("532x622")
app.title("Stable Bud")
ctk.set_appearance_mode("dark")

After setting up the window, I created an entry field for user prompts, a button to trigger image generation, and a placeholder label to display the generated images.

Setting Up the User Interface

The entry field allows users to input their desired prompts to generate images. The button is labeled "Generate" and, when clicked, will trigger the image generation process. The label acts as a placeholder for displaying the resulting image generated by the Stable Diffusion model.

Here is a quick overview of how I set up these components:

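# generate() must already be defined above this point, because the button's command refers to it.
# CustomTkinter 5.x uses font=; older releases named this argument text_font=.
# text_color sets the typed text; placeholder_text_color only affects placeholder text.
prompt = ctk.CTkEntry(app, height=40, width=512, font=("Arial", 20), text_color="black", placeholder_text_color="black", fg_color="white")
prompt.place(x=10, y=10)

trigger = ctk.CTkButton(app, height=40, width=120, text="Generate", fg_color="blue", text_color="white", command=generate)
trigger.place(x=206, y=60)

# text="" hides the label's default text so that only the generated image is shown.
lMain = ctk.CTkLabel(app, height=512, width=512, text="")
lMain.place(x=10, y=110)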
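
The place() offsets and the "532x622" window size fit together: each widget is centered horizontally and separated by a 10-pixel margin. The sketch below reproduces that arithmetic. The helper names (centered_x, layout) and the constants are hypothetical and only illustrate where the numbers come from; they are not part of the app.

```python
# Hypothetical helper: derives the place() offsets used in the UI code.
WINDOW_WIDTH = 532   # from app.geometry("532x622")
MARGIN = 10          # gap around and between widgets
ROW_HEIGHT = 40      # height of the entry field and of the button
IMAGE_SIZE = 512     # width and height of the image label


def centered_x(container_width: int, widget_width: int) -> int:
    """Return the x offset that centers a widget horizontally."""
    return (container_width - widget_width) // 2


def layout() -> dict:
    """Compute (x, y) for each widget plus the resulting window height."""
    entry_y = MARGIN
    button_y = entry_y + ROW_HEIGHT + MARGIN
    label_y = button_y + ROW_HEIGHT + MARGIN
    return {
        "prompt": (centered_x(WINDOW_WIDTH, 512), entry_y),
        "trigger": (centered_x(WINDOW_WIDTH, 120), button_y),
        "lMain": (centered_x(WINDOW_WIDTH, IMAGE_SIZE), label_y),
        "window_height": label_y + IMAGE_SIZE,
    }


if __name__ == "__main__":
    for name, value in layout().items():
        print(name, value)
```

Deriving the offsets this way makes it easier to change a widget size later without recomputing each coordinate by hand.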

Generating Images with Stable Diffusion

The next step involved defining the generate function to handle the image generation process using the Stable Diffusion model. This was done by specifying the model's ID and creating a pipeline for the image generation.

Here's how I defined the image generation function:

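def generate():
    # Note: building the pipeline here repeats the slow model load on every click.
    model_id = "CompVis/stable-diffusion-v1-4"
    pipe = StableDiffusionPipeline.from_pretrained(model_id, revision="fp16", torch_dtype=torch.float16).to("cuda")
    # Run inference under mixed precision on the GPU.
    with torch.autocast("cuda"):
        image = pipe(prompt.get(), guidance_scale=7.5).images[0]
    image.save("generated_image.png")
    # Keep a reference to the PhotoImage; if nothing holds one, it can be
    # garbage-collected and the label ends up blank.
    photo = ImageTk.PhotoImage(image)
    lMain.photo = photo
    lMain.configure(image=photo)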
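
Because generate() builds the pipeline inside the button handler, the model is loaded again on every click. One way to avoid this is to load it lazily and reuse it. The following is a minimal sketch under that assumption: LazyPipeline and load_stable_diffusion are hypothetical names, not part of diffusers or of the original app, and the loader simply wraps the same from_pretrained call shown above.

```python
from typing import Any, Callable, Optional


def load_stable_diffusion(model_id: str = "CompVis/stable-diffusion-v1-4") -> Any:
    """Load the fp16 pipeline onto the GPU (same call as in generate())."""
    # Imported lazily so this module can be imported without torch/diffusers.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, revision="fp16", torch_dtype=torch.float16
    )
    return pipe.to("cuda")


class LazyPipeline:
    """Hypothetical helper: builds the pipeline on first use, then reuses it."""

    def __init__(self, loader: Callable[[], Any] = load_stable_diffusion) -> None:
        self._loader = loader
        self._pipe: Optional[Any] = None

    def get(self) -> Any:
        # Only the first call pays the loading cost.
        if self._pipe is None:
            self._pipe = self._loader()
        return self._pipe
```

With this in place, a module-level pipeline = LazyPipeline() could be created once, and generate() could call pipeline.get() instead of from_pretrained, so only the first click waits for the model to load.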

Testing and Final Touches

After implementing the core functionality, I tested the app by running it and entering various prompts. Despite some initial hiccups with GPU memory issues, I managed to generate images successfully within the time limit. The final product allows users to input prompts and receive visually appealing images generated by the Stable Diffusion model.
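
For GPU memory problems like the ones mentioned above, diffusers pipelines offer attention slicing, which lowers peak VRAM use at some cost in speed. The sketch below is an assumption about what might help on a small GPU, not a record of the fix used during the challenge. The function name enable_memory_savers is hypothetical; enable_attention_slicing() is the diffusers pipeline method it calls.

```python
from typing import Any


def enable_memory_savers(pipe: Any) -> bool:
    """Turn on attention slicing when the pipeline offers it.

    Returns True if the option was enabled, False otherwise.
    """
    enable = getattr(pipe, "enable_attention_slicing", None)
    if callable(enable):
        # Attention is then computed in slices rather than all at once.
        enable()
        return True
    return False
```

It would be called once, right after the pipeline is created and moved to the GPU, for example enable_memory_savers(pipe).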

I finished with less than two minutes to spare, meeting my goal of building the app within 15 minutes. It was an exhilarating challenge that showcased modern machine learning capabilities through an easy-to-use interface.

Conclusion

Building a text-to-image generation application using Stable Diffusion in such a short time frame was both challenging and rewarding. The power of machine learning and the ease of creating GUIs with Tkinter make this a fantastic project for anyone interested in artificial intelligence and software development. I encourage you to give it a try, explore what you can create, and innovate on top of existing technologies.


Keywords

  • Stable Diffusion
  • Text-to-image generation
  • Tkinter
  • CustomTkinter
  • Machine learning
  • Hugging Face
  • PyTorch
  • Image rendering

FAQ

What is Stable Diffusion?
Stable Diffusion is a deep learning model that generates images from textual descriptions.

What libraries are required to build this app?
You need tkinter, customtkinter, Pillow, torch, and diffusers, plus transformers, which the Stable Diffusion pipeline in diffusers depends on.

How can I run the app?
You can run the app by executing python app.py in your terminal. The script must end with app.mainloop() so that the window stays open and responds to clicks.

What types of prompts can I use?
You can use any textual description, such as "a spaceship landing on Mars" or "a 3D Charizard in a forest."

Is this application resource-intensive?
Yes, using Stable Diffusion can be resource-intensive, requiring a GPU with sufficient VRAM.
