Speech to Text to Document AI in Power Platform | Whisper AI & GPT with Azure OpenAI
Introduction
In this article, we will explore how to combine OpenAI models with Microsoft Power Platform to turn speech into a document. Specifically, we will use OpenAI's Whisper API to convert speech to text, and an AI Builder GPT action powered by Azure OpenAI Service to generate content from the transcript. Let’s dive into the process!
Setting Up the Whisper API
OpenAI's Whisper model performs speech-to-text transcription and translation. OpenAI exposes these capabilities through two API endpoints: `/v1/audio/transcriptions` and `/v1/audio/translations`. To use speech-to-text in PowerApps, we will create a custom connector by following these steps:
- Create a New Custom Connector: In Power Apps or Power Automate, open Custom connectors, choose New custom connector > Create from blank, and give the connector a name.
- Upload an Icon: Choose an appropriate icon for your connector.
- Set Host and Security: Set the host to `api.openai.com`. On the security page, choose API Key authentication, set the parameter name to `authorization`, and set its location to Header.
- Define the Action: Create a new action with the operation ID `speech_to_text`. Use Import from sample with the POST verb, the URL `https://api.openai.com/v1/audio/transcriptions`, and a `Content-Type: multipart/form-data` header.
- Update Swagger Definition: In the Swagger editor, define the form-data parameters: the audio `file` and the `model` name, such as `whisper-1`. Then describe the JSON response, whose `text` field holds the transcript.
- Create the Connector: After completing the Swagger definition, create the connector.
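The connector configured above reduces to a small Swagger 2.0 (OpenAPI 2.0) definition. The Python sketch below builds an approximation of it as a dictionary and prints it as JSON. It is an assumption-based illustration, not an export of the author's connector. The title, the `model` parameter, and the response schema are hypothetical, and the definition the connector wizard generates will contain additional fields.

```python
import json


def build_connector_definition() -> dict:
    """Approximate (assumed) Swagger 2.0 definition for the Whisper connector.

    Host api.openai.com, API-key auth sent in the Authorization header, and a
    single multipart/form-data operation with the ID speech_to_text.
    """
    return {
        "swagger": "2.0",
        "info": {"title": "Whisper Speech to Text", "version": "1.0"},  # hypothetical title
        "host": "api.openai.com",
        "basePath": "/",
        "schemes": ["https"],
        "securityDefinitions": {
            # HTTP header names are case-insensitive, so this matches "authorization".
            "API Key": {"type": "apiKey", "in": "header", "name": "Authorization"}
        },
        "security": [{"API Key": []}],
        "paths": {
            "/v1/audio/transcriptions": {
                "post": {
                    "operationId": "speech_to_text",
                    "consumes": ["multipart/form-data"],
                    "produces": ["application/json"],
                    "parameters": [
                        # The recorded audio, uploaded as a file part.
                        {"name": "file", "in": "formData", "type": "file", "required": True},
                        # Whisper model name, e.g. "whisper-1" (assumed value).
                        {"name": "model", "in": "formData", "type": "string", "required": True},
                    ],
                    "responses": {
                        "200": {
                            "description": "Transcription result",
                            "schema": {
                                "type": "object",
                                "properties": {"text": {"type": "string"}},
                            },
                        }
                    },
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_connector_definition(), indent=2))
```

In Swagger 2.0, a parameter with `"type": "file"` must be sent as `formData` and requires the operation to consume `multipart/form-data` (or `application/x-www-form-urlencoded`). That is why the operation declares `consumes` this way.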
Implementing in PowerApps
Once the custom connector is created, you can use it in a new canvas app:
- Create a New App: Open PowerApps, create a new blank canvas app, and provide a name.
- Add the Custom Connector: In the Data pane, add your newly created custom connector and, when prompted, enter your API key in the format `Bearer <API Key>`.
- Insert Microphone Control: Add a Microphone control to record audio, and a button to start the speech-to-text conversion.
- Call Whisper API: In the button's OnSelect property, call the connector's `speech_to_text` action with the Microphone control's recorded audio (its `Audio` property) to get the transcript.
To display the transcribed text, add a label whose Text property is bound to the `text` field of the response returned by the Whisper API.
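To make explicit what the connector sends when the button is pressed, here is a standard-library Python sketch of an equivalent request. It builds a POST to `https://api.openai.com/v1/audio/transcriptions` with a `Bearer` authorization header and a multipart/form-data body holding the audio `file` and the `model` name. The helper names, the default file name `recording.webm`, and the model name `whisper-1` are assumptions for illustration. The request is built but not sent, and `extract_text` shows where the label's text comes from: the `text` field of the JSON response.

```python
import json
import urllib.request
import uuid

# Endpoint the custom connector's speech_to_text action points at.
TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode plain text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    # The closing delimiter marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(api_key, audio, filename="recording.webm", model="whisper-1"):
    """Build (but do not send) the POST request the connector makes.

    The filename and model defaults are hypothetical values for illustration.
    """
    body, content_type = build_multipart({"model": model}, "file", filename, audio)
    return urllib.request.Request(
        TRANSCRIPTIONS_URL,
        data=body,
        method="POST",
        headers={
            # Same value entered when adding the connector: "Bearer <API Key>".
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )


def extract_text(response_body):
    """Return the transcript from the default JSON response ({"text": ...})."""
    return json.loads(response_body)["text"]


# Sending it requires network access and a valid key:
# with urllib.request.urlopen(build_transcription_request(key, audio_bytes)) as resp:
#     print(extract_text(resp.read()))
```

The sketch uses only the standard library so it runs without extra packages. In PowerApps, the custom connector performs the same encoding for you.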
Generating Document with Azure OpenAI GPT
To turn the transcribed text into a document, we'll use a Power Automate flow that calls GPT through Azure OpenAI Service:
- Create a Flow with Power Automate: Create a new flow from blank and remove the default trigger action, since the flow will be called from the canvas app.
- Add Input Attributes: Add two inputs: one for the transcribed text and one for the recipient's email address.
- Use AI Builder's GPT Action: Add AI Builder's GPT action (powered by Azure OpenAI Service), choose the "create blog post" template, and adjust its predefined instructions as needed.
- File Creation: Save the generated blog post to OneDrive as an HTML file. This file can then be converted to PDF (for example, with OneDrive's Convert file action) before it is sent.
- Email with Attachment: Finally, add a step that emails the generated document as an attachment to the address passed in from the app.
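To make the flow's data path explicit, its logic can be mirrored outside Power Automate. The sketch below is a hypothetical Python stand-in, not the flow itself. `generate_text` represents the AI Builder GPT action and is passed in as a function so it can be stubbed. The prompt wording, sender address, subject, and file name are assumptions. The message is composed with Python's standard `email` package but not sent. The sketch wraps the generated post in an HTML page and attaches it, as the OneDrive and email steps do; PDF conversion is left out.

```python
import html
from email.message import EmailMessage
from typing import Callable

# Hypothetical prompt standing in for the "create blog post" template text.
PROMPT_TEMPLATE = "Write a blog post based on the following transcript:\n\n{text}"


def to_html_document(title: str, body_text: str) -> str:
    """Wrap generated plain text in a minimal HTML page (like the .html file)."""
    paragraphs = "".join(
        f"<p>{html.escape(p)}</p>" for p in body_text.split("\n\n") if p.strip()
    )
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head><body>{paragraphs}</body></html>"
    )


def run_flow(text: str, recipient_email: str,
             generate_text: Callable[[str], str],
             sender: str = "noreply@example.com") -> EmailMessage:
    """Mirror of the flow: transcript and email address in, email with attachment out."""
    # Stand-in for the AI Builder GPT action.
    post = generate_text(PROMPT_TEMPLATE.format(text=text))
    document = to_html_document("Generated blog post", post)

    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient_email
    msg["Subject"] = "Your generated document"
    msg.set_content("The document generated from your recording is attached.")
    # Passing a str with subtype="html" creates a text/html attachment part.
    msg.add_attachment(document, subtype="html", filename="blog-post.html")
    return msg
```

Injecting `generate_text` keeps the sketch testable without network access. For example, `run_flow(transcript, "user@example.com", lambda prompt: "...")` returns a ready-to-send message whose single attachment is the HTML page.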
Testing the Application
To test the application, record a short audio snippet and click the button. The Whisper API transcribes the speech into text. The app then passes that text to the flow, where GPT generates the document. Once the document is ready, it is sent as a PDF to the specified email address.
Summary of Use Cases
The same pattern supports other use cases, such as drafting blog posts, extracting talking points from meeting transcripts, or producing other materials from audio input. Image models such as DALL-E could also generate images from speech input, expanding the creative possibilities.
Keywords
- Whisper API
- Speech to Text
- Azure OpenAI
- Power Platform
- GPT
- Custom Connector
- Document Generation
- AI Builder
- PowerApps
FAQ
What is the Whisper API? The Whisper API is an OpenAI service that provides robust speech-to-text transcription and translation capabilities.
How do I create a custom connector in PowerApps? You can create a custom connector by defining the API host, setting security parameters, and establishing actions using the API specifications.
What can be generated using Azure OpenAI services? Azure OpenAI services can create text, answer questions, summarize documents, and generate various content types, including blog posts.
Can I send the generated document via email? Yes, the flow created can automatically send the generated document as an email attachment to the user.
What file formats can be generated with this process? The flow creates an HTML file, which can be converted to PDF before it is attached to the email.