Azure AI Document Intelligence - Custom Extraction
Introduction
Hello everyone! My name is Emerson, and welcome to the Cloud Know Ship Channel, where I aim to share my cloud knowledge with you. If you are interested in working with cloud technology, I encourage you to subscribe to my channel and hit the bell notification icon to stay updated on my latest videos. You can also connect with me via LinkedIn to discuss your cloud journey. I'm here to help!
In today’s article, we will explore custom models in Azure AI Document Intelligence, specifically focusing on custom extraction models. We will guide you through the process of creating and training such models using the Document Intelligence Studio.
Getting Started
To begin, open the Document Intelligence Studio. Click on "Create" to start the project setup. You’ll be presented with some instructional videos that provide an overview of Azure’s custom extraction models. After watching the videos, proceed to create your project.
- Enter the project name, such as “Custom Extraction 01” (leave the description blank), and click on "Continue."
- Select your Azure subscription, resource group, and the Document Intelligence resource you want to use.
- Next, select the API version for the model. Options available include both general availability and preview versions.
Once these selections are made, continue to choose the storage account you plan to use for training the model, including the blob container. Create a specific folder within the blob container for your custom extraction model.
Uploading Documents
With the project created, navigate to the section for uploading documents. This is where you upload the files you intend to train your model on. For this demonstration, I will upload various lease contracts.
- Select the required documents for training. You can view and zoom in on each document to check the information they contain, including the necessary fields.
- Using the "Run Layout" function, the AI will extract all text available within the documents. After this, you can use "Auto Label" to automatically label the extracted data.
When you use Auto Label, you must choose a pre-trained model that suits the type of document you are working with. Options include the invoice, receipt, and insurance card models.
Creating Fields for Extraction
For custom extraction, you'll need to define the fields you wish to extract from your documents:
- Create a new text field for the landlord’s name, then select the matching text in the document to label it.
- Add another text field for the tenant's name.
- Include a field for the contract name as well.
Repeat this process for all documents in your training set.
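Because every training document should have the same fields labeled, it can help to check for gaps before training. The sketch below is a hypothetical bookkeeping helper, not part of Document Intelligence Studio or any Azure SDK. It assumes you keep your own record of which fields you labeled in each file; the field names and file names are made up for illustration.

```python
# Hypothetical helper: tracks labeling progress outside the Studio.
# Field names are assumptions based on the three fields described above.
REQUIRED_FIELDS = {"LandlordName", "TenantName", "ContractName"}


def find_missing_labels(labeled: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, per document, the required fields that have no label yet."""
    missing = {}
    for doc_name, fields in labeled.items():
        gap = sorted(REQUIRED_FIELDS - fields)
        if gap:
            missing[doc_name] = gap
    return missing


# Illustrative record of what has been labeled so far.
labeled_so_far = {
    "lease_01.pdf": {"LandlordName", "TenantName", "ContractName"},
    "lease_02.pdf": {"LandlordName"},
}

gaps = find_missing_labels(labeled_so_far)
for doc_name, fields in gaps.items():
    print(f"{doc_name} still needs: {', '.join(fields)}")
```

Any document that appears in the result still needs labeling before you move on to training.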
Training the Model
Once all fields are labeled, it’s time to train your model. Click "Train" and give the model a name, e.g., “Lease Contract 001.” You’ll also choose between neural and template-based training. Template training suits documents that share a fixed layout; since our contracts vary significantly, we’ll use neural training.
After you click "Train," the model begins training, which may take several minutes. Once training is complete, you can proceed to test the model.
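The same training step can also be started outside the Studio through the service's REST API. The sketch below only assembles the request URL and JSON body for a build operation and does not send anything. It assumes the v4.0 REST API (api-version 2024-11-30); the endpoint, storage URL, folder prefix, and model ID are placeholders, not values from this walkthrough.

```python
import json

# Assumption: v4.0 REST API. Older v3.1 resources use a "/formrecognizer"
# path and a different api-version, so adjust both to match your resource.
API_VERSION = "2024-11-30"


def build_model_request(endpoint, model_id, container_url, prefix, build_mode="neural"):
    """Return the URL and JSON body for a custom model build request."""
    if build_mode not in ("neural", "template"):
        raise ValueError("build_mode must be 'neural' or 'template'")
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels:build"
        f"?api-version={API_VERSION}"
    )
    body = {
        "modelId": model_id,
        "buildMode": build_mode,
        # The container holds the training files and the labels saved by the Studio.
        "azureBlobSource": {"containerUrl": container_url, "prefix": prefix},
    }
    return url, body


# Placeholder values for illustration only.
url, body = build_model_request(
    endpoint="https://example-resource.cognitiveservices.azure.com/",
    model_id="lease-contract-001",
    container_url="https://examplestorage.blob.core.windows.net/training",
    prefix="custom-extraction-01/",
)
print(url)
print(json.dumps(body, indent=2))
```

In a real call the service also needs permission to read the container, typically through a SAS token on the container URL or a managed identity.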
Testing the Model
Testing involves uploading a new document that was not part of the training set. Run the analysis, and the AI will extract the relevant information while matching it against the fields you defined earlier.
You should see the extracted names, each with a confidence score indicating how certain the model is about that value. The JSON output lists every extracted field with its corresponding confidence.
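To make use of those confidence scores in code, you can walk the JSON and flag values that fall below a threshold for manual review. The sketch below assumes the result shape returned by the analyze operation (fields under analyzeResult.documents[].fields, each with content and confidence). The sample values and the 0.8 threshold are invented for illustration and are not output from the model trained here.

```python
# Made-up sample shaped like an analyze result; not real model output.
sample_result = {
    "status": "succeeded",
    "analyzeResult": {
        "documents": [
            {
                "docType": "lease-contract-001",
                "fields": {
                    "LandlordName": {
                        "type": "string",
                        "content": "Contoso Properties",
                        "confidence": 0.95,
                    },
                    "TenantName": {
                        "type": "string",
                        "content": "J. Doe",
                        "confidence": 0.62,
                    },
                },
            }
        ]
    },
}


def summarize_fields(result, threshold=0.8):
    """List each extracted field and mark those below the confidence threshold."""
    rows = []
    for document in result["analyzeResult"]["documents"]:
        for name, field in document["fields"].items():
            confidence = field.get("confidence", 0.0)
            rows.append(
                {
                    "field": name,
                    "value": field.get("content"),
                    "confidence": confidence,
                    "needs_review": confidence < threshold,
                }
            )
    return rows


summary = summarize_fields(sample_result)
for row in summary:
    flag = "REVIEW" if row["needs_review"] else "ok"
    print(f"{row['field']}: {row['value']} ({row['confidence']:.2f}) {flag}")
```

The right threshold depends on how costly a wrong value is in your application, so treat 0.8 as a starting point rather than a recommendation.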
Utilizing the Model
To integrate this model into your application, you can access code snippets for different programming languages directly from the interface. This allows for easy implementation within your own applications without having to start from scratch.
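As a rough idea of what such an integration involves, the sketch below prepares an analyze request for a custom model using only the Python standard library. It is not the snippet the Studio generates. It assumes the v4.0 REST API (api-version 2024-11-30) and key-based authentication; the endpoint, key, model ID, and document URL are placeholders.

```python
import json
import urllib.request

# Assumption: v4.0 REST API. v3.1 resources use a "/formrecognizer" path instead.
API_VERSION = "2024-11-30"


def make_analyze_request(endpoint, api_key, model_id, document_url):
    """Prepare (but do not send) a POST request that analyzes a document by URL."""
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"{model_id}:analyze?api-version={API_VERSION}"
    )
    payload = json.dumps({"urlSource": document_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
    )


# Placeholder values for illustration only.
request = make_analyze_request(
    endpoint="https://example-resource.cognitiveservices.azure.com",
    api_key="<your-key>",
    model_id="lease-contract-001",
    document_url="https://example.com/new-lease.pdf",
)
print(request.get_method(), request.full_url)
```

Sending the request with urllib.request.urlopen(request) starts an asynchronous operation: the service replies with an Operation-Location header, and you poll that URL with GET until the status is "succeeded" to obtain the JSON result.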
You can also adjust project settings as needed, including changing the Azure Document Intelligence resource and updating folder paths if required.
Conclusion
This article has covered the steps to create and train a custom extraction model using Azure AI Document Intelligence. If you have any questions or would like to see more content regarding Azure’s AI capabilities, don’t hesitate to reach out!
Keywords
- Azure AI
- Document Intelligence
- Custom Extraction Model
- Blob Container
- Training
- Auto Label
- Neural Training
- Fields Extraction
- JSON Output
FAQ
Q1: What is Azure AI Document Intelligence?
Azure AI Document Intelligence is a service that enables you to analyze and extract information from documents using machine learning.
Q2: How do I create a custom extraction model on Azure?
To create a custom extraction model, sign in to Document Intelligence Studio, create a new project, upload your training documents, define the fields for extraction, and then train your model.
Q3: Can I use pre-trained models for automatic labeling?
Yes, you can utilize pre-trained models provided by Microsoft, such as those for invoices or receipts, for auto-labeling your documents.
Q4: What kind of documents can be used to train the model?
You can train the model using various document types, as long as they contain the information relevant to your defined fields.
Q5: How can I integrate the trained model into my application?
The Document Intelligence Studio provides code snippets to facilitate the integration of the trained model into your applications, allowing you to make adjustments as necessary.