Insurance Quote Documents: Key Value Extraction using AI
Introduction
In this article, we explore how the Oracle Cloud Infrastructure (OCI) Document Understanding Service can be used to extract specific key-value pairs from insurance quote documents for commercial buildings. These documents are typically exchanged between insurance providers and insurance brokers, and they contain critical information such as the Name Insured, policy dates, limits, premiums, and commission rate.
Understanding the Document Layout
Insurance quote documents consist of various fields including:
- Name Insured
- Policy Start Date
- Policy End Date
- Each Occurrence Limit
- Aggregate Limit
- Annual Premium
- Terrorism Premium
- Total Premium
- Commission Rate
For the best results, work closely with the stakeholders, in this case the insurance brokers, who can clarify which key-value pairs matter most for extraction. Here, the brokers highlighted the nine key-value pairs listed above for the model to focus on.
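The nine target fields can be written down once and reused as a checklist for any extraction result. The sketch below is a minimal illustration: the field labels come from the list above, while the `missing_fields` helper and the sample values are hypothetical and not part of any OCI API.

```python
# The nine fields named in the article, using the labels as written there.
EXPECTED_FIELDS = [
    "Name Insured",
    "Policy Start Date",
    "Policy End Date",
    "Each Occurrence Limit",
    "Aggregate Limit",
    "Annual Premium",
    "Terrorism Premium",
    "Total Premium",
    "Commission Rate",
]


def missing_fields(extracted: dict) -> list:
    """Return the expected labels that are absent or blank in a result."""
    missing = []
    for name in EXPECTED_FIELDS:
        value = extracted.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Hypothetical partial result, for illustration only.
sample = {"Name Insured": "Jane Doe", "Annual Premium": "$10,000"}
print(missing_fields(sample))
```

Keeping the list in one place makes it easy to confirm with the brokers that the labeled fields and the fields checked downstream are the same nine.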
Labeling Data with OCI Data Labeling Service
To train a custom key-value extraction model, we first need to label the training documents. Using the OCI Data Labeling Service, we annotate each document by drawing bounding boxes around the relevant text. For the "Name Insured" field, for instance, we select both the first and last name so that the label covers the full value.
The labeling service uses Optical Character Recognition (OCR) to identify the characters and their bounding boxes, and it reports a confidence score for each word.
Once all the documents are labeled, the dataset is ready for training. Ten labeled documents made up the dataset for the custom model, covering values such as the limits and premiums.
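Per-word confidence scores are useful for deciding which labels deserve a second look. The following sketch assumes a simple record per OCR word (text, confidence, bounding box); the `OcrWord` class, the 0.9 threshold, and the sample words are hypothetical and do not reflect the labeling service's actual export format.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0
    box: tuple         # assumed (x_min, y_min, x_max, y_max), normalized


def low_confidence_words(words, threshold=0.9):
    """Return the words whose OCR confidence falls below the threshold."""
    return [word for word in words if word.confidence < threshold]


# Hypothetical OCR output for illustration only.
words = [
    OcrWord("Jane", 0.99, (0.10, 0.20, 0.18, 0.23)),
    OcrWord("Doe", 0.97, (0.19, 0.20, 0.25, 0.23)),
    OcrWord("$1O,000", 0.62, (0.60, 0.45, 0.72, 0.48)),
]
flagged = low_confidence_words(words)
print([word.text for word in flagged])
```

A word flagged this way (here, a letter "O" read in place of a zero) is a candidate for manual correction before the document is added to the training set.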
Training the Key-Value Extraction Model
After labeling, we train the custom key-value extraction model in the Document Understanding Service. The setup involves selecting "Key Value Extraction" as the model type, choosing the existing labeled dataset, and naming the model. With the recommended training duration selected, training begins. It can take up to 24 hours, although a small dataset trains much faster: around 20 minutes in this case.
When training completes, the service reports precision, recall, and accuracy metrics. Although the training set consisted of only ten documents, the model reached approximately 78% accuracy, extracting seven of the nine key-value pairs correctly (7/9 ≈ 78%).
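The training choices described above can be recorded in a small local structure, which helps when comparing runs later. This is only a bookkeeping sketch: `TrainingSettings` is a hypothetical class, not an OCI SDK type, the dataset and model names are made up, and the 24-hour cap simply mirrors the upper bound mentioned in the text.

```python
from dataclasses import dataclass

MAX_TRAINING_HOURS = 24  # upper bound mentioned in the article


@dataclass(frozen=True)
class TrainingSettings:
    """Local record of the choices made when creating the model."""
    model_type: str
    dataset_name: str
    model_name: str
    max_training_hours: float

    def __post_init__(self):
        if not self.model_name.strip():
            raise ValueError("model_name must not be empty")
        if not 0 < self.max_training_hours <= MAX_TRAINING_HOURS:
            raise ValueError(
                f"max_training_hours must be in (0, {MAX_TRAINING_HOURS}]"
            )


# Hypothetical names, for illustration only.
settings = TrainingSettings(
    model_type="Key Value Extraction",
    dataset_name="insurance-quotes-labeled",
    model_name="insurance-quote-kv",
    max_training_hours=1,
)
print(settings)
```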
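The 78% figure follows directly from seven correct fields out of nine. A minimal sketch of that field-level calculation is shown below; the `field_accuracy` helper and all sample values are hypothetical, and the service's own reported metrics may be computed differently.

```python
def field_accuracy(expected: dict, predicted: dict) -> float:
    """Fraction of expected fields whose predicted value matches exactly."""
    if not expected:
        raise ValueError("expected must contain at least one field")
    correct = sum(
        1 for label, value in expected.items() if predicted.get(label) == value
    )
    return correct / len(expected)


# Hypothetical ground truth for one document.
expected = {
    "Name Insured": "Jane Doe",
    "Policy Start Date": "01/01/2024",
    "Policy End Date": "01/01/2025",
    "Each Occurrence Limit": "$1,000,000",
    "Aggregate Limit": "$2,000,000",
    "Annual Premium": "$10,000",
    "Terrorism Premium": "$500",
    "Total Premium": "$10,500",
    "Commission Rate": "12.5%",
}

# Hypothetical prediction with two errors: one field has a dropped
# character and one field was not extracted at all.
predicted = dict(expected)
predicted["Commission Rate"] = "12.5"
del predicted["Terrorism Premium"]

accuracy = field_accuracy(expected, predicted)
print(f"{accuracy:.1%}")  # 77.8%
```

Exact string matching is a strict criterion: a value with a single missing character counts as wrong, which is worth keeping in mind when reading a field-level accuracy number.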
Making Predictions with the Trained Model
Once the model is trained and deployed, it can be utilized to make predictions on new insurance quote documents. By using the OCI SDK, we can easily upload new documents and retrieve the extracted key-value pairs via an API call.
To demonstrate, we use a new document with slightly different values but a structure similar to the training documents. After submission, the service processes the uploaded insurance quote and returns values for the nine anticipated key-value pairs, including the Name Insured, Policy Start Date, premium values, and Commission Rate.
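Once the API returns its result, the key-value pairs need to be pulled out of the response into a flat mapping. The sketch below assumes a JSON response in which each page carries a `documentFields` list whose entries have a `fieldLabel.name` and a `fieldValue.text`. That shape is an assumption to verify against the service's documented output, and the sample payload is invented for illustration. Submitting the job itself through the OCI SDK is not shown here.

```python
import json

# Assumed response shape and invented values, for illustration only.
SAMPLE_RESPONSE = """
{
  "pages": [
    {
      "pageNumber": 1,
      "documentFields": [
        {
          "fieldType": "KEY_VALUE",
          "fieldLabel": {"name": "Name Insured", "confidence": 0.98},
          "fieldValue": {"text": "Jane Doe"}
        },
        {
          "fieldType": "KEY_VALUE",
          "fieldLabel": {"name": "Total Premium", "confidence": 0.95},
          "fieldValue": {"text": "$10,500"}
        }
      ]
    }
  ]
}
"""


def collect_key_values(result: dict) -> dict:
    """Flatten per-page fields into {label: text}, keeping the first hit."""
    fields = {}
    for page in result.get("pages") or []:
        for field in page.get("documentFields") or []:
            label = (field.get("fieldLabel") or {}).get("name")
            text = (field.get("fieldValue") or {}).get("text")
            if label is not None and label not in fields:
                fields[label] = text
    return fields


extracted = collect_key_values(json.loads(SAMPLE_RESPONSE))
print(extracted)
```

Keeping only the first occurrence of each label is a design choice for single-quote documents; a multi-quote document would need the page number kept alongside each value.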
While the model performed well overall, it missed a single character in the rating field (presumably the Commission Rate, the only rate among the nine fields). Training on additional documents is expected to improve the accuracy of the extracted key-value pairs.
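A dropped character like the one described above is easy to overlook by eye, so a character-level comparison against a known value can help during evaluation. The sketch below uses Python's standard `difflib`. The example strings are hypothetical, since the article does not state which character was missed.

```python
import difflib


def missing_characters(expected: str, extracted: str) -> list:
    """Return (position, text) for spans of expected absent from extracted."""
    matcher = difflib.SequenceMatcher(a=expected, b=extracted, autojunk=False)
    missing = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" and "replace" both mean expected[i1:i2] did not survive.
        if tag in ("delete", "replace"):
            missing.append((i1, expected[i1:i2]))
    return missing


# Hypothetical values: the percent sign was not captured.
print(missing_characters("12.5%", "12.5"))  # [(4, '%')]
```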
Conclusion
Through this process, we demonstrated how to utilize the OCI Document Understanding Service to build a custom key-value extraction model for insurance quote documents. This automation facilitates better data handling and analysis in the insurance domain.
Keywords
Extracting Key-Value Pairs, OCI Document Understanding Service, Insurance Quote Documents, Data Labeling, Custom Model Training, API Call, Machine Learning, Optical Character Recognition, Accuracy Metrics.
FAQ
1. What is the OCI Document Understanding Service?
The OCI Document Understanding Service is a cloud-based service that allows users to analyze and extract structured data from unstructured documents using machine learning algorithms.
2. How do you prepare documents for key-value extraction?
Documents are prepared by labeling key-value pairs using the OCI Data Labeling Service, which allows users to annotate relevant text with bounding boxes.
3. How long does it take to train a key-value extraction model?
Training duration varies with dataset size and document complexity. It can take up to 24 hours, but the ten-document dataset in this article trained in about 20 minutes.
4. What is the expected accuracy of the trained model?
Accuracy depends on the size and quality of the training dataset. In the demonstrated case, a model trained on ten documents achieved around 78% accuracy, extracting seven of nine key-value pairs correctly.
5. Can the model process new documents?
Yes, once trained, the model can be used to process and extract key-value pairs from new documents via API calls.