How to use Document AI

Introduction

Google Cloud's Document AI provides tools for extracting structured data from unstructured documents. This guide walks through getting started with Document AI: setting up processors, processing documents through the API, and reading the output structure.

Introduction to Document AI Processors

Document AI processors act as an interface between document files and machine learning models. Each processor is specialized for a task such as document classification, parsing, or analysis. To use Document AI, you must first create a processor instance in your Google Cloud project. This differs from other ML APIs such as Vision or Speech-to-Text, which you call without creating anything first; it is closer to creating a VM in Compute Engine or a model instance in Vertex AI.

Document AI processors can be categorized into three types:

General Processors: Used for general tasks like form parsing.
Specialized Processors: Designed for specific document types.
Custom Processors: Tailored to unique processing needs.

Setting Up a Processor in Cloud Console

To create and test a processor, follow these steps:

In the Cloud Console navigation menu, go to Artificial Intelligence and select Document AI.
Click on Create Processor.
If prompted, enable the Document AI API.
Choose a processor type. This guide uses Document OCR.
Specify a processor name and a region (note that these cannot be changed later).
Click Create to finish the setup.

Once the processor is created, note its processor ID along with your project ID and the region you chose. You'll need all three to access the API.
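
The project ID, the processor's region, and the processor ID combine into a single resource name that the API uses to address the processor. A minimal sketch of that format follows; the function name and the sample IDs are illustrative placeholders, not values from this guide.

```python
def processor_resource_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the full resource name that identifies one processor."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


# Placeholder values; substitute the IDs shown for your own processor.
name = processor_resource_name("my-project", "us", "abc123def456")
print(name)
```

Keeping the three values in one helper avoids assembling the string by hand in every request.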

Uploading Test Documents

After setting up your processor, you can upload a test document in the console. The console displays the extracted text section by section, so you can quickly verify the processing results.

Using the Document AI API

To automate document processing, use the Document AI API. All processors share a single API endpoint, which simplifies development. The project ID, location, and processor ID in the request identify which processor handles the document.
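
As a rough illustration of the shared endpoint, the sketch below builds the request URL for a processor. It assumes the v1 REST surface and the regional host pattern `{location}-documentai.googleapis.com`; the function name and sample IDs are hypothetical.

```python
def process_url(project_id: str, location: str, processor_id: str, batch: bool = False) -> str:
    """Return the REST URL for online (process) or batch (batchProcess) requests."""
    host = f"{location}-documentai.googleapis.com"  # regional host, e.g. "us" or "eu"
    method = "batchProcess" if batch else "process"
    return (
        f"https://{host}/v1/projects/{project_id}"
        f"/locations/{location}/processors/{processor_id}:{method}"
    )


# Placeholder IDs for illustration only.
print(process_url("my-project", "eu", "abc123"))
print(process_url("my-project", "eu", "abc123", batch=True))
```

Only the path changes between processors; the calling code stays the same whichever processor type sits behind the ID.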

Online vs. Batch Processing

Document AI supports two types of processing:

Online Processing: Ideal for low-latency use cases where immediate results are necessary.
Batch Processing: Suitable for larger files or multiple documents processed at once. This method operates asynchronously and uses Google Cloud Storage to store input and output files.
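
Because batch processing reads from and writes to Cloud Storage, a batch request names buckets rather than carrying file bytes. The sketch below builds a request body in the JSON shape used by the v1 REST `batchProcess` method, as an assumption about how you might call it; the helper name and bucket paths are placeholders.

```python
def batch_request_body(input_prefix: str, output_uri: str) -> dict:
    """Build a batchProcess body that reads every file under a Cloud Storage prefix."""
    for uri in (input_prefix, output_uri):
        if not uri.startswith("gs://"):
            raise ValueError(f"expected a gs:// URI, got {uri!r}")
    return {
        # Where the input documents live.
        "inputDocuments": {"gcsPrefix": {"gcsUriPrefix": input_prefix}},
        # Where the service writes the resulting document JSON files.
        "documentOutputConfig": {"gcsOutputConfig": {"gcsUri": output_uri}},
    }


# Hypothetical bucket names.
body = batch_request_body("gs://my-bucket/input/", "gs://my-bucket/output/")
print(body)
```

The call returns a long-running operation, so a client would poll it or wait on it before reading the output files.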

Setting Up Your Application

Create a Service Account: Grant it the Document AI API User role.
Install the Client Library: This guide uses the Python client library.
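
Before making calls, it can help to confirm that both setup steps took effect. The sketch below assumes the library was installed with `pip install google-cloud-documentai` and that the service account key is supplied through the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. Other credential sources, such as a service account attached to the runtime, are not detected by this check. The function names are illustrative.

```python
import importlib.util
import os


def client_library_installed() -> bool:
    """Return True if the google-cloud-documentai package can be imported."""
    try:
        return importlib.util.find_spec("google.cloud.documentai") is not None
    except ImportError:
        # Raised when a parent package such as google.cloud is missing.
        return False


def key_file_configured(env=None) -> bool:
    """Return True if GOOGLE_APPLICATION_CREDENTIALS points at an existing file."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return bool(path) and os.path.isfile(path)


print("client library installed:", client_library_installed())
print("service account key configured:", key_file_configured())
```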

Making API Calls

For both online and batch processing, you will define variables such as:

Project ID
Location
Processor ID
Path to a local file
MIME type of the file

The request carries the file content and the information that identifies the processor. In a JSON request sent over REST, the file content is base64-encoded. After the call returns, you read the results from the document object in the response.
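
A sketch of an online request, under stated assumptions: `rest_process_body` shows the base64 step for a raw REST call, while `process_file` uses the `google-cloud-documentai` Python client, which accepts the raw bytes directly. Both function names are illustrative, and `process_file` needs valid credentials and an existing processor to run.

```python
import base64


def rest_process_body(content: bytes, mime_type: str) -> dict:
    """JSON body for the REST process method; file bytes are base64-encoded."""
    encoded = base64.b64encode(content).decode("ascii")
    return {"rawDocument": {"content": encoded, "mimeType": mime_type}}


def process_file(project_id, location, processor_id, file_path, mime_type):
    """Send one local file for online processing and return the document object."""
    # Imported here so the helper above works without the client library installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )
    with open(file_path, "rb") as f:
        raw_document = documentai.RawDocument(content=f.read(), mime_type=mime_type)
    request = documentai.ProcessRequest(
        name=client.processor_path(project_id, location, processor_id),
        raw_document=raw_document,
    )
    return client.process_document(request=request).document


# The REST helper can be tried without any credentials.
body = rest_process_body(b"%PDF-1.4 sample bytes", "application/pdf")
print(body["rawDocument"]["mimeType"])
```

A call such as `process_file("my-project", "us", "abc123", "invoice.pdf", "application/pdf")` (placeholder values) would return the document object described in the next section.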

Understanding the Document Object Structure

The document object contains essential information extracted during processing, including:

Document Representation: Raw text and physical layout.
Extracted Data: Structured data stored in an array (entities field).
Metadata: Full revision history of document changes.
Annotations: Used for Human-in-the-Loop (HITL) processes to correct predictions.
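
To make the entities field concrete, the sketch below reads it from a document in JSON form (camelCase keys such as `mentionText`, as found in the JSON files that batch processing writes to Cloud Storage). The sample document is hand-written for illustration; real entity types and values depend on the processor, and the Python client exposes the same data as attributes such as `mention_text`.

```python
def entity_rows(document: dict) -> list:
    """Return (type, mention text, confidence) for each extracted entity."""
    return [
        (e.get("type", ""), e.get("mentionText", ""), e.get("confidence", 0.0))
        for e in document.get("entities", [])
    ]


# Hypothetical sample, not real API output.
sample = {
    "text": "Invoice 42\nTotal: 10 USD\n",
    "entities": [
        {"type": "invoice_id", "mentionText": "42", "confidence": 0.98},
        {"type": "total_amount", "mentionText": "10 USD", "confidence": 0.91},
    ],
}

for entity_type, mention, confidence in entity_rows(sample):
    print(f"{entity_type}: {mention!r} (confidence {confidence})")
```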

The document object provides valuable insights into the processing results, including page layouts and text extraction.
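
Layout elements do not store their own text; each one points into the document's raw text through a text anchor made of start and end offsets. The sketch below resolves those anchors for paragraphs, assuming a document in JSON form where offsets may arrive as strings and a zero start offset may be omitted. The sample document and function names are hypothetical.

```python
def layout_text(document: dict, layout: dict) -> str:
    """Join the slices of document text that a layout's text anchor points at."""
    text = document.get("text", "")
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    return "".join(
        text[int(s.get("startIndex", 0)):int(s.get("endIndex", 0))]
        for s in segments
    )


def page_paragraphs(document: dict) -> list:
    """Return one list of paragraph strings per page."""
    return [
        [layout_text(document, p.get("layout", {})) for p in page.get("paragraphs", [])]
        for page in document.get("pages", [])
    ]


# Hypothetical sample, not real API output.
sample = {
    "text": "Invoice 42\nTotal: 10 USD\n",
    "pages": [{
        "paragraphs": [
            {"layout": {"textAnchor": {"textSegments": [{"endIndex": "11"}]}}},
            {"layout": {"textAnchor": {"textSegments": [
                {"startIndex": "11", "endIndex": "25"}
            ]}}},
        ]
    }],
}

print(page_paragraphs(sample))
```

The same lookup applies to other layout levels on a page, such as lines and tokens, since they carry the same kind of text anchor.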

Conclusion

In this article, we covered the fundamentals of Google Cloud's Document AI: processor types, processor creation, online and batch processing, and the structure of the output document object. For detailed tutorials on making API calls, refer to the Document AI documentation.

Keywords

Document AI
Processors
Online Processing
Batch Processing
Document Object
API
Machine Learning

FAQ

Q1: What is Document AI?
A1: Document AI is a Google Cloud service that employs machine learning to extract structured data from unstructured documents.

Q2: How do I create a processor in Document AI?
A2: You can create a processor in the Google Cloud Console under the Document AI section by selecting the processor type, naming it, and specifying the region.

Q3: What types of processors are available in Document AI?
A3: Document AI offers general, specialized, and custom processors for various document processing tasks.

Q4: What is the difference between online and batch processing?
A4: Online processing is for real-time document analysis, while batch processing is designed for handling larger files and multiple documents asynchronously.

Q5: How do I access the extracted data from a processed document?
A5: The extracted data can be accessed through the document object returned by the API, which includes various fields detailing the text, layout, and metadata.