Checkpoints
Enable the Document AI API.
/ 20
Configure your Vertex AI Workbench instance
/ 20
Make a synchronous process document request.
/ 20
Prepare your environment for asynchronous API calls
/ 20
Make an asynchronous process document request
/ 20
Process Documents with Python Using the Document AI API
- GSP925
- Overview
- Setup and requirements
- Task 1. Create and test a general form processor
- Task 2. Configure your Vertex AI Workbench instance to perform Document AI API calls
- Task 3. Make a synchronous process document request
- Task 4. Run the synchronous Document AI Python code
- Task 5. Create a Document AI Document OCR processor
- Task 6. Prepare your environment for asynchronous Document AI API calls
- Task 7. Make an asynchronous process document request
- Congratulations
GSP925
Overview
The Document AI API is a document understanding solution that takes unstructured data, such as documents and emails, and makes the data easier to understand, analyze, and consume.
In this lab, you will use the Document AI API with Python to create various processors, including a general form processor and a Document OCR processor, then make synchronous and asynchronous calls to the API using Python. This lab creates a Vertex AI Workbench instance for you that you will use with JupyterLab notebooks to work with the Document AI Python Client modules.
Objectives
In this lab you will learn how to:
- Enable the Document AI API and create processors.
- Install the client library for Python in a Vertex AI Workbench instance.
- Parse data from a scanned form using Python to make a synchronous API call.
- Parse data from scanned forms using Python to make an asynchronous API call.
Setup and requirements
Before you click the Start Lab button
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
- Access to a standard internet browser (Chrome browser recommended).
- Time to complete the lab---remember, once you start, you cannot pause a lab.
Activate Cloud Shell
Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on the Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.
- Click Activate Cloud Shell at the top of the Google Cloud console.
When you are connected, you are already authenticated, and the project is set to your Project_ID,
gcloud
is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.
- (Optional) You can list the active account name with this command:
- Click Authorize.
Output:
- (Optional) You can list the project ID with this command:
Output:
gcloud
, in Google Cloud, refer to the gcloud CLI overview guide.
Task 1. Create and test a general form processor
In this task you will enable the Document AI API and create and test a general form processor. The general form processor will process any type of document and extract all the text content it can identify in the document. It is not limited to printed text, it can handle handwritten text and text in any orientation, supports a number of languages, and understands how form data elements are related to each other so that you can extract key:value pairs for form fields that have text labels.
Enable the Cloud Document AI API
Before you can begin using Document AI, you must enable the API.
-
In Cloud Console, from the Navigation menu (), click APIs & services > Library.
-
Search for Cloud Document AI API, then click the Enable button to use the API in your Google Cloud project.
If the Cloud Document AI API is already enabled you will see the Manage button and you can continue with the rest of the lab.
Create a general form processor
Create a Document AI processor using the Document AI form parser.
-
In the console, on the Navigation menu (), click Document AI > Overview.
-
Click Explore processor and select Form Parser, which is a type of general processor.
-
Specify the processor name as form-parser and select the region US (United States) from the list.
-
Click Create to create the general form-parser processor.
This will create the processor and return to the processor details page that will display the processor ID, status, and the prediction endpoint.
- Make a note of the Processor ID as you will need to update variables in JupyterLab notebooks with the Processor ID in later tasks.
Task 2. Configure your Vertex AI Workbench instance to perform Document AI API calls
Next, connect to JupyterLab running on the Vertex AI Workbench instance that was created for you when the lab was started, then configure that environment for the remaining lab tasks.
-
In the Google Cloud console, on the Navigation menu (), click Vertex AI > Workbench.
-
Find the
instance and click on the Open JupyterLab button.
The JupyterLab interface for your Workbench instance opens in a new browser tab.
-
Click Terminal to open a terminal shell inside the Vertex AI Workbench instance.
-
Enter the following command in the terminal shell to import the lab files into your Vertex AI Workbench instance:
- Enter the following command in the terminal shell to install the Python client libraries required for Document AI and other required libraries:
You should see output indicating that the libraries have been installed successfully.
- Enter the following command in the terminal shell to import the sample health intake form:
-
In the notebook interface open the JupyterLab notebook called
. -
In the Select Kernel dialog, choose Python 3 from the list of available kernels.
Task 3. Make a synchronous process document request
Make a process document call using a synchronous Document AI API call. For processing large amounts of documents at a time you can also use the asynchronous API which you will use in a later task.
Review the Python code for synchronous Document AI API calls
Take a minute to review the Python code in the
The first code block imports the required libraries and initializes some variables.
The Set your Processor ID code cell sets the Processor ID that you have to manually set before you can process documents with the notebook.
You will need the Document AI processor ID of the processor you created in Task 1 for this step.
Tip: If you did not save it, then in the Cloud Console tab open the Navigation menu (), click Document AI > My processors, then click the name of your processor to open the details page. From here you can copy the processor ID.
The Process Document Function code cell defines the process_document
function that is used to make a synchronous call to a Document AI processor. The function creates a Document AI API client object.
The processor name required by the API call is created using the project_id
,locations
, and processor_id
parameters and the sample PDF document is read in and stored in a mime_type
structure.
The function creates a request object that contains the full processor name of the document and uses that object as the parameter for a synchronous call to the Document AI API client. If the request is successful the document object that is returned will include properties that contain the entities detected in the form.
The Process Document code cell calls the process_document
function, saves the response in the document
variable, and prints the raw text that has been detected. All of the processors will report some data for the document.text
property.
The Get Text Function code cell defines the get_text()
function that retrieves the text for a named element using the text_anchor
start_index
and end_index
properties of the named element's text_segments
. This function is used to retrieve the form name and form value for form data if that data is returned by the processor.
The Display Form Data cell iterates over all pages that have been detected and for each form_field
detected it uses the get_text()
function to retrieve the field name and field value. Those values are then printed out, along with their corresponding confidence scores. Form data will be returned by processors that use the general form parser or the specialized parsers but will not be returned by processors that were created with the Document OCR parser.
The Display Entity Data cell extracts entity data from the document object and displays the entity type, value, and confidence properties for each entity detected. Entity data is only returned by processors that use specialized Document AI parsers such as the Procurement Expense parser. The general form parser and the Document OCR parser will not return entity data.
Task 4. Run the synchronous Document AI Python code
Execute the code to make synchronous calls to the Document AI API in the JupyterLab notebook.
-
In the second Set your Processor ID code cell replace the
PROCESSOR_ID
placeholder text with the Processor ID for the form-parser processor you created in an earlier step. -
Select the first cell, click the Run menu and then click Run Selected Cell and All Below to run all the code in the notebook.
If you have used the sample health intake form, you will data similar to the following for the output cell for the form data:
If you are able to create a specialised processor the final cell will display entity data, otherwise it will show an empty table.
- In the JupyterLab menu click File and then click Save Notebook to save your progress.
Task 5. Create a Document AI Document OCR processor
In this task you will create a Document AI processor using the general Document OCR parser.
-
From the Navigation menu, click Document AI > Overview.
-
Click Explore Processor and then click Create Processor for Document OCR. This is a type of general processor.
-
Specify the processor name as ocr-processor and select the region US (United States) from the list.
-
Click Create to create your processor.
-
Make a note of the processor ID. You will use need to specify this in a later task.
Task 6. Prepare your environment for asynchronous Document AI API calls
In this task you upload the sample JupyterLab notebook to test asynchronous Document AI API calls and copy some sample forms for the lab to Cloud Storage for asynchronous processing.
-
Click the Terminal tab to re-open the terminal shell inside the Vertex AI Workbench instance.
-
Create a Cloud Storage bucket for the input documents and copy the sample W2 forms into the bucket:
-
In the notebook interface open the JupyterLab notebook called
. -
In the Select Kernel dialog, choose Python 3 from the list of available kernels.
Task 7. Make an asynchronous process document request
Review the Python code for asynchronous Document AI API calls
Take a minute to review the Python code in the
The first code cell imports the required libraries.
The Set your Processor ID code cell sets the Processor ID that you have to manually set before you can process documents with the notebook.
The Set your variables code cell defines the parameters that will be used to make the asynchronous call, including the location of the input and output Cloud Storage buckets that will be used for the source data and output files. You will update the placeholder values in this cell for the PROJECT_ID
and the PROCESSOR_ID
in the next section of the lab before you run the code. The other variables contain defaults for the processor location, input Cloud Storage Bucket, and output Cloud Storage bucket that you do not need to change.
The Define Google Cloud client objects code cell initializes the Document AI and Cloud Storage clients.
The Create input configuration code cell creates the input configuration array parameter for the source data that will be passed to the asynchronous Document AI request as an input configuration. This array stores the Cloud Storage source location, and the mime type, for each of the files that are found in the input Cloud Storage location.
The Create output configuration code cell creates the output parameter for the asynchronous request containing the output Cloud Storage bucket location and stores that as a Document AI batch output configuration.
The Create the Document AI API request code cell builds the asynchronous Document AI batch process request object using the input and output configuration objects.
The Start the batch (asynchronous) API operation code cell makes an asynchronous document process request by passing the request object to the batch_process_documents()
method. This is an asynchronous call so you use the result()
method to force the notebook to wait until the background asynchronous job has completed.
The Fetch list of output files cell enumerates the objects in the output bucket location as defined in the destination_uri
variable.
The Display detected text from asynchronous output JSON files cell loads each output JSON file that is found as a Document AI document object and the text data detected by the Document OCR processor is printed out.
The Display entity data cell will display any entity data that is found, however, entity data is only available for processors that were created using a specialized parser. Entity data will not be displayed with the general Document AI OCR parser used in this task.
Run the asynchronous Document AI Python code
Use the sample code provided for you in the Jupyterlab notebook to process documents asynchronously using a Document AI batch processing request.
-
In the second code cell replace the
PROCESSOR_ID
placeholder text with the Processor ID for the form-parser processor you created in an earlier step. -
Select the first cell, click the Run menu and then click Run Selected Cell and All Below to run all the code in the notebook.
-
As the code cells execute, you can step through the notebook reviewing the code and the comments that explain how the asynchronous request object is created and used.
The notebook will take a minute or two to wait for the asynchronous batch process operation to complete at the Start the batch (asynchronous) API operation code cell. While the batch process API call itself is asynchronous the notebook uses the result
method to force the notebook to wait until the asynchronous call has completed before enumerating and displaying the output data.
If the asynchronous job takes longer than expected and times out you may have to run the remaining cells again to display the output. These are the cells after the Start the batch (asynchronous) API operation cell.
Your output will contain text listing the Document AI data detected in each file. The Document OCR parser does not detect form or entity data so there will be no form or entity data produced. If you can create a specialised processor then you will also see entity data printed out by the final cell.
- In the JupyterLab menu click File and then click Save Notebook to save your progress.
Congratulations
You've successfully made synchronous and asynchronous calls to the the Document AI API. In this lab, you enabled the Document AI API and created processors. You installed the client library for Python in a Vertex AI Workbench instance, parsed data from a scanned form using Python to make a synchronous API call, and parsed data from scanned forms using Python to make an asynchronous API call.
Finish your quest
This self-paced lab is part of the Detect Manufacturing Defects using Visual Inspection AI skill badge quest. A quest is a series of related labs that form a learning path. Completing this quest earns you a badge to recognize your achievement. Share your badge on your resume and social platforms, and announce your accomplishment using #GoogleCloudBadge. You can make your badge public and link to it in your online resume or social media account. Enroll in a quest and get immediate completion credit if you've taken this lab. See other available quests.
Next steps/ Learn more
- To learn more about using the Document AI APIs, read the guide.
Google Cloud training and certification
...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.
Manual Last Updated October 8, 2024
Lab Last Tested October 8, 2024
Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.