The Document AI API is a document understanding solution that takes unstructured data, such as documents and emails, and makes the data easier to understand, analyze, and consume. With the general form processor used in this lab, you can extract key/value pairs from a simple document.
In this lab you will learn how to create document parsers using Document AI, submit documents for processing using the Cloud console and the command line, and use Python to make synchronous API calls.
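You will learn how to perform the following tasks:
- Enable the Document AI API
- Create and test a general form processor
- Authenticate API requests and download the sample form
- Call the Document AI API using curl
- Test a Document AI form processor using the Python client libraries
- Run the Document AI Python code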
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources are made available to you.
This hands-on lab lets you do the lab activities in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
Click the Start Lab button. If you need to pay for the lab, a dialog opens for you to select your payment method. On the left is the Lab Details pane with the following:
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).
The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Arrange the tabs in separate windows, side-by-side.
If necessary, copy the Username below and paste it into the Sign in dialog.
You can also find the Username in the Lab Details pane.
Click Next.
Copy the Password below and paste it into the Welcome dialog.
You can also find the Password in the Lab Details pane.
Click Next.
Click through the subsequent pages:
After a few moments, the Google Cloud console opens in this tab.
Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5GB home directory and runs on Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.
Click Activate Cloud Shell at the top of the Google Cloud console.
Click through the following windows:
When you are connected, you are already authenticated, and the project is set to your Project_ID.
gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.
For full documentation of gcloud, refer to the gcloud CLI overview guide.
In this task you will enable the Document AI API, then create and test a general form processor. The general form processor will process any type of document and extract all the text content it can identify in the document. It is not limited to printed text: it can handle handwritten text and text in any orientation, supports a number of languages, and understands how form data elements are related to each other, so you can extract key/value pairs for form fields that have text labels.
Before you can begin using Document AI, you must enable the API.
From the Navigation menu, click APIs & services > Library.
Search for Cloud Document AI API, then click the Enable button to use the API in your Google Cloud project.
If the Cloud Document AI API is already enabled you will see the Manage button and you can continue with the rest of the lab.
Click Check my progress to verify the objectives.
Next you will create a Document AI processor using the Document AI Form Parser.
In the console, from the Navigation menu, click Document AI > Overview.
Click Explore processors.
Click Create Processor for Form Parser, which is a type of general processor.
Specify the processor name as form-parser and select the region US (United States) from the list.
Click Create to create the general form-parser processor.
This creates the processor and returns you to the processor details page, which displays the processor ID, status, and prediction endpoint.
Make a note of the processor ID, because you will use it with curl to make a POST call to the API in a later task.
In this task you will download the sample form from Cloud Storage. In order to upload this form in the next task, you first need to download it to your local machine.
The file should download directly. If the file opens in your browser instead, then download the file using the file controls within your browser. The form.pdf file is a HEALTH INTAKE FORM with sample handwritten data.
Next you will upload the sample form you downloaded to your form-parser processor. It will then be analyzed and the results displayed in the console.
A progress bar shows how far the analysis has progressed, and the results are displayed when it completes. You will see that the general processor has captured the data on the form as a set of key/value pairs.
The key/value pairs parsed from the source document are presented in the console. The left-hand pane lists the data, and the right-hand pane uses blue rectangles to highlight the source locations in the parsed document. Examine the output and compare the results with the source data.
In this task you will test a Document AI general form processor by making API calls from the command line.
Click Check my progress to verify the objectives.
In this section, you will set up the lab instance to use the Document AI API.
You will perform the remainder of the lab tasks in the lab VM called document-ai-dev.
From the Navigation menu, click Compute Engine > VM Instances.
Click the SSH link for the VM Instance called document-ai-dev.
You will need the Document AI processor ID of the processor you created in Task 1 for this step. If you did not save it, you can find it on the processor details page in the Cloud Console tab.
Wherever a command uses the placeholder [your processor id], replace it with this processor ID.
You will use this SSH session for the remaining tasks in this lab.
To make requests to the Document AI API, you need to provide a valid credential. In this task you create a service account, limit the permissions granted to that service account to those required for the lab, and then generate a credential for that account that can be used to authenticate Document AI API requests.
Create the credentials for the service account and save them in a JSON file called key.json in your working directory.
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is used by the library to find your credentials, to point to the credentials file.
Check that the GOOGLE_APPLICATION_CREDENTIALS environment variable is set to the full path of the credentials JSON file you created earlier.
This environment variable is used by the gcloud command-line tool to specify which credentials to use when executing commands. To read more about this form of authentication, see the Application Default Credentials guide.
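The same check can be made from Python before any API call is attempted. The following is a minimal sketch and not part of the lab's own files: the helper name check_credentials_env is hypothetical, and the test for a "type" field of "service_account" is an assumption about what a service account key file normally contains.

```python
import json
import os


def check_credentials_env(env=None):
    """Return (ok, message) describing GOOGLE_APPLICATION_CREDENTIALS."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isabs(path):
        return False, f"{path} is not a full path"
    if not os.path.isfile(path):
        return False, f"{path} does not exist"
    try:
        with open(path, encoding="utf-8") as handle:
            key = json.load(handle)
    except (OSError, ValueError) as err:
        return False, f"{path} is not readable JSON: {err}"
    # Assumption: a service account key file carries "type": "service_account".
    if not isinstance(key, dict) or key.get("type") != "service_account":
        return False, f"{path} does not look like a service account key"
    return True, f"credentials file found at {path}"
```

Calling check_credentials_env() with no arguments inspects the current environment and returns a short message that says which of the checks failed, if any.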
Now you can download a sample form and then base64 encode it for submission to the Document AI API.
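As an illustration of the encoding step, the sketch below base64 encodes a document and wraps it in a JSON request body. It is a sketch under stated assumptions, not the lab's own command sequence: the function names and the request.json file name are hypothetical, and the use of the rawDocument field (with content and mimeType) should be checked against the API reference for the version you call.

```python
import base64
import json


def build_process_request(pdf_bytes, mime_type="application/pdf"):
    """Wrap raw document bytes in a JSON-serializable request body."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    # Assumption: the request body uses the rawDocument field.
    return {"rawDocument": {"content": encoded, "mimeType": mime_type}}


def write_request_file(pdf_path, request_path):
    """Read a document from disk and save the request body as JSON."""
    with open(pdf_path, "rb") as source:
        body = build_process_request(source.read())
    with open(request_path, "w", encoding="utf-8") as target:
        json.dump(body, target)
    return body
```

For example, write_request_file("form.pdf", "request.json") would produce a single-line JSON file that can be sent as the body of a POST request.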
Click Check my progress to verify the objectives.
In this task you process the sample document by making a call to the synchronous Document AI API endpoint using curl.
Submit the form for processing using curl. The result will be stored in output.json.
Check that the output.json file contains the results of the API call.
If you receive an error, make sure you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the credentials JSON file you created earlier. You may need to wait a few minutes for the IAM policy to propagate, so try again if the error persists.
The access token for the Cloud IAM service account is generated on the fly and passed to the API using the Authorization: HTTP header. The response contains JSON formatted data that is saved to the file output.json.
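To make the shape of that request concrete, here is a hedged Python sketch that assembles the URL and headers a synchronous process call would use. The function name build_process_call is hypothetical, and the v1 API version and the regional host name pattern are assumptions; the access token is expected to come from a command such as gcloud auth application-default print-access-token. No network call is made.

```python
def build_process_call(project_id, location, processor_id, access_token,
                       api_version="v1"):
    """Return the (url, headers) pair for a synchronous process request."""
    # Assumption: regional host of the form <location>-documentai.googleapis.com.
    url = (
        f"https://{location}-documentai.googleapis.com/{api_version}"
        f"/projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}:process"
    )
    headers = {
        # The token is passed as a bearer token in the Authorization header.
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json; charset=utf-8",
    }
    return url, headers
```

The returned pair can be handed to any HTTP client together with the JSON request body, which mirrors what the curl call does on the command line.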
Next, explore some of the information extracted from the sample form.
This lists all of the text detected in the uploaded document.
This lists the object data for all of the form fields detected in the document. The textAnchor.startIndex and textAnchor.endIndex values for each form field can be used to locate the names of the detected form fields in the document.text field. The Python script that you will use in the next task will do this mapping for you.
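The mapping from text anchors back to document.text can be sketched in a few lines of Python. This is an illustration under assumptions, not the lab's script: the function names are hypothetical, and the sketch assumes the indices sit inside a textSegments list under each textAnchor, that 64-bit integers arrive as JSON strings, and that a start index of zero may be omitted.

```python
def text_from_anchor(document_text, text_anchor):
    """Return the slice of document_text that a text anchor points at."""
    pieces = []
    for segment in text_anchor.get("textSegments", []):
        # Indices may be strings in JSON; a missing start is treated as 0.
        start = int(segment.get("startIndex", 0))
        end = int(segment.get("endIndex", 0))
        pieces.append(document_text[start:end])
    return "".join(pieces)


def form_field_pairs(document):
    """Yield (name, value) strings for every form field in a document dict."""
    text = document.get("text", "")
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            name_anchor = field.get("fieldName", {}).get("textAnchor", {})
            value_anchor = field.get("fieldValue", {}).get("textAnchor", {})
            yield (
                text_from_anchor(text, name_anchor).strip(),
                text_from_anchor(text, value_anchor).strip(),
            )
```

Given the parsed contents of output.json in a variable named response, list(form_field_pairs(response["document"])) would return the field names paired with their values.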
The JSON file is quite large as it includes the base64 encoded source document as well as all of the detected text and document properties. You can explore the JSON file by opening the file in a text editor or by using a JSON query tool like jq.
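If you prefer Python to jq, a short summary of the response gives a sense of where the bulk of the file comes from. This is a minimal sketch with hypothetical function names; it assumes the response has a top-level document object with text, pages, and an embedded content field, any of which may be absent.

```python
import json


def summarize_document(response):
    """Return a compact summary of a process response dict."""
    document = response.get("document", {})
    pages = document.get("pages", [])
    return {
        "text_characters": len(document.get("text", "")),
        "page_count": len(pages),
        "form_field_count": sum(len(p.get("formFields", [])) for p in pages),
        # The embedded source document, when present, is usually the largest part.
        "content_characters": len(document.get("content", "")),
    }


def load_and_summarize(path="output.json"):
    """Load a saved response file and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_document(json.load(handle))
```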
Make a synchronous call to the Document AI API using the Python Document AI client libraries.
Now you will process a document using the synchronous endpoint. To process a large number of documents at a time, you can use the asynchronous API. To learn more about using the Document AI APIs, read the guide.
If you want to run Python scripts directly, you need to provide the appropriate credentials to those scripts, so that they can make calls to the API using a service account that has been configured with the correct permissions. To read more about how to configure this form of authentication, see the Authenticating as a service account documentation.
Now install the Python Google Cloud client libraries into the VM Instance.
You should see output indicating that the libraries have been installed successfully.
Take a minute to review the Python code in the sample file. You can use an editor of your choice, such as vi or nano, to review the code in the SSH session, or you can use the command from the previous section to copy the example code into the Cloud Shell and use the Code Editor to view the source code if you prefer.
The process_document function is used to make a synchronous call to a Document AI processor. The function creates a Document AI API client object.
The processor name required by the API call is created using the project_id, location, and processor_id parameters, and the document to be processed is read in and stored in a mime_type structure.
The processor name and the document are then passed to the Document AI API client object and a synchronous call to the API is made. If the request is successful, the document object that is returned will include properties that contain the data that has been detected by the Document AI processor.
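The flow described above can be sketched as follows. This is not the lab's sample file: it is an illustrative version in which the client is passed in as a parameter so the request-building logic can be exercised without the library, the make_client helper is hypothetical, and the raw_document request field is an assumption about the v1 client library that may differ from what the sample file uses.

```python
def make_client(location):
    """Create a Document AI client bound to the regional endpoint."""
    # Imported lazily so the rest of this sketch runs without the library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    options = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    return documentai.DocumentProcessorServiceClient(client_options=options)


def process_document(client, project_id, location, processor_id,
                     file_path, mime_type="application/pdf"):
    """Send one file to a processor and return the parsed document."""
    # Build the full processor resource name from its three parts.
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    with open(file_path, "rb") as source:
        content = source.read()
    request = {
        "name": name,
        # Assumption: raw bytes plus a MIME type go in raw_document.
        "raw_document": {"content": content, "mime_type": mime_type},
    }
    result = client.process_document(request=request)
    return result.document
```

With the library installed and credentials configured, process_document(make_client("us"), project_id, "us", processor_id, "form.pdf") would return a document object whose text property holds the detected text.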
The script calls the process_document function with the required parameters and saves the response in the document variable.
It then prints the .text property, which contains all of the text detected in the document, and displays the form information using the text anchor data for each of the form fields detected by the form parser.
Click Check my progress to verify the objectives.
Execute the sample code and process the same file as before.
Run the synchronous_doc_ai.py Python program with the parameters it requires.
The program prints a block of text output.
The first block of text is a single text element containing all of the text in the document. This block of text does not include any awareness of form-based data, so some items, such as the Date and Name entries, are mixed together in this raw text value.
The code then outputs a more structured view of the data using the form data that the form-parser has inferred from the document structure. This structured output also includes the confidence score for the form field names and values. The output from this section gives a much more useful mapping between the form field names and the values, as can be seen with the link between the Date and Name form fields and their correct values.
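One common use of the confidence scores is to separate fields that can be accepted automatically from those that need a manual check. The sketch below is an illustration only: the record layout (name, value, name_confidence, value_confidence), the function name, and the 0.8 threshold are all hypothetical choices rather than anything the lab's script defines.

```python
def split_by_confidence(fields, threshold=0.8):
    """Split extracted fields into (accepted, needs_review) lists."""
    accepted, needs_review = [], []
    for field in fields:
        # A field is only as trustworthy as the weaker of its two scores.
        score = min(field["name_confidence"], field["value_confidence"])
        if score >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review
```

Using the lower of the two scores is a deliberately cautious choice: a well-read value attached to a poorly read field name is still likely to end up in the wrong place.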
Click Check my progress to verify the objectives.
You've successfully used the Document AI API to extract data from documents using a general form processor. In this lab, you created and tested a Document AI processor using the console and the command line, and made Document AI synchronous API calls using Python.
Manual Last Updated: April 17, 2024
Lab Last Tested: December 07, 2023
Copyright 2025 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.