Create and Test a Document AI Processor
GSP924
Overview
The Document AI API is a document understanding solution that takes unstructured data, such as documents and emails, and makes the data easier to understand, analyze, and consume. With the general form processor used in this lab, you can extract key/value pairs from a simple document.
In this lab, you will learn how to create document processors using Document AI, submit documents for processing using the Cloud console and the command line, and use Python to make synchronous API calls.
What you'll learn
You will learn how to perform the following tasks:
- Create and test a Document AI processor using the console.
- Test Document AI processors using the command line.
- Test Document AI synchronous API calls using Python.
Setup and requirements
Before you click the Start Lab button
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
- Access to a standard internet browser (Chrome browser recommended).
- Time to complete the lab. Remember, once you start, you cannot pause a lab.
How to start your lab and sign in to the Google Cloud console
- Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:
  - The Open Google Cloud console button
  - Time remaining
  - The temporary credentials that you must use for this lab
  - Other information, if needed, to step through this lab
- Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).
The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Arrange the tabs in separate windows, side-by-side.
Note: If you see the Choose an account dialog, click Use Another Account.
- If necessary, copy the Username from the Lab Details panel and paste it into the Sign in dialog.
- Click Next.
- Copy the Password from the Lab Details panel and paste it into the Welcome dialog.
- Click Next.
Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials.
Note: Using your own Google Cloud account for this lab may incur extra charges.
- Click through the subsequent pages:
  - Accept the terms and conditions.
  - Do not add recovery options or two-factor authentication (because this is a temporary account).
  - Do not sign up for free trials.
After a few moments, the Google Cloud console opens in this tab.
Activate Cloud Shell
Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5 GB home directory and runs on Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.
- Click Activate Cloud Shell at the top of the Google Cloud console.
When you are connected, you are already authenticated, and the project is set to your Project_ID.
gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.
- (Optional) You can list the active account name with this command:
- Click Authorize.
Output:
- (Optional) You can list the project ID with this command:
Output:
Note: For full documentation of gcloud in Google Cloud, refer to the gcloud CLI overview guide.
Task 1. Enable the Cloud Document AI API
In this task, you enable the Document AI API. The general form processor that you create and test in the next task will process any type of document and extract all the text content it can identify in the document. It is not limited to printed text: it can handle handwritten text and text in any orientation, supports a number of languages, and understands how form data elements are related to each other, so that you can extract key/value pairs for form fields that have text labels.
Enable the Cloud Document AI API
Before you can begin using Document AI, you must enable the API.
- From the Navigation menu, click APIs & Services > Library.
- Search for Cloud Document AI API, then click the Enable button to use the API in your Google Cloud project.
If the Cloud Document AI API is already enabled, you will see the Manage button and can continue with the rest of the lab.
Click Check my progress to verify the objectives.
Task 2. Create and test a general form processor
Next you will create a Document AI processor using the Document AI Form Parser.
Create a processor
- In the console, from the Navigation menu, click Document AI > Overview.
- Click Explore processors.
- Click Create Processor for Form Parser, which is a type of general processor.
- Specify the processor name as form-parser and select the region US (United States) from the list.
- Click Create to create the general form-parser processor.
This creates the processor and opens the processor details page, which displays the processor ID, status, and prediction endpoint.
- Make a note of the Processor ID, because you will use it with curl to make a POST call to the API in a later task.
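The prediction endpoint on the processor details page is the URL that a synchronous process call targets. As a minimal sketch, assuming the v1 REST API and a regional host of the form {location}-documentai.googleapis.com, the URL can be assembled from the project ID, region, and processor ID. The function name process_endpoint and the values my-project and abc123 are hypothetical placeholders, not lab values.

```python
def process_endpoint(project_id: str, location: str, processor_id: str) -> str:
    """Build the v1 REST URL for a synchronous :process call (assumed format)."""
    # Regional host: the "us" region maps to us-documentai.googleapis.com.
    host = f"https://{location}-documentai.googleapis.com"
    resource = f"projects/{project_id}/locations/{location}/processors/{processor_id}"
    return f"{host}/v1/{resource}:process"


if __name__ == "__main__":
    # Hypothetical project and processor IDs; substitute your own values.
    print(process_endpoint("my-project", "us", "abc123"))
```

This prints https://us-documentai.googleapis.com/v1/projects/my-project/locations/us/processors/abc123:process. Compare the result with the endpoint shown in the console; the console may display your numeric project number in place of the project ID.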
Download the sample form
In this step, you download the sample form from Cloud Storage. To upload this form in the next step, you first need to download it to your local machine.
- Download the form.pdf file to your local machine.
The file should download directly. If the file opens in your browser instead, download it using the file controls within your browser. The form.pdf file is a health intake form with sample handwritten data.
Upload a form for Document AI processing
Next, you will upload the sample form you downloaded to your form-parser processor. It will then be analyzed, and the results will be displayed in the console.
- On the form-parser page, click the Upload Test Document button. In the dialog that opens, select the file you downloaded in the previous step.
A progress bar indicates how far the analysis has progressed. When it completes, you will see that the general processor has captured the data on the form as a set of key/value pairs.
The left-hand pane lists the key/value pairs parsed from the source document, and the right-hand pane highlights their source locations in the parsed document with blue rectangles. Examine the output and compare the results with the source data.
In the following tasks, you will test a Document AI general form processor by making API calls from the command line.
Click Check my progress to verify the objectives.
Task 3. Set up the lab instance
In this section, you will set up the lab instance to use the Document AI API.
Connect to the lab VM instance using SSH
You will perform the remainder of the lab tasks in the lab VM called document-ai-dev.
- From the Navigation menu, click Compute Engine > VM instances.
- Click the SSH link for the VM instance called document-ai-dev.
You will need the ID of the Document AI processor you created in Task 2 for this step. If you did not save it, then in the Cloud console tab:
- Open the Navigation menu.
- Click Document AI > Processors.
- Click the name of your processor to open the details page.
- From here you can copy the processor ID.
- In the SSH session, create an environment variable to contain the Document AI processor ID. You must replace the [your processor id] placeholder with your processor ID:
- In the SSH session, confirm that the environment variable contains the Document AI processor ID:
- This should print out the Processor ID, similar to the following:
You will use this SSH session for the remaining tasks in this lab.
Authenticate API requests
To make requests to the Document AI API, you need to provide a valid credential. In this task, you create a service account, limit the permissions granted to that service account to those required for the lab, and then generate a credential for that account that can be used to authenticate Document AI API requests.
- Set an environment variable with your Project ID, which you will use throughout this lab:
- Create a new service account to access the Document AI API by using:
- Bind the service account to the Document AI API user role:
- Create the credentials that will be used to log in as your new service account and save them in a JSON file called key.json in your working directory:
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which is used by the client library to find your credentials, to point to the credentials file:
- Check that the GOOGLE_APPLICATION_CREDENTIALS environment variable is set to the full path of the credentials JSON file you created earlier:
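This environment variable tells Application Default Credentials which credentials file to use, so the Google Cloud client libraries, and gcloud commands that use Application Default Credentials, authenticate as your service account. To read more about this form of authentication, see the Application Default Credentials guide.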
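Because an unset variable or a relative path is a common cause of authentication errors later in the lab, you might sanity-check the credentials file before continuing. This is a minimal sketch, not part of the lab files: it assumes the file is a standard service account key JSON, which contains "type": "service_account" and a "client_email" field, and the function name check_service_account_key is hypothetical.

```python
import json
import os


def check_service_account_key(env_var: str = "GOOGLE_APPLICATION_CREDENTIALS") -> str:
    """Return the service account email from the key file named by env_var.

    Raises RuntimeError if the variable is unset, the path is not absolute,
    or the file does not look like a service account key.
    """
    path = os.environ.get(env_var)
    if not path:
        raise RuntimeError(f"{env_var} is not set")
    if not os.path.isabs(path):
        raise RuntimeError(f"{env_var} should be a full path, got {path!r}")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    # Service account key files are marked with "type": "service_account".
    if key.get("type") != "service_account":
        raise RuntimeError(f"{path} is not a service account key file")
    return key["client_email"]
```

For example, calling print(check_service_account_key()) in the SSH session should print the email address of the service account you created, or raise an error that points at the misconfiguration.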
Download the sample form to the VM instance
Now you can download a sample form and then base64 encode it for submission to the Document AI API.
- Enter the following command in the SSH window to download the sample form to your working directory:
- Create a JSON request file for submitting the base64 encoded form for processing:
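The download-and-encode steps above can also be sketched in Python. This is an illustrative sketch, assuming the v1 REST request body shape {"rawDocument": {"content": ..., "mimeType": ...}}, where content is the base64-encoded file; the function name build_process_request and the default output name request.json are hypothetical and may differ from the lab's own commands.

```python
import base64
import json


def build_process_request(pdf_path: str, request_path: str = "request.json",
                          mime_type: str = "application/pdf") -> dict:
    """Base64-encode a file and write a Document AI :process request body."""
    with open(pdf_path, "rb") as f:
        # b64encode returns one unwrapped line, which is safe to embed in JSON.
        encoded = base64.b64encode(f.read()).decode("ascii")
    # rawDocument carries the file bytes (base64 in JSON) plus their MIME type.
    body = {"rawDocument": {"content": encoded, "mimeType": mime_type}}
    with open(request_path, "w", encoding="utf-8") as f:
        json.dump(body, f)
    return body
```

By contrast, the GNU coreutils base64 command wraps its output at 76 characters by default, and JSON strings cannot contain raw newlines, so shell versions of this step have to remove those line breaks before embedding the encoded data.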
Click Check my progress to verify the objectives.
Task 4. Make a synchronous process document request using curl
In this task, you process the sample document by making a call to the synchronous Document AI API endpoint using curl.
- Submit a form for processing via curl. The result will be stored in output.json:
- Make sure your output.json file contains the results of the API call:
Note: If the request fails, check that you have set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the credentials JSON file you created earlier. You may need to wait a few minutes for the IAM policy to propagate, so try again if you receive an error.
The access token for the Cloud IAM service account is generated on the fly and passed to the API in the Authorization HTTP header. The response contains JSON-formatted data that is saved to the file output.json.
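For comparison, the curl call can be sketched with Python's standard library. This sketch only prepares the POST request unless you call send yourself. It assumes the request body is already in a file (request.json here, a hypothetical name), that the URL is your processor's prediction endpoint, and that you supply an OAuth access token, for example one printed by gcloud auth application-default print-access-token. make_process_request and send are illustrative names, not lab code.

```python
import json
import urllib.request


def make_process_request(endpoint: str, access_token: str,
                         request_path: str = "request.json") -> urllib.request.Request:
    """Prepare (but do not send) the POST request that the curl command makes."""
    with open(request_path, "rb") as f:
        payload = f.read()
    return urllib.request.Request(
        endpoint,
        data=payload,
        method="POST",
        headers={
            # The token authenticates the call as the service account.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


def send(request: urllib.request.Request, output_path: str = "output.json") -> dict:
    """Send the request and save the raw JSON response, like curl > output.json."""
    with urllib.request.urlopen(request) as resp:
        body = resp.read()
    with open(output_path, "wb") as f:
        f.write(body)
    return json.loads(body)
```

Calling send(make_process_request(endpoint, token)) would perform the network call and write output.json, mirroring what curl does in this task.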
Extract the form entities
Next, explore some of the information extracted from the sample form.
- Extract the raw text detected in the document as follows:
This lists all of the text detected in the uploaded document.
- Extract the list of form fields detected by the form processor:
This lists the object data for all of the form fields detected in the document. Each form field has a fieldName and a fieldValue, and the textAnchor.textSegments[].startIndex and textAnchor.textSegments[].endIndex values of each can be used to locate the name and value of the detected form field in the document.text field. The Python script that you will use in the next task does this mapping for you.
The JSON file is quite large, as it includes the base64-encoded source document as well as all of the detected text and document properties. You can explore the JSON file by opening it in a text editor or by using a JSON query tool like jq.
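The form-field extraction above can also be reproduced in Python to show how text anchors map back to document.text. This is a minimal sketch, assuming the response layout described above (document.text plus document.pages[].formFields[] with fieldName and fieldValue). In the JSON representation, 64-bit integer fields such as startIndex and endIndex are encoded as strings, and an index of 0 is omitted entirely, so the code defaults missing indices to 0. segment_text and form_fields are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def segment_text(doc_text: str, layout: Dict) -> str:
    """Concatenate the slices of doc_text referenced by a layout's textAnchor."""
    segments = layout.get("textAnchor", {}).get("textSegments", [])
    # In JSON, int64 indices arrive as strings and a zero index is omitted.
    return "".join(
        doc_text[int(seg.get("startIndex", 0)):int(seg.get("endIndex", 0))]
        for seg in segments
    )


def form_fields(response: Dict) -> List[Tuple[str, str]]:
    """Return (name, value) text pairs for every form field on every page."""
    document = response["document"]
    text = document.get("text", "")
    pairs = []
    for page in document.get("pages", []):
        for field in page.get("formFields", []):
            name = segment_text(text, field.get("fieldName", {})).strip()
            value = segment_text(text, field.get("fieldValue", {})).strip()
            pairs.append((name, value))
    return pairs
```

To try it on the response from this task, load output.json with json.load and pass the resulting dictionary to form_fields.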
Task 5. Test a Document AI form processor using the Python client libraries
Make a synchronous call to the Document AI API using the Python Document AI client libraries.
Now you will process a document using the synchronous endpoint. To process large numbers of documents at a time, you can use the asynchronous API. To learn more about using the Document AI APIs, read the guide.
If you want to run Python scripts directly, you need to provide the appropriate credentials to those scripts, so that they can make calls to the API using a service account that has been configured with the correct permissions. To read more about how to configure this form of authentication, see the Authenticating as a service account documentation.
Configure your VM Instance to use the Document AI Python client
Now install the Python Google Cloud client libraries into the VM Instance.
- Enter the following command in the SSH terminal shell to import the lab files into your VM Instance:
- Enter the following command to install the Python client libraries required for Document AI and the other libraries required for this lab:
You should see output indicating that the libraries have been installed successfully.
Review the Document AI API Python code
Take a minute to review the Python code in the sample file. You can use an editor of your choice, such as vi or nano, to review the code in the SSH session, or you can use the command from the previous section to copy the example code into Cloud Shell and use the Code Editor to view the source code if you prefer.
- The first two code blocks import the required libraries and parse parameters to initialize variables that identify the Document AI processor and input data.
- The process_document function is used to make a synchronous call to a Document AI processor. The function creates a Document AI API client object.
The processor name required by the API call is created using the project_id, location, and processor_id parameters, and the document to be processed is read in and stored, together with its MIME type (mime_type), in a document structure.
The processor name and the document are then passed to the Document AI API client object, and a synchronous call to the API is made. If the request is successful, the document object that is returned includes properties that contain the data detected by the Document AI processor.
- The script then calls the process_document function with the required parameters and saves the response in the document variable.
- The final block of code prints the .text property, which contains all of the text detected in the document, and then displays the form information using the text anchor data for each of the form fields detected by the form parser.
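The parameter parsing and processor-name construction described above can be sketched as follows. This is a hedged illustration rather than the contents of synchronous_doc_ai.py: the flag names (--project_id, --location, --processor_id, --file_name), their defaults, and the helper names are assumptions. The resource-name format is the one produced by the client library's DocumentProcessorServiceClient.processor_path method.

```python
import argparse
import mimetypes


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the parameters that identify the processor and the input file."""
    # Flag names and defaults are illustrative assumptions, not the lab script's.
    parser = argparse.ArgumentParser(description="Synchronous Document AI call")
    parser.add_argument("--project_id", required=True)
    parser.add_argument("--location", default="us")
    parser.add_argument("--processor_id", required=True)
    parser.add_argument("--file_name", default="form.pdf")
    return parser.parse_args(argv)


def processor_name(project_id: str, location: str, processor_id: str) -> str:
    """Same resource-name format that client.processor_path(...) returns."""
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def guess_mime_type(file_name: str) -> str:
    """Pick the MIME type to send alongside the raw document bytes."""
    mime_type, _ = mimetypes.guess_type(file_name)
    return mime_type or "application/pdf"
```

With the client library, the name and MIME type would then travel with the file bytes, for example as documentai.RawDocument(content=..., mime_type=...) inside a documentai.ProcessRequest(name=..., raw_document=...) passed to client.process_document.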
Click Check my progress to verify the objectives.
Task 6. Run the Document AI Python code
Execute the sample code and process the same file as before.
- Create environment variables for the Project ID and the IAM service account credentials file:
- Call the synchronous_doc_ai.py Python program with the parameters it requires:
You will see the following block of text output:
The first block of text is a single text element containing all of the text in the document. This block of text has no awareness of form-based data, so some items, such as the Date and Name entries, are mixed together in this raw text value.
The code then outputs a more structured view of the data using the form data that the form-parser processor has inferred from the document structure. This structured output also includes the confidence scores for the form field names and values. The output from this section gives a much more useful mapping between the form field names and their values, as can be seen with the link between the Date and Name form fields and their correct values.
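The structured view described above can be approximated with a short helper that resolves each field's text anchor against document.text and reports both confidence scores. This sketch assumes document is the object returned by the Python client library, whose pages[].form_fields[] entries carry field_name and field_value layouts with text_anchor.text_segments and confidence. layout_to_text and describe_form_fields are hypothetical names, and the output format is not necessarily the lab script's.

```python
def layout_to_text(layout, text: str) -> str:
    """Concatenate the slices of text referenced by a layout's text_anchor."""
    # Client-library objects expose int indices; an unset start_index reads as 0.
    return "".join(
        text[int(seg.start_index):int(seg.end_index)]
        for seg in layout.text_anchor.text_segments
    )


def describe_form_fields(document) -> list:
    """Format each form field as its name, value, and both confidence scores."""
    lines = []
    for page in document.pages:
        for field in page.form_fields:
            name = layout_to_text(field.field_name, document.text).strip()
            value = layout_to_text(field.field_value, document.text).strip()
            lines.append(
                f"{name} {value} "
                f"(name confidence {field.field_name.confidence:.2f}, "
                f"value confidence {field.field_value.confidence:.2f})"
            )
    return lines
```

Because the helpers only read attributes, you can also exercise them with lightweight stand-in objects such as types.SimpleNamespace instead of a live API response.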
Click Check my progress to verify the objectives.
Congratulations!
You've successfully used the Document AI API to extract data from documents using a general form processor. In this lab, you created and tested a Document AI processor using the console and the command line, and made Document AI synchronous API calls using Python.
Next steps / Learn more
- Read more in the Cloud Document AI API documentation.
Manual Last Updated: April 17, 2024
Lab Last Tested: December 07, 2023
Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.