
Before you begin
- Labs create a Google Cloud project and resources for a fixed time
- Labs have a time limit and no pause feature. If you end the lab, you'll have to restart from the beginning.
- On the top left of your screen, click Start lab to begin
The lab scores five tasks, each out of 20 points:
- Enable Document AI API
- Create a form processor
- Create Google Cloud resources
- Deploy Cloud Run functions
- Validate data processed by the pipeline
In a challenge lab you’re given a scenario and a set of tasks. Instead of following step-by-step instructions, you will use the skills learned from the labs in the course to figure out how to complete the tasks on your own! An automated scoring system (shown on this page) will provide feedback on whether you have completed your tasks correctly.
When you take a challenge lab, you will not be taught new Google Cloud concepts. You are expected to extend your learned skills, like changing default values and reading and researching error messages to fix your own mistakes.
To score 100% you must successfully complete all tasks within the time period!
This lab is recommended for students enrolled in the Automate Data Capture at Scale with Document AI skill badge course. Are you ready for the challenge?
You are a data engineer at a large infrastructure management company and have been assigned to work on an internal project with the financial division of the company. The company has to process an ever-increasing mountain of documents that all require individual manual processing for validation and authorization, which is an expensive task that requires a lot of staff. The company plans to use Google Cloud tools to automate the collection, categorization, and verification of documents in an efficient, less labor-intensive manner.
You must create a document processing pipeline that automatically processes documents uploaded to Cloud Storage. The pipeline consists of a primary Cloud Run function that processes new files using a Document AI form processor to extract the data from each document. The function then saves the form data detected in those files to BigQuery.
You are provided with the source code for a Cloud Run function that performs the processing, and you are expected to deploy the document processing pipeline as shown in the architecture below, making sure to correctly configure the components for your specific pipeline.
In this task, you enable the Cloud Document AI API and copy your starter files into Cloud Shell.
The Cloud Run function, with predefined code, is hosted in a remote Cloud Storage bucket. Copy these source files into your Cloud Shell. The files include the source code for the Cloud Run function and the schema for the BigQuery table that you will create in the lab.
Create an instance of the general form processor using the Document AI Form Parser processor in the General (non-specialized) section. The general form processor will process any type of document and extract all the text content it can identify in the document as well as form information that it infers from the layout.
Property | Value |
---|---|
Processor Type | Form Parser |
Processor Name | |
Region | US |
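For reference, a processor with the properties above can also be described as a REST request. The sketch below only builds the URL and JSON body that the Document AI `processors.create` method accepts, without sending anything. The project ID and display name are placeholders, not values from this lab.

```python
import json
from typing import Tuple


def build_create_processor_request(
    project_id: str, location: str, display_name: str
) -> Tuple[str, str]:
    """Return (url, json_body) for a Document AI processors.create call."""
    # Regional endpoints and resource paths use the lower-case location.
    location = location.lower()
    url = (
        f"https://{location}-documentai.googleapis.com/v1/"
        f"projects/{project_id}/locations/{location}/processors"
    )
    # FORM_PARSER_PROCESSOR is the processor type behind "Form Parser".
    body = json.dumps(
        {"type": "FORM_PARSER_PROCESSOR", "displayName": display_name}
    )
    return url, body


# Placeholder values, for illustration only.
url, body = build_create_processor_request("my-project-id", "US", "my-form-parser")
print(url)
print(body)
```

A real request would also need an OAuth access token in the `Authorization` header; creating the processor in the Cloud Console, as described above, avoids that step.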
Prepare your environment by creating the Google Cloud Storage and BigQuery resources that are required for your document processing pipeline.
Bucket Name | Purpose | Storage class | Location |
---|---|---|---|
 | For input invoices | Standard | |
 | For storing processed data | Standard | |
 | For archiving invoices | Standard | |
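As a rough illustration of the bucket setup, the sketch below assembles one `gcloud storage buckets create` command per bucket purpose. The bucket name suffixes and the project ID are hypothetical; use the bucket names and location that your lab instance provides.

```python
import shlex
from typing import Dict, List

# Hypothetical name suffixes: the lab supplies the real bucket names.
BUCKET_PURPOSES: Dict[str, str] = {
    "input-invoices": "For input invoices",
    "output-invoices": "For storing processed data",
    "archived-invoices": "For archiving invoices",
}


def bucket_create_commands(project_id: str, location: str) -> List[str]:
    """Build one shell command per bucket, all with the Standard storage class."""
    commands = []
    for suffix in BUCKET_PURPOSES:
        args = [
            "gcloud", "storage", "buckets", "create",
            f"gs://{project_id}-{suffix}",
            f"--location={location}",
            "--default-storage-class=STANDARD",
        ]
        commands.append(shlex.join(args))
    return commands


# Placeholder project ID and location, for illustration only.
for command in bucket_create_commands("my-project-id", "us"):
    print(command)
```

Generating the commands as strings keeps the sketch free of side effects; you can review them before pasting them into Cloud Shell.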
Dataset Name | Location |
---|---|
invoice_parser_results | US |
The table schema for the extracted information has been provided for you in the JSON file document-ai-challenge/scripts/table-schema/doc_ai_extracted_entities.json. Use this schema to create a table named doc_ai_extracted_entities in the invoice_parser_results dataset.
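A BigQuery JSON schema file is a list of field objects, each with at least a `name` and a `type`. The sketch below checks that shape and builds the arguments for a `bq mk --table` call. The two sample columns are invented so that the example runs on its own; the real columns are whatever doc_ai_extracted_entities.json defines.

```python
import json
from typing import List

# Hypothetical schema content, used only so the sketch is self-contained.
SAMPLE_SCHEMA = json.dumps([
    {"name": "example_field", "type": "STRING", "mode": "NULLABLE"},
    {"name": "example_score", "type": "FLOAT", "mode": "NULLABLE"},
])


def column_names(schema_json: str) -> List[str]:
    """Validate the basic shape of a BigQuery JSON schema and list its columns."""
    fields = json.loads(schema_json)
    if not isinstance(fields, list):
        raise ValueError("a BigQuery JSON schema file is a list of field objects")
    for field in fields:
        if "name" not in field or "type" not in field:
            raise ValueError(f"field is missing name or type: {field!r}")
    return [field["name"] for field in fields]


def make_table_command(dataset: str, table: str, schema_path: str) -> List[str]:
    """Arguments for creating a table from a schema file with the bq CLI."""
    return ["bq", "mk", "--table", f"{dataset}.{table}", schema_path]


print(column_names(SAMPLE_SCHEMA))
print(make_table_command(
    "invoice_parser_results",
    "doc_ai_extracted_entities",
    "document-ai-challenge/scripts/table-schema/doc_ai_extracted_entities.json",
))
```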
You can navigate to BigQuery in the Cloud Console and inspect the schema of tables in the invoice_parser_results dataset using BigQuery SQL workspace.
To complete this task, you must deploy the Cloud Run function that your data processing pipeline uses to process invoices uploaded to Cloud Storage. This function uses a Document AI API Generic Form processor to extract form data from the raw documents.
You can examine the source code of the Cloud Run function using the Code Editor or any other editor of your choice. The function source is stored in the following folder in Cloud Shell:
scripts/cloud-functions/process-invoices
The Cloud Run function, process-invoices, must be triggered when files are uploaded to the input files storage bucket you created earlier.
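To make the trigger concrete: when an object is uploaded, Cloud Storage delivers an event whose payload includes the `bucket` and `name` of the new object. The sketch below shows how a handler might turn that payload into a `gs://` URI. The helper name and the example values are hypothetical and are not taken from the lab's function source.

```python
from typing import Mapping


def gcs_uri_from_event(event: Mapping[str, str]) -> str:
    """Build the gs:// URI of the uploaded object from a storage event payload."""
    return f"gs://{event['bucket']}/{event['name']}"


# Example payload with made-up values; real events carry more fields.
example_event = {
    "bucket": "example-input-bucket",
    "name": "invoice-0001.pdf",
    "contentType": "application/pdf",
}
print(gcs_uri_from_event(example_event))
```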
Deploy a Cloud Run function that uses a Document AI form processor to parse form documents that have been uploaded to a Cloud Storage bucket. The function source is in the scripts directory.
If you inspect the Cloud Run function source code, you will see that the function gets the Document AI processor details via two runtime environment variables.
- Make sure that PROCESSOR_ID and PARSER_LOCATION contain the correct values for the Form Parser processor you deployed in a previous step. The PARSER_LOCATION value must be in lower case.
- Set the PROJECT_ID environment variable to your project ID.
Wait for the function to be fully redeployed.
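A small pre-deployment check can catch the most common configuration mistake, an upper-case location. The sketch below validates the three variables named above and renders them in the simple `KEY: "value"` form that an environment variables YAML file uses. The helper names and sample values are hypothetical, and the sketch assumes these three keys are all the file needs.

```python
from typing import Dict

REQUIRED_KEYS = ("PROCESSOR_ID", "PARSER_LOCATION", "PROJECT_ID")


def normalize_env(env: Dict[str, str]) -> Dict[str, str]:
    """Reject empty values, trim whitespace, and force a lower-case location."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    cleaned = {key: value.strip() for key, value in env.items()}
    cleaned["PARSER_LOCATION"] = cleaned["PARSER_LOCATION"].lower()
    return cleaned


def to_env_yaml(env: Dict[str, str]) -> str:
    """Render the variables as quoted YAML key/value lines, sorted by key."""
    return "".join(f'{key}: "{value}"\n' for key, value in sorted(env.items()))


# Placeholder values, for illustration only.
env = normalize_env(
    {"PROCESSOR_ID": "abc123 ", "PARSER_LOCATION": "US", "PROJECT_ID": "my-project-id"}
)
print(to_env_yaml(env), end="")
```

Quoting every value keeps YAML from reinterpreting an all-digit or hexadecimal-looking processor ID as a number.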
For your final task, you must successfully process the set of invoices that are available in the ~/document-ai-challenge/invoices folder using your pipeline.
Upload these invoices to the input Cloud Storage bucket and monitor the progress of the pipeline.
Watch the events until you see a final event indicating that the function execution finished with a status of OK.
Once the pipeline has fully processed the documents, you will see that the form information that is extracted from the invoices by the Document AI processor has been written out into the BigQuery table.
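One quick way to confirm that rows arrived is to count them. The sketch below only builds a standard SQL query string for the table created earlier; the project ID is a placeholder, and the query can be pasted into the BigQuery SQL workspace.

```python
def row_count_query(project_id: str, dataset: str, table: str) -> str:
    """Standard SQL that counts the rows in a fully qualified table."""
    return f"SELECT COUNT(*) AS row_count FROM `{project_id}.{dataset}.{table}`"


# Placeholder project ID, for illustration only.
query = row_count_query(
    "my-project-id", "invoice_parser_results", "doc_ai_extracted_entities"
)
print(query)
```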
If the function does not finish with a status of OK, check that the parameters set in the .env.yaml file in the previous section are correct and try again.
In particular make sure the Processor ID and location variables that you set are valid and note that the location parameter must be in lower case.
Also note that the event list does not automatically refresh.
Congratulations! In this lab, you have successfully created a document processing pipeline that automatically processes documents uploaded to Cloud Storage using the Document AI API. You have created a form processor, deployed a Cloud Run function to process documents, and validated the end-to-end solution by processing a set of invoices.
Manual Last Updated November 4, 2024
Lab Last Tested November 4, 2024
Copyright 2025 Google LLC. All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.