Text Classification using AutoML
Overview
In this lab, you use AutoML with Vertex AI to train a text classification model that predicts the source of an article from its title.
Objectives
You learn how to:
- Import a text dataset to AutoML.
- Train the ML model for text classification.
- Evaluate the model performance.
- Deploy the model to an endpoint.
- Get predictions.
Setup
Before you click the Start Lab button
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
What you need
To complete this lab, you need:
- Access to a standard internet browser (Chrome browser recommended).
- Time to complete the lab.
Log in to Google Cloud Console
- Using the browser tab or window you are using for this lab session, copy the Username from the Connection Details panel and click the Open Google Console button.
- Paste in the Username, and then the Password as prompted.
- Click Next.
- Accept the terms and conditions.
Since this is a temporary account, which will last only as long as this lab:
- Do not add recovery options
- Do not sign up for free trials
- Once the console opens, view the list of services by clicking the Navigation menu at the top left.
Introduction to Vertex AI
This lab uses Vertex AI, the unified AI platform on Google Cloud, to train and deploy an ML model. Vertex AI offers two options on one platform to build an ML model: a no-code solution with AutoML, and a code-based solution with Custom Training that uses Vertex AI Workbench. You use AutoML in this lab.
In this lab, you train the model to learn the relationship between article titles and their sources (including GitHub, the New York Times, and TechCrunch). You then use the trained model to predict the source of given article titles.
Task 1. Prepare the training data
The initial Vertex AI dashboard illustrates the major stages to train and deploy an ML model: prepare the data, train the model, and get predictions. Later, the dashboard displays your recent activities, such as the recent datasets, models, predictions, endpoints, and notebook instances.
Create a dataset
- In the Google Cloud console, in the Navigation menu, click Vertex AI > Datasets.
- Click Create dataset.
- On the Datasets page, give the dataset a name.
- For Data type and objective, click Text, and then select Multi-label classification.
- For Region, select the region assigned to your lab.
- Click Create.
Upload data
There are three options for importing text data in Vertex AI:
- Upload text documents from your computer.
- Upload import files from your computer.
- Select import files from Cloud Storage.
For convenience, the dataset is already uploaded to Cloud Storage.
- For the data source, select Select import files from Cloud Storage.
- For Import file path, enter the Cloud Storage path of the import file.
- Click Continue.
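To give a sense of what an import file for text classification can look like, the sketch below writes one `text,label` row per example with no header row. The row layout is an assumption about the CSV import format, and the titles and labels in `EXAMPLES` are invented for illustration; they are not taken from the lab's dataset.

```python
import csv
import io

# Hypothetical (title, source) pairs, invented for illustration only.
EXAMPLES = [
    ("Show HN: A tiny static site generator", "github"),
    ("Markets slide as interest rates rise", "nytimes"),
    ("Startup raises seed round for note-taking app", "techcrunch"),
]


def build_import_csv(rows):
    """Return CSV text with one `text,label` row per example and no header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for title, source in rows:
        # csv.writer quotes a title automatically if it contains a comma.
        writer.writerow([title, source])
    return buffer.getvalue()


print(build_import_csv(EXAMPLES))
```

Using the `csv` module rather than joining strings by hand keeps titles that contain commas or quotes intact.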
(Optional) Analyze import data
- To see the import data, click Browse. You can find overall information about the dataset, including the total number of data points and the number for each label/category. You can also use the filter to browse the imported data.
- To see a brief analysis of the dataset, click Analyze.
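The per-label counts shown in the console can also be reproduced locally as a quick sanity check. This is a minimal sketch that assumes the data is available as (text, label) pairs; the sample rows are invented for illustration.

```python
from collections import Counter


def label_counts(rows):
    """Count how many data points carry each label."""
    return Counter(label for _text, label in rows)


# Hypothetical rows, invented for illustration only.
sample_rows = [
    ("Show HN: A tiny static site generator", "github"),
    ("A faster JSON parser in Rust", "github"),
    ("Markets slide as interest rates rise", "nytimes"),
    ("Startup raises seed round for note-taking app", "techcrunch"),
]

counts = label_counts(sample_rows)
for label, n in counts.most_common():
    # Print each label with its count and its share of the dataset.
    print(f"{label}: {n} ({n / len(sample_rows):.0%})")
```

A strongly unbalanced count across labels is worth noticing before training, because it can skew the evaluation metrics.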
Task 2. Train the model
With a dataset uploaded, you're ready to train the model.
- Click Train new model.
- For Training method, select AutoML.
- Click Continue.
- For Model details, select Train new model.
- Give the model a name and, optionally, a description.
- (Optional) Explore Advanced options to determine how to assign the training versus testing data and specify the encryption.
- Click Start training.
Depending on the data size and the training objectives, the training can take from a few minutes to a couple of hours. Normally you would receive an email from Google Cloud when the training job is complete. However, in the Qwiklabs environment, you will not receive an email.
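The Advanced options mentioned in the steps above control how data points are assigned to training and testing. As a rough illustration of what such an assignment does, the sketch below shuffles a dataset and cuts it into training, validation, and test subsets. The 80/10/10 ratios and the `split_dataset` helper are assumptions chosen for the example, not settings read from the lab.

```python
import random


def split_dataset(items, train=0.8, validation=0.1, seed=0):
    """Shuffle items and split them into (train, validation, test) lists.

    Whatever is left after the train and validation shares goes to test.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))
```

Every item lands in exactly one subset, which is the property that keeps test metrics honest: the model is never evaluated on data it was trained on.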
Task 3. Evaluate the model performance (demonstration only)
Vertex AI provides metrics to evaluate the model performance. For text classification, you focus on the Precision/Recall curve.
- Navigate to the Models tab.
- Click the model you just trained.
- Browse the Evaluate tab.
The Precision/Recall curve
The confidence threshold is the minimum score at which the model counts a prediction as positive. A higher threshold generally increases precision but decreases recall. A lower threshold generally decreases precision but increases recall. You can manually adjust the threshold to observe its impact on precision and recall and find the tradeoff point between the two that best meets your business needs.
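The effect of the threshold can be checked with a few lines of arithmetic. The sketch below computes precision and recall for one label at two thresholds. The scores and ground-truth flags are invented for illustration, and `precision_recall` is a hypothetical helper, not part of Vertex AI.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    predicted = [score >= threshold for score in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    # Convention used here: with no positive predictions, precision is 1.0.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented confidence scores and whether each item truly has the label.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.20]
labels = [True, True, False, True, False, False]

for threshold in (0.5, 0.8):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```

With this data, raising the threshold from 0.5 to 0.8 removes the one false positive (precision goes from 0.75 to 1.00) but also drops a true positive (recall goes from 1.00 to 0.67).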
Task 4. Deploy the model (demonstration only)
Now that you have a trained model, the next step is to create an endpoint in Vertex AI. A model resource in Vertex AI can be deployed to more than one endpoint, and an endpoint can split traffic across the models deployed to it.
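Traffic splitting can be pictured as weighted routing. The sketch below assumes the split is expressed as whole-number percentages that sum to 100, keyed by deployed model ID. The IDs `model-a` and `model-b` and both helper functions are hypothetical and only illustrate the idea.

```python
import random


def validate_split(split):
    """Raise ValueError unless percentages are non-negative and sum to 100."""
    if any(percent < 0 for percent in split.values()):
        raise ValueError("percentages must be non-negative")
    if sum(split.values()) != 100:
        raise ValueError("percentages must sum to 100")


def route(split, rng):
    """Pick one target ID at random, weighted by its traffic percentage."""
    validate_split(split)
    targets = list(split)
    weights = [split[target] for target in targets]
    return rng.choices(targets, weights=weights, k=1)[0]


# Hypothetical split: most requests go to model-a, a small share to model-b.
traffic_split = {"model-a": 90, "model-b": 10}
rng = random.Random(0)
picks = [route(traffic_split, rng) for _ in range(1000)]
print(picks.count("model-a"), picks.count("model-b"))
```

A split like this is commonly used to try a new model version on a small share of requests before sending it all traffic.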
Create and define an endpoint
- On your model page, on the Deploy and test tab, click Deploy to endpoint.
- For Endpoint name, enter a name.
- Click Continue.
Your endpoint will take a few minutes to deploy. When it's completed, a green check mark will appear next to the name.
Now you're ready to get predictions on your deployed model.
Task 5. Configure the environment
- Click Activate Cloud Shell. If prompted, click Continue.
- To create an Endpoint environment variable, run the following command:
- Download the test files from Cloud Storage:
The example files CLOUD1-JSON and CLOUD2-JSON have content similar to:
Task 6. Get predictions
With the test files available, you can request predictions from the deployed model. The test files include the following two examples:
File | Text Message |
---|---|
CLOUD1-JSON | Google's plan for the future of work. |
CLOUD2-JSON | Markdown Cheatsheet |
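A request for one of the titles in the table might be assembled as in the sketch below. The `instances`, `mimeType`, and `content` field names are an assumption about the request shape for text classification; the lab's own test files may differ, and `build_request` is a hypothetical helper.

```python
import json


def build_request(text):
    """Wrap one piece of plain text in an assumed prediction request body."""
    return {"instances": [{"mimeType": "text/plain", "content": text}]}


request_body = build_request("Markdown Cheatsheet")
print(json.dumps(request_body, indent=2))
```

Keeping the request as a dictionary until the last moment and serializing with `json.dumps` avoids hand-escaping quotes in article titles.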
Example One
- Set CLOUD1-JSON as the input file:
Example Text
- Request a prediction:
Example output:
Example Two
- Set CLOUD2-JSON as the input file:
Example Text
- Request a prediction:
Example output:
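A prediction response for classification typically pairs each label with a confidence score. The sketch below assumes a response that carries parallel `displayNames` and `confidences` lists and picks the most confident label. The response shape is an assumption, and the values in `sample_response` are invented for illustration rather than taken from a real prediction.

```python
def top_prediction(response):
    """Return the (label, confidence) pair with the highest confidence."""
    prediction = response["predictions"][0]
    pairs = zip(prediction["displayNames"], prediction["confidences"])
    return max(pairs, key=lambda pair: pair[1])


# Invented response, for illustration only.
sample_response = {
    "predictions": [
        {
            "displayNames": ["github", "nytimes", "techcrunch"],
            "confidences": [0.91, 0.03, 0.06],
        }
    ]
}

label, confidence = top_prediction(sample_response)
print(f"{label} ({confidence:.2f})")
```

In practice you might also compare the top confidence against the threshold chosen during evaluation before acting on the label.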
You can now use Vertex AI to:
- Upload a text dataset.
- Train a text classification model with AutoML.
- Evaluate the model performance.
- Deploy the trained AutoML model to an endpoint.
- Get predictions.
🎉 Congratulations! 🎉
To learn more about different parts of Vertex AI, see the Vertex AI documentation.
End your lab
When you have completed your lab, click End Lab. Qwiklabs removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
- 1 star = Very dissatisfied
- 2 stars = Dissatisfied
- 3 stars = Neutral
- 4 stars = Satisfied
- 5 stars = Very satisfied
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Manual Last Updated May 20, 2024
Lab Last Tested May 20, 2024
Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.