
Before you begin
- Labs create a Google Cloud project and resources for a fixed time.
- Labs have a time limit and no pause feature. If you end the lab, you'll have to restart from the beginning.
- At the top left of your screen, click Start lab to begin.
This lab is scored on the following tasks:
- Launch Vertex AI Workbench instance (30 points)
- Create a Dockerfile and add model training code (30 points)
- Build the container (20 points)
- Run a hyperparameter tuning job on Vertex AI (20 points)
In this lab, you learn how to use Vertex AI to run a hyperparameter tuning job for a TensorFlow model. While this lab uses TensorFlow for the model code, you could easily replace it with another framework.
This lab uses the newest AI product offering available on Google Cloud. Vertex AI integrates the ML offerings across Google Cloud into a seamless development experience. Previously, models trained with AutoML and custom models were accessible via separate services. The new offering combines both into a single API, along with other new products. You can also migrate existing projects to Vertex AI. If you have any feedback, please see the support page.
Vertex AI includes many different products to support end-to-end ML workflows. This lab focuses on two of them: Training/HP-Tuning and Notebooks.
Vertex AI offers two notebook solutions: Vertex AI Workbench and Colab Enterprise.
Vertex AI Workbench is a good option for projects that prioritize control and customizability. It's great for projects that span multiple files or have complex dependencies, and it's also a good choice for a data scientist who is transitioning to the cloud from a workstation or laptop.
Vertex AI Workbench instances come with a preinstalled suite of deep learning packages, including support for the TensorFlow and PyTorch frameworks.
For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.
Sign in to Qwiklabs using an incognito window.
Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
There is no pause feature. You can restart if needed, but you have to start at the beginning.
When ready, click Start lab.
Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.
Click Open Google Console.
Click Use another account and copy/paste credentials for this lab into the prompts.
If you use other credentials, you'll receive errors or incur charges.
Accept the terms and skip the recovery resource page.
In the Cloud console, click Navigation menu > API & Services > Library.
Search for Compute Engine API, then click Enable if it isn't already enabled. You'll need this to create your notebook instance.
In the Cloud console, click Navigation menu > API & Services > Library.
Search for Google Container Registry API and select Enable if it isn't already. You'll use this to create a container for your custom training job.
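If you prefer the command line, you can enable both APIs with a single gcloud command instead (this assumes the gcloud CLI is authenticated against your lab project):

gcloud services enable compute.googleapis.com containerregistry.googleapis.com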
In the Google Cloud console, from the Navigation menu, select Vertex AI.
Click Enable All Recommended APIs.
In the Navigation menu, click Workbench.
At the top of the Workbench page, ensure you are in the Instances view.
Click Create New.
Configure the Instance:
The instance takes a few minutes to create. A green checkmark will appear next to its name when it's ready.
Click Check my progress to verify the objective.
You'll submit this hyperparameter tuning job to Vertex AI by putting your training application code in a Docker container and pushing the container to Google Container Registry. Using this approach, you can tune hyperparameters for a model built with any framework.
In your Workbench instance, open a Terminal window. Create a new directory called horses_or_humans and cd into it:
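mkdir horses_or_humans
cd horses_or_humans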
The first step in containerizing your code is to create a Dockerfile. In the Dockerfile you'll include all the commands needed to run the image: it installs the necessary libraries, including the CloudML Hypertune library, and sets up the entry point for the training code.
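Here's a minimal sketch of what such a Dockerfile might look like; the trainer/task.py module path is an illustrative name for the training code you'll add shortly, so adjust it to match your own layout:

# Deep Learning Container with TensorFlow Enterprise 2.9 and GPU support
FROM gcr.io/deeplearning-platform-release/tf2-gpu.2-9

WORKDIR /

# Install the CloudML Hypertune library used to report tuning metrics
RUN pip install cloudml-hypertune

# Copy the training code into the image
COPY trainer /trainer

# Run the training module when the container starts
ENTRYPOINT ["python", "-m", "trainer.task"]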
This Dockerfile uses the Deep Learning Container TensorFlow Enterprise 2.9 GPU Docker image. The Deep Learning Containers on Google Cloud come with many common ML and data science frameworks pre-installed.
After downloading that image, this Dockerfile sets up the entrypoint for the training code. You haven't created these files yet – in the next step, you'll add the code for training and tuning the model.
Once you've created the Dockerfile and added the training code, you should have the following in your horses_or_humans/ directory:
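Assuming the trainer layout sketched above, that would be:

+ Dockerfile
+ trainer/
    + task.py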
Before you build the container, let's take a deeper look at the code. There are a few components that are specific to using the hyperparameter tuning service.
The script imports the hypertune library. Note that the Dockerfile from Step 1 included instructions to pip install this library.
The function get_args() defines a command-line argument for each hyperparameter you want to tune. In this example, the hyperparameters to be tuned are the learning rate, the momentum value in the optimizer, and the number of neurons in the last hidden layer of the model, but feel free to experiment with others. The values passed in those arguments are then used to set the corresponding hyperparameters in the code.
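A sketch of what get_args() might look like with argparse (the argument names must match the hyperparameter names you'll configure later in the tuning job):

import argparse

def get_args():
    """Define a command-line argument for each tunable hyperparameter."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--learning_rate', required=True, type=float,
                        help='learning rate for the optimizer')
    parser.add_argument('--momentum', required=True, type=float,
                        help='momentum value for the optimizer')
    parser.add_argument('--num_neurons', required=True, type=int,
                        help='number of units in the last hidden layer')
    return parser.parse_args()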
At the end of the main() function, the hypertune library is used to define the metric you want to optimize. In TensorFlow, the Keras model.fit method returns a History object. The History.history attribute is a record of training loss values and metric values at successive epochs. If you pass validation data to model.fit, the History.history attribute will include validation loss and metric values as well.
For example, if you trained a model for three epochs with validation data and provided accuracy as a metric, the History.history attribute would look similar to the following dictionary:
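The exact numbers will vary from run to run; these values are purely illustrative:

{'loss': [0.391, 0.247, 0.219],
 'accuracy': [0.841, 0.917, 0.936],
 'val_loss': [0.392, 0.262, 0.259],
 'val_accuracy': [0.835, 0.910, 0.907]}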
If you want the hyperparameter tuning service to discover the values that maximize the model's validation accuracy, you define the metric as the last entry (or NUM_EPOCHS - 1) of the val_accuracy list. Then, pass this metric to an instance of HyperTune. You can pick whatever string you like for the hyperparameter_metric_tag, but you'll need to use the same string again later when you kick off the hyperparameter tuning job.
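A sketch of what that reporting might look like at the end of main(), assuming history is the object returned by model.fit() and NUM_EPOCHS is the number of training epochs:

import hypertune

# Use the final epoch's validation accuracy as the metric to optimize
hp_metric = history.history['val_accuracy'][NUM_EPOCHS - 1]

# Report the metric to the Vertex AI hyperparameter tuning service.
# The tag must match the metric name you configure in the tuning job.
hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag='accuracy',
    metric_value=hp_metric,
    global_step=NUM_EPOCHS)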
Click Check my progress to verify the objective.
Define a variable with the URI of your container image in Google Container Registry, replacing your-cloud-project with your project ID. If you're not sure of your project ID, you can run gcloud config list --format 'value(core.project)' in your terminal. Then build the container and push it to Google Container Registry, as sketched below. With the container pushed, you're ready to kick off a custom model hyperparameter tuning job.
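A minimal sketch of those build-and-push commands, run from the horses_or_humans directory (the image URI matches the one used later in this lab):

IMAGE_URI="gcr.io/your-cloud-project/horse-human:hypertune"
docker build ./ -t $IMAGE_URI
docker push $IMAGE_URI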
Click Check my progress to verify the objective.
This lab uses custom training via a custom container on Google Container Registry, but you can also run a hyperparameter tuning job with pre-built containers.
Click Train new model to enter the parameters for your hyperparameter tuning job:
Enter horses-humans-hyptertune (or whatever you'd like to call your model) for Model name.
In the Training container step, select Custom container.
For the container image, enter gcr.io/<your-cloud-project>/horse-human:hypertune, with your own project name. Leave the rest of the fields blank and click Continue.
Next, you'll add the hyperparameters that you set as command-line arguments in the training application code. When adding a hyperparameter, you'll first need to provide the name. This should match the argument name that you passed to argparse.
Enter learning_rate for Parameter name.
Select Double as Type.
Enter 0.01 for Min, and 1 for Max.
Select Log as Scaling.
Click DONE.
After adding the learning_rate hyperparameter, add parameters for momentum and num_neurons.
For momentum:
Enter momentum for Parameter name.
Enter 0 for Min, and 1 for Max.
For num_neurons:
Enter num_neurons for Parameter name.
Enter 64,128,512 for Values.
After adding the hyperparameters, you'll next provide the metric you want to optimize as well as the goal. This should be the same as the hyperparameter_metric_tag you set in your training application.
Enter accuracy for Metric to optimize.
The Vertex AI hyperparameter tuning service will run multiple trials of your training application with the values configured in the previous steps. You'll need to put an upper bound on the number of trials the service will run.
More trials generally lead to better results, but there will be a point of diminishing returns, after which additional trials have little or no effect on the metric you're trying to optimize. It is a best practice to start with a smaller number of trials and get a sense of how impactful your chosen hyperparameters are before scaling up to a large number of trials.
You'll also need to set an upper bound on the number of parallel trials. Increasing the number of parallel trials reduces the amount of time the hyperparameter tuning job takes to run; however, it can reduce the effectiveness of the job overall. This is because the default tuning strategy uses the results of previous trials to inform the assignment of values in subsequent trials. If you run too many trials in parallel, some trials will start without the benefit of results from trials that are still running.
For demonstration purposes, you can set the Maximum number of trials to 15 and the Maximum number of parallel trials to 3. You can experiment with different numbers, but higher values can result in a longer tuning time and higher cost.
Select Default as the Algorithm, which will use Google Vizier to perform Bayesian optimization for hyperparameter tuning. You can learn more about this algorithm from the blog Hyperparameter tuning in Cloud Machine Learning Engine using Bayesian Optimization.
Click Continue.
In Compute and pricing, leave the selected region as-is and for Compute settings select Deploy to new worker pool.
Configure Worker pool 0, setting the Disk size (GB) to 100, then click Start training to launch the job.
When the job finishes, you'll be able to click the job name and see the results of the tuning trials.
Click Check my progress to verify the objective.
If you'd like to continue using the notebook you created in this lab, it is recommended that you turn it off when not in use. From the Notebooks UI in your Cloud console, select the notebook and then select Stop.
If you'd like to delete the notebook entirely, simply click the Delete button in the top right.
To delete the Storage Bucket, go to Navigation menu > Cloud Storage, select your bucket, and click Delete.
You've learned how to use Vertex AI to:
Launch a hyperparameter tuning job for training code provided in a custom container. You used a TensorFlow model in this example, but you can train a model built with any framework using custom or built-in containers. To learn more about different parts of Vertex, check out the Vertex AI documentation.
When you have completed your lab, click End Lab. Qwiklabs removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
1 star = Very dissatisfied
2 stars = Dissatisfied
3 stars = Neutral
4 stars = Satisfied
5 stars = Very satisfied
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.