
Before you begin
- Labs create a Google Cloud project and resources for a fixed time
- Labs have a time limit and no pause feature. If you end the lab, you'll have to restart from the beginning.
- On the top left of your screen, click Start lab to begin
Create machine learning models
/ 15
Use the model
/ 15
Save and Load Models
/ 15
Explore Callbacks
/ 15
Exercise 1
/ 15
Exercise 2
/ 15
Exercise 3
/ 10
TensorFlow is an open source, powerful, portable machine learning (ML) library developed by Google that can work with very large datasets. In this lab, you will create and train a computer vision model to recognize different items of clothing using TensorFlow Vertex AI Workbench.
TensorFlow provides a computational framework for building ML models. TensorFlow provides a variety of different toolkits that allow you to construct models at your preferred level of abstraction. In this lab you'll use tf.keras, a high-level API to build and train a neural network for classifying images in TensorFlow.
A neural network is a model inspired from the brain. It is composed of layers, at least one of which is hidden, that consists of simple connected units or neurons followed by nonlinearities.
A node in a neural network typically takes multiple input values and generates one output value. The neuron applies an activation function (nonlinear transformation) to a weighted sum of input values to calculate the output value.
For more information about neural networks, see Neural Networks: Structure.
In this lab, you will learn how to:
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources are made available to you.
This hands-on lab lets you do the lab activities in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
In the Google Cloud console, on the Navigation menu (), click Vertex AI > Workbench.
Find the
The JupyterLab interface for your Workbench instance opens in a new browser tab.
From the Launcher menu, under Other, select Terminal.
Check if your Python environment is already configured. Copy and paste the following command in the terminal.
Example output:
pip3
, run the following command in the terminal.Pylint is a tool that checks for errors in Python code, and highlights syntactical and stylistic problems in your Python source code.
pylint
package.requirements.txt
file:Now, your environment is set up!
Click the + icon on the left side of the Workbench to open a new Launcher.
From the Launcher menu, under Notebook, select Python3.
You will be presented with a new Jupyter notebook. For more information on how to use Jupyter notebooks, see the Jupyter Notebook documentation.
logging
and google-cloud-logging
for Cloud Logging. In the first cell, add the following code:tensorflow
for training and evaluating the model. Call it tf
for ease of use. Add the following code to the first cell.numpy
, to parse through the data for debugging purposes. Call it np
for ease of use. Add the following code to the first cell.tensorflow_datasets
to integrate the dataset. TensorFlow Datasets is a collection of datasets ready to use, with TensorFlow.To run the cell, either click the Run button or press Shift + Enter.
Save the notebook. Click File -> Save. Name the file model.ipynb
and click OK.
You will train a neural network to classify images of clothing from a dataset called Fashion MNIST.
This dataset contains 70,000 items of clothing belonging to 10 different categories of clothing. The images show individual articles of clothing at low resolution (28 by 28 pixels), as seen here:
In this lab, 60,000 images will be used to train the network and 10,000 images will be used to evaluate how accurately the network learned to classify images.
The Fashion MNIST data is available in tensorflow datasets(tfds).
To load the Fashion MNIST data, you will use the tfds.load()
function.
In the above code, you set the split
argument to specify which splits of the dataset is to be loaded. You set as_supervised
to True
to ensure that the loaded tf.data.Dataset
will have a 2-tuple structure (input, label)
.
ds_train and ds_test are of type tf.data.Dataset
. ds_train has 60,000 images which will be used for training the model. ds_test has 10,000 images which will be used for evaluating the model.
tfds.load()
and its arguments check the guide.What do these values look like?
32
.Specify the batch size by adding the following to model.ipynb
:
The code given below uses the map()
function of tf.data.Dataset
to apply the normalization to images in ds_train
and ds_test
. Since the pixel values are of type tf.uint8
, the tf.cast
function is used to convert them to tf.float32
and then divide by 255.0
. The dataset is also converted into batches by calling the batch()
method with BATCH_SIZE
as the argument.
tf.data.Dataset
here.Add the following code to the end of the file:
Add the following code to the end of the file:
In this section, you will design your model using TensorFlow.
Look at the different types of layers and the parameters used in the model architecture:
Sequential: This defines a SEQUENCE of layers in the neural network.
Flatten: Our images are of shape (28, 28), i.e, the values are in the form of a square matrix. Flatten takes that square and turns it into a one-dimensional vector.
Dense: Adds a layer of neurons.
Each layer of neurons needs an activation function to decide if a neuron should be activated or not. There are lots of options, but this lab uses the following ones.
Relu
effectively means if X>0 return X, else return 0. It passes values 0 or greater to the next layer in the network.Softmax
takes a set of values, and effectively picks the biggest one so you don't have to sort to find the largest value. For example, if the output of the last layer looks like [0.1, 0.1, 0.05, 0.1, 9.5, 0.1, 0.05, 0.05, 0.05], it returns [0,0,0,0,1,0,0,0,0].In this section you will first compile your model with an optimizer and loss function. You will then train the model on your training data and labels.
The goal is for the model to figure out the relationship between the training data and its labels. Once training is complete, you want your model to see fresh images of clothing that resembles your training data and make predictions about what class of clothing they belong to.
An optimizer is one of the two arguments required for compiling a tf.keras model. An Optimizer
is an algorithm that modifies the attributes of the neural network like weights and learning rate. This helps in reducing the loss and improving accuracy.
tf.keras
, here.Loss
indicates the model's performance by a number. If the model is performing better, loss will be a smaller number. Otherwise loss will be a larger number.
tf.keras
here.Notice the metrics=
parameter. This allows TensorFlow to report on the accuracy of the training after each epoch by checking the predicted results against the known answers(labels). It basically reports back on how effectively the training is progressing.
Model.fit will train the model for a fixed number of epochs.
Click Check my progress to verify the objective.
When the notebook cell executes, you will see both loss and accuracy reported after each epoch (or pass) of training. Notice that with each epoch (or pass), the accuracy goes up:
Example output (your values may be slightly different, ignore any warning messages):
For # Values before normalization
output, you'll notice that the min and max are in the range of [0, 255]
. After normalization you can see that all the values are in the range of [0, 1]
.
As the training progresses, the loss decreases and accuracy increases.
When the model is done training, you will see an accuracy value at the end of the final epoch. It might be close to 0.8864 as above (your accuracy may be different).
This tells you that the neural network is about 89% accurate in classifying the training data. In other words, it figured out a pattern match between the image and the labels that worked 89% of the time. Not great, but not bad considering it was only trained for five epochs on a small neural network.
But how would the model perform on data it hasn't seen?
The test set can help answer this question. You call model.evaluate
, pass in the two sets, and it reports back the loss for each.
Evaluate the test set:
If you scroll to the bottom of your output, you can see the result of evaluation in the last line.
The model reports an accuracy of .8708 on the test set(ds_test
), meaning it was about 87% accurate. (Your values may be slightly different).
As expected, the model is not as accurate on the unknown data as it was with the data it was trained on!
As you dive deeper into TensorFlow, you'll learn about ways to improve this.
Click Check my progress to verify the objective.
Model progress can be saved during and after training. This means a model can resume where it left off and avoid long training times. Saving also means you can share your model and others can recreate your work. For this first exercise, you will add necessary code to save and load your model.
SavedModel
and Keras
). The TensorFlow SavedModel
format is the default file format in TF2.x. However, models can be saved in Keras
format. You will learn more on saving models in the two file formats.The above code shows how you can save the model in two different formats and load the saved model back. You can choose any format according to your use case. You can read more about this functionality in the TensorFlow Documentation for "Save and load models".
At the end of the output you will see two sets of model summaries. The first one shows the summary after the model is saved in the SavedModel
format. The second one shows the summary after the model is saved in the h5
format.
You can see that both model summaries are identical since we are effectively saving the same model in two different formats.
Click Check my progress to verify the objective.
Earlier when you trained the model, you would have noticed that, as the training progressed, the model's loss decreased and its accuracy increased. Once you have achieved the desired training accuracy and loss, you may still have to wait for a bit of time for the training to complete.
You might have thought, "Wouldn't it be nice if I could stop training when the model reached a desired value of accuracy?"
For example, if 95% accuracy is good enough, and the model managed to achieve that after 3 epochs of training, why sit around waiting for many more epochs to complete?
Answer: Callbacks!
A callback is a powerful tool to customize the behavior of a Keras model during training, evaluation, or inference. You can define a callback to stop training as soon as your model reaches a desired accuracy on the training set.
Try the following code to see what happens when you set a callback to stop training when accuracy reaches 84%:
Open the Launcher and select Python3 to create a new Jupiter notebook.
Save the file as callback_model.ipynb
.
Paste the following code into the first cell of callback_model.ipynb
:
Press Ctrl + S or go to File > Save Notebook, to save the changes.
Run the code by clicking the Run button or pressing Shift + Enter.
See that the training was canceled after a few epochs.
Click Check my progress to verify the objective.
In this section you will experiment with the different layers of the network.
In this Exercise you will explore the layers in your model. What happens when you change the number of neurons?
Open the Launcher and select Python3 to create a new Jupiter notebook.
Save the file as updated_model.ipynb
.
Paste the following code into the first cell of updated_model.ipynb
:
Go to # Define the model
section, change 64 to 128 neurons:
Press Ctrl + S or go to File > Save Notebook, to save the changes.
Run the code by clicking the Run button or pressing Shift + Enter.
What different results do you get for loss, training time, etc.? Why do you think that's the case?
When you increase to 128 neurons, you have to do more calculations. This slows down the training process. In this case, the increase had a positive impact because the model is more accurate. But, it's not always a case of 'more is better'. You can hit the law of diminishing returns very quickly.
Click Check my progress to verify the objective.
Consider the effects of additional layers in the network. What will happen if you add another layer between the two dense layers?
updated_model.ipynb
, add a layer in the # Define the model
section.Replace your model definition with the following:
Press Ctrl + S or go to File > Save Notebook, to save the changes.
Run the code by clicking the Run button or pressing Shift + Enter.
Answer: No significant impact -- because this is relatively simple data. For far more complex data, extra layers are often necessary.
Click Check my progress to verify the objective.
Before you trained your model, you normalized the pixel values to the range of [0, 1]
. What would be the impact of removing normalization so that the values are in the range of [0, 255], like they were originally in the dataset?
# Define, load and configure data
, remove the map function applied to both the training and test datasets.[0, 255]
.updated_model.ipynb
will look like this:Press Ctrl + S or go to File > Save Notebook, to save the changes.
Run the code by clicking the Run button or pressing Shift + Enter.
Expected output for # Print out max value to see the changes
After completing the epochs you can see the difference in accuracy without normalization.
Why do you think the accuracy changes?
Click Check my progress to verify the objective.
What happens if you remove the Flatten()
layer, and why?
Go ahead, try it:
# Define the model section
, remove tf.keras.layers.Flatten()
:updated_model.ipynb
.You get an error about the shape of the data. This is expected.
The details of the error may seem vague right now, but it reinforces the rule of thumb that the first layer in your network should be the same shape as your data. Right now, the input images are of shape 28x28, and 28 layers of 28 neurons would be infeasible. So, it makes more sense to flatten that 28,28 into a 784x1.
Instead of writing all the code to handle that yourselves, you can add the Flatten()
layer at the beginning. When the arrays are loaded into the model later, they'll automatically be flattened for you.
Notice the final (output) layer. Why are there 10 neurons in the final layer? What happens if you have a different number than 10?
Find out by training the network with 5.
# Define the model
section with the following to undo the change you made in the previous section:updated_model.ipynb
.What happens: You get an error as soon as it finds an unexpected value.
Another rule of thumb -- the number of neurons in the last layer should match the number of classes you are classifying for. In this case, it's the digits 0-9, so there are 10 of them, and hence you should have 10 neurons in your final layer.
Congratulations! In this lab, you learned how to design, compile, train, and evaluate a Tensorflow model. You also learned how to save and load models and write your own callbacks to customize behavior during training. You also completed a series of exercises to guide you through experimenting with the different layers of the network.
...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.
Manual Last Updated September 12, 2024
Lab Last Tested September 12, 2024
Copyright 2025 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.
This content is not currently available
We will notify you via email when it becomes available
Great!
We will contact you via email if it becomes available
One lab at a time
Confirm to end all existing labs and start this one