Test and share your knowledge with our community!

done

Get access to over 700 hands-on labs, skill badges, and courses

Evaluate an ML model with BigQuery ML

Lab 1 година universal_currency_alt 5 кредитів show_chart Початковий

info This lab may incorporate AI tools to support your learning.

Overview
Setup and requirements
Task 1. Create a dataset
Task 2. Explore the data
Task 3. Create a model
Task 4. Evaluate the model
Task 5. Use the model
Review
End your lab

Test and share your knowledge with our community!

done

Get access to over 700 hands-on labs, skill badges, and courses

Overview

BigQuery ML enables users to create and execute machine learning models in BigQuery using SQL queries. The goal is to democratise machine learning by enabling SQL practitioners to build models using their existing tools and to increase development speed by eliminating the need for data movement.

The ecommerce dataset has millions of Google Analytics records for the Google Merchandise Store loaded into BigQuery, which you will use in this lab to create a model that predicts whether a visitor will make a transaction.

Objectives

In this lab, you learn how to perform the following tasks:

Create a machine learning model in BigQuery
Evaluate a machine learning model in BigQuery
Use a machine learning model in BigQuery

Setup and requirements

For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.

Sign in to Qwiklabs using an incognito window.
Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
There is no pause feature. You can restart if needed, but you have to start at the beginning.
When ready, click Start lab.
Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.
Click Open Google Console.
Click Use another account and copy/paste credentials for this lab into the prompts.
If you use other credentials, you'll receive errors or incur charges.
Accept the terms and skip the recovery resource page.

Open BigQuery Console

In the Google Cloud Console, on the Navigation menu , click BigQuery.
The Welcome to BigQuery in the Cloud Console dialog opens. This dialog provides a link to the quickstart guide and lists UI updates.
Click Done to close the dialog.

Task 1. Create a dataset

In this task you create a BigQuery Dataset that will be used throughout the lab.

In the left pane under the Explorer section, click on View actions icon next to your BigQuery project ID (qwiklabs-gcp-xxxx).
Select Create dataset.

The Create dataset dialog opens.

The navigation path to the Create dataset option.

Set the Dataset ID to bqml_lab, leave all other options at their default values.
Click CREATE DATASET.

Task 2. Explore the data

In this task you add a query to the query editor and examine the response. The data you use in this lab sits in the bigquery-public-data project, that is available to all.

Add the query to the query EDITOR box, and then click the RUN button:

#standardSQL SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170631' LIMIT 10000;

The data tables have a lot of columns, but very few of them are of interest to you when creating the ML model. Here the visitor's device's operating system is used, whether said device is a mobile device, the visitor's country and the number of page views will be used as the criteria for whether a transaction has been made. In this case, label is what you're trying to fit to (or predict).

This data will be the training data for the ML model you create. The training data is being limited to those collected from 1 August 2016 to 31 June 2017. This is done to save the last month of data for "prediction". It is further limited to 10,000 data points to save some time.

Click on the Save button.
In the dropdown menu, select Save View to save this query as a view.
In the Save view dialog box, specify the following, and leave the remaining settings as their defaults:

Property	Value
Dataset ID*	bqml_lab
Table name	training_data

Click Save.

Task 3. Create a model

In this task you create a model to predict whether a visitor will make a transaction.

Replace the query with the following:

#standardSQL CREATE OR REPLACE MODEL `bqml_lab.sample_model` OPTIONS(model_type='logistic_reg') AS SELECT * from `bqml_lab.training_data`;

Click Run.

In this case, bqml_lab is the name of the dataset, sample_model is the name of the model, training_data is the transactions data you looked at in the previous task. The model type specified is binary logistic regression.

Running the CREATE MODEL command creates a Query Job that will run asynchronously so you can, for example, close or refresh the BigQuery UI window.

(Optional) Model information & training statistics

If interested, you can get information about the model.

In the Explorer pane, click the arrow to the left of the BigQuery project ID (qwiklabs-gcp-xxxx).
Expand the bqml_lab dataset on the left-hand menu.
Expand the Models dataset and then click the sample_model dataset in the UI.
Click Details. You should find some basic model info and training options used to produce the model.
Click TRAINING, and then in the View as option, click the Table radio button. You should see a table similar to this:

A table with 11 rows of data below the column headings such as the completion time, iteration, and learn rate.

Task 4. Evaluate the model

In this task you evaluate the Model.

Replace the query with the following:

#standardSQL SELECT * FROM ml.EVALUATE(MODEL `bqml_lab.sample_model`);

Click Run.

In this query, you use the ml.EVALUATE function to evaluate the predicted values against the actual data, and it shares some metrics of how the model performed. You should see a table similar to this:

The Results tabbed page with one row of data populated.

Task 5. Use the model

In this task you use the model to try and predict the number of transactions made by visitors of each country.

Click Compose new query.
Copy the following query into the Editor pane:

#standardSQL SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews, fullVisitorId FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801';

Click Run.

You'll realize the SELECT and FROM portions of the query is similar to that used to generate training data. There is the additional fullVisitorId column which you will use for predicting transactions by individual user. The WHERE portion reflects the change in time frame (July 1 to August 1, 2017).

Click on the Save button.
In the dropdown menu, select Save View to save this query as a view.
In the Save view dialog box, specify the following, and leave the remaining settings as their defaults:

Property	Value
Dataset ID*	bqml_lab
Table name	july_data

Click Save.
Replace the query with the following:

#standardSQL SELECT country, SUM(predicted_label) as total_predicted_purchases FROM ml.PREDICT(MODEL `bqml_lab.sample_model`, ( SELECT * FROM `bqml_lab.july_data`)) GROUP BY country ORDER BY total_predicted_purchases DESC LIMIT 10;

With this query you will try to predict the number of transactions made by visitors of each country, sort the results, and select the top 10 countries by purchases:

Click Run.

In this query, you're using ml.PREDICT and the BQML portion of the query is wrapped with standard SQL commands. In this lab you're interested in the country and the sum of purchases for each country, so that's why SELECT, GROUP BY and ORDER BY. LIMIT is used to ensure you only get the top 10 results.

You should see a table similar to this:

A table with 10 rows of data, detailing the total predicted purchases for each of the 10 countries listed.

Replace the query with the following:

Here is another example. This time you will try to predict the number of transactions each visitor makes, sort the results, and select the top 10 visitors by transactions:

#standardSQL SELECT fullVisitorId, SUM(predicted_label) as total_predicted_purchases FROM ml.PREDICT(MODEL `bqml_lab.sample_model`, ( SELECT * FROM `bqml_lab.july_data`)) GROUP BY fullVisitorId ORDER BY total_predicted_purchases DESC LIMIT 10;

Click Run.

You should see a table similar to this:

A table with 10 rows of data, detailing the fullVisitorId and total predicted purchases.

Review

In this lab you created, evaluated, and used a machine model in BigQuery.

End your lab

When you have completed your lab, click End Lab. Qwiklabs removes the resources you’ve used and cleans the account for you.

You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.

The number of stars indicates the following:

1 star = Very dissatisfied
2 stars = Dissatisfied
3 stars = Neutral
4 stars = Satisfied
5 stars = Very satisfied

You can close the dialog box if you don't want to provide feedback.

For feedback, suggestions, or corrections, please use the Support tab.

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.