检查点
Explore thelook_ecommerce public dataset
/ 50
Explore NCAA Basketball public dataset
/ 50
Navigate BigQuery
Make sure to complete this hands-on lab on a desktop/laptop only.
There are only 5 attempts permitted per lab.
As a reminder – it is common to not get every question correct on your first try, and even to need to redo a task; this is part of the learning process.
Once a lab is started, the timer cannot be paused. After 1 hour and 30 minutes, the lab will end and you’ll need to start again.
For more information review the Lab technical tips reading.
Activity overview
Cloud data analytics uses a variety of tools that can assist with each phase of the analysis process. Two popular and powerful tools that work across many major cloud platforms are BigQuery and Looker.
BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data using the Google Cloud Console interface. With BigQuery, you can use SQL queries to retrieve, clean and organize data, ensuring you get the quality data you need for reporting and analysis. You can also use BigQuery to write SQL queries to combine data from multiple tables using JOINs.
Looker is a business intelligence (BI) platform that helps you explore, analyze, visualize, and share your data. Part of the Looker platform and easily accessed in the BigQuery UI, Looker Studio is a tool that turns your data into informative and fully customizable dashboards and reports.
In this lab, you’ll explore two datasets in BigQuery, and run SQL queries to filter the data. Then, you'll review the visualized results using Looker Studio.
Scenario
Congratulations! You have been hired as a data analyst at TheLook eCommerce, a global company that sells clothing products through physical stores and through digital channels including their own website, their own mobile app, and various third-party social media apps. TheLook eCommerce has been growing quickly thanks to the company’s wide variety of clothing styles, focus on innovation, and commitment to ethical and sustainable sourcing.
TheLook eCommerce is planning to run an ad campaign showcasing the highest scoring college basketball players from National Collegiate Athletic Association (NCAA) modeling the company’s apparel. Martina, the marketing manager, wants the first phase of the campaign to promote swimwear products.
To identify the swimwear products with the highest sales in June, historically the month with the most swimwear sales for the company, Martina asks you to produce a report with the sales data for the swimwear category for June 2023. In order to determine which athletes will be featured for the campaign, you'll explore the NCAA’s public dataset to produce a report with the highest-scoring basketball players.
Here’s how you'll do this task: First, you’ll explore the tables in the thelook_gcda dataset. Next, you’ll filter the data to retrieve the information on swim products sold in the last 30 days. Third, you’ll explore the tables in the ncaa_basketball public dataset. Finally, you’ll filter the data to retrieve the information on the 10 highest scoring basketball players.
Setup
Before you click Start Lab
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This practical lab lets you do the activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
- Access to a standard internet browser (Chrome browser recommended).
- Time to complete the lab---remember, once you start, you cannot pause a lab.
How to start your lab and sign in to the Google Cloud console
-
Click the Start Lab button. On the left is the Lab Details panel with the following:
- Time remaining
- The Open Google Cloud console button
- The temporary credentials that you must use for this lab
- Other information, if needed, to step through this lab
Note: If you need to pay for the lab, a pop-up opens for you to select your payment method. -
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window) if you are running the Chrome browser. The Sign in page opens in a new browser tab.
Tip: You can arrange the tabs in separate, side-by-side windows to easily switch between them.
Note: If the Choose an account dialog displays, click Use Another Account. -
If necessary, copy the Google Cloud username below and paste it into the Sign in dialog. Click Next.
You can also find the Google Cloud username in the Lab Details panel.
- Copy the Google Cloud password below and paste it into the Welcome dialog. Click Next.
You can also find the Google Cloud password in the Lab Details panel.
- Click through the subsequent pages:
- Accept the terms and conditions
- Do not add recovery options or two-factor authentication (because this is a temporary account)
- Do not sign up for free trials
After a few moments, the Console opens in this tab.
Task 1. Explore thelook_gcda dataset
In this task, you'll explore the thelook_gcda dataset and the tables it contains. You'll then run a query that joins two tables and retrieves data on the swim products sold in June 2023.
- In the Google Cloud console Navigation menu (), click BigQuery > BigQuery Studio.
- On the Explorer search field, type thelook, and press ENTER.
- Select the thelook_gcda dataset.
- On the Explorer pane, expand the thelook_gcda dataset. The tables within this dataset are displayed. You may have to scroll down to explore the complete list of tables.
For this part of the task, you'll examine the swim products sold in June 2023.
- Select the products table. The table schema displays.
- Select the Preview tab to examine the data. Note that each product contains a unique identifier in the id column.
Now, examine both the order_items and product tables. To determine how many swimwear products were sold in June 2023, these two tables will need to be joined on a common column.
- In the Query Editor, click on the Compose new query (+) icon to open a new Untitled tab to run the query in.
- Copy and paste the following command into the Untitled tab:
- Click Run.
This query will join the order_items and product tables and return all swim-related orders that are not returned or canceled in June 2023.
Finally, explore the results using Looker Studio.
- Click Explore Data in the Query results pane, and select Explore with Looker Studio.
Click Check my progress to verify that you have completed this task correctly.
Task 2. Explore the NCAA basketball public dataset
In this task, you'll explore the ncaa_basketball public dataset and the tables it contains. You'll then run a query to retrieve the data on the highest scoring NCAA basketball players. Finally, you'll run a query that ranks the 10 highest scoring players for a single game.
- If you aren't already in BigQuery Studio, in the Google Cloud console Navigation menu (), click BigQuery > BigQuery Studio.
- On the Explorer bar, click + Add to add a data source to the project.
- On the Add page, in the Additional sources list, select Public Datasets. The Marketplace page opens.
- In the search field, type ncaa, and press ENTER.
- Select the NCAA Basketball dataset.
- On the Products details page, click View Dataset to view the detailed data schema of this BigQuery public dataset.
- On the Explorer pane, expand the ncaa_basketball dataset. The tables within this dataset are displayed.
First, you'll need to add up all the points a player accumulated across all games.
- Preview each of the tables within the ncaa_basketball dataset.
- Select the mbb_players_games_sr table. Which columns do you think will be helpful to find the players with the highest numbers of points?
Each row in the mbb_players_games_sr has the results for each player, and for each game played. To get the total number of points per player per game, you'll need to run a query that summarizes the data across games.
- Copy the following query into the Query Editor:
This query will return one row for each of the players, their respective team, and the sum of points across all games they played.
- Click Run.
Now, you'll need to find the top 10 players with the highest score in a single game.
- Copy the following query into the Query Editor:
This query will return the information of the top 10 NCAA basketball players based on the points they’ve scored in games and ranks them, in order from 1 to 10.
Notice that this query has two SELECT
statements. The first SELECT
statement creates a temporary table called rankings. The second SELECT
statement selects the following columns from the rankings table.
The RANK()
function is used to assign a ranking to each player based on their points.
- Click Run.
Whenever you're ranking a value that will include ties, it’s good to know the difference between ROW_NUMBER, DENSE_RANK, and RANK.
- ROW_NUMBER() will ignore ties. This could work if you arbitrarily want to return the top 10 players.
- DENSE_RANK() vs RANK() will rank differently based on ties.
- DENSE_RANK() does not skip numbers, so it will be 1, 2, 2, 3.
- RANK() skips the numbers, so it will be 1, 2, 2, 4.
Click Check my progress to verify that you have completed this task correctly.
Conclusion
Great work!
As a cloud data analyst at TheLook eCommerce, you have successfully provided the data needed for the marketing team to launch the first phase of an exciting ad campaign that promotes swimwear products featuring NCAA basketball players.
By exploring and filtering the tables in thelook_gcda dataset, you obtained the information on swim products sold in June 2023.
You also filtered tables in the ncaa_basketball public dataset to retrieve information about the highest scoring basketball players.
With this information, the marketing team will be able to make informed decisions about which swimwear products they should feature in the ad campaign and the high-performing players they should invite to model their product.
You’re well on your way to using powerful tools in the cloud to analyze data. Well done!
End your lab
Before you end the lab, make sure you’re satisfied that you’ve completed all the tasks. When you're ready, click End Lab and then click Submit.
Ending the lab will remove your access to the lab environment, and you won’t be able to access the work you've completed in it again.
Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.