
Before you begin
- Labs create a Google Cloud project and resources for a fixed time
- Labs have a time limit and no pause feature. If you end the lab, you'll have to restart from the beginning.
- On the top left of your screen, click Start lab to begin
Use partitions to reduce the amount of data examined
/ 100
Partitioned tables are a powerful tool that cloud data analysts can use to improve query performance and reduce costs.
A partitioned table is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and control costs by reducing the number of bytes read by a query.
In this lab, you’ll partition tables by specifying a partition column which is used to segment the table, manage tables using default and customized settings, add and export data, and query tables.
Meredith, the head merchandiser of The Look eCommerce, wants to better understand the behavior of customers at the company's brick-and-mortar stores. Meredith is responsible for overseeing the company's product selection. But, they are concerned about not having a complete understanding of how customers are using the stores and the products they want to buy.
So Meredith asks Huan, a data scientist who works with the merchandising team, for help. Huan's role is to study the number of people who are dropped off near brick-and-mortar stores after taking a taxi ride. Huan believes that this data can provide valuable insights into customer shopping patterns. This information can help the merchandising team make more informed decisions about each store's stock.
To visualize the data for the merchandising team, Huan built a dashboard that shows various stores and how many people are dropped off nearby. But, it looks like the dashboard is not working out as Huan hoped! When they open up their email, they find multiple complaints from users that the dashboard is too slow. Huan also notices that the costs of running the dashboard are too high.
As a cloud data analyst for TheLook eCommerce, you have been asked to help to make the dashboard more efficient and cost-effective so that the merchandising team can have the data they need.
You ask Artem, the data architect for help. They tell you the problem may be related to how the data is being partitioned. The more data that must be scanned for each query, the more resources the query will use. They suggest that you test if partitions can be used to scan less data for queries. This will reduce both time and costs.
You suspect that the problem may be related to how the data is being partitioned. So you decide to test if partitions can be used to scan less data for queries.
Here’s how you'll do this task: First, you’ll conduct an analysis of partitioned tables. Next, you’ll check the amount of data analyzed. Then, you’ll run several queries, clearing the cache after each query. Finally, you’ll use partitions as filters.
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.
This practical lab lets you do the activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
Click the Start Lab button. On the left is the Lab Details panel with the following:
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window) if you are running the Chrome browser. The Sign in page opens in a new browser tab.
Tip: You can arrange the tabs in separate, side-by-side windows to easily switch between them.
If necessary, copy the Google Cloud username below and paste it into the Sign in dialog. Click Next.
You can also find the Google Cloud username in the Lab Details panel.
You can also find the Google Cloud password in the Lab Details panel.
After a few moments, the Console opens in this tab.
In this task, you’ll analyze a partitioned table to identify the performance and cost issues that may be causing Huan's dashboard to be slow and expensive.
Notice the icon indicating that the taxi_trips table is partitioned.
In this task, you'll discover how partitioning limits the amount of data that needs to be examined each time a query is executed. Partitions increase query performance and aid cost reduction.
Click Run.
In the SQL Query editor action bar, click More > Query settings. The Query settings dialog opens.
Now, disable the query cache preference. In the Session management section, ensure that the Use session mode checkbox is cleared.
Click Save.
Replace the previous query in the Untitled tab with the following SQL query:
Notice the number of MB this query will process when run. For small browser windows, hover over the green check to view the number of MB this query will process when run.
Notice the number of MB this query will process when run. For small browser windows, hover over the green check to view the number of MB this query will process when run.
WHERE
statement goes way back to the 1900s, and all rows in the table will satisfy that criterion since the oldest pickup_datetime
in this table is December 2008 as displayed in column pickup_datetime.
Click Check my progress to verify that you have completed this task correctly.
Great work! You have successfully conducted an analysis of partitioned tables in BigQuery and helped Huan make their dashboard more efficient and cost-effective.
By examining the amount of data analyzed and running queries, you have explored ways to address the reported slow performance of the dashboard.
Additionally, by utilizing partitions as filters, you have tested the effectiveness of reducing the data scanned for queries, potentially providing insights for optimizing query efficiency and reducing costs for the dashboard.
You’re well on your way to understanding how to utilize data sources in BigQuery.
Before you end the lab, make sure you’re satisfied that you’ve completed all the tasks. When you're ready, click End Lab and then click Submit.
Ending the lab will remove your access to the lab environment, and you won’t be able to access the work you've completed in it again.
Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.
This content is not currently available
We will notify you via email when it becomes available
Great!
We will contact you via email if it becomes available
One lab at a time
Confirm to end all existing labs and start this one