
Before you begin
- Labs create a Google Cloud project and resources for a fixed time
- Labs have a time limit and no pause feature. If you end the lab, you'll have to restart from the beginning.
- On the top left of your screen, click Start lab to begin
- Create BigQuery dataset and tables
- Load data into event table
- Create nested and repeated fields
- Create partitioned tables
- Create clustered tables
In BigQuery, you organize data into BigQuery datasets and define the schema (or structure) of each table using column names and data types. The schema of a table can impact query performance and cost in BigQuery, as it determines how quickly and efficiently BigQuery can access and process the data in a table. BigQuery supports flexible schemas, and it is possible to make changes to the schema without having to rewrite the data.
The objective of this lab is to provide Teradata professionals with the knowledge and skills needed to start designing and implementing effective BigQuery table schemas. Upon completion of this lab, you will have a deeper understanding of how to design, optimize, and query table schemas in BigQuery.
In this lab, you create BigQuery datasets and tables to store data, create nested and repeated fields to maintain relationships in denormalized data, and create partitioned and clustered tables to optimize query performance.
In this lab, you learn how to:
- Create BigQuery datasets and tables using DDL statements
- Load data into tables using the LOAD DATA statement
- Create nested and repeated fields to maintain relationships in denormalized data
- Create partitioned and clustered tables to optimize query performance
For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.
Sign in to Qwiklabs using an incognito window.
Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
There is no pause feature. You can restart if needed, but you have to start at the beginning.
When ready, click Start lab.
Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.
Click Open Google Console.
Click Use another account and copy/paste credentials for this lab into the prompts.
If you use other credentials, you'll receive errors or incur charges.
Accept the terms and skip the recovery resource page.
Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is a panel populated with the temporary credentials that you must use for this lab.
Copy the username, and then click Open Google Console. The lab spins up resources, and then opens another tab that shows the Choose an account page.
On the Choose an account page, click Use Another Account. The Sign in page opens.
Paste the username that you copied from the Connection Details panel. Then copy and paste the password.
After a few moments, the Cloud console opens in this tab.
In BigQuery, you can use data definition language (DDL) to create datasets and tables. You can also use the SQL statement LOAD DATA to load data from one or more files into a new or existing table.
To learn more about using DDL statements to create BigQuery datasets and tables and using the LOAD DATA SQL statement to load data, review the documentation titled CREATE SCHEMA statement, CREATE TABLE statement, and LOAD DATA statement.
In this task, you use DDL to create a dataset and tables in BigQuery, and then load data into the new tables using the LOAD DATA statement.
The Welcome to BigQuery in the Cloud Console message box opens. This message box provides a link to the quickstart guide and the release notes.
Click Done.
In the SQL Workspace toolbar, click the Editor icon to open the SQL query editor.
This query creates a new BigQuery dataset named ticket_sales. Note that the DDL statement uses the term SCHEMA to refer to a logical collection of tables, views, and other resources, which in BigQuery is known as a dataset.
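A minimal form of that DDL might look like the following sketch (the statement in your lab may include additional OPTIONS, such as a location):

```sql
-- Create the ticket_sales dataset; in DDL, SCHEMA refers to a dataset.
CREATE SCHEMA IF NOT EXISTS ticket_sales;
```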
This query creates a new table called sales in the ticket_sales dataset.
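A sketch of such a statement follows; the column names and data types here are assumptions for illustration, not the exact schema used in the lab:

```sql
-- Create an empty sales table in the ticket_sales dataset.
-- Column names and types are illustrative assumptions.
CREATE TABLE IF NOT EXISTS ticket_sales.sales (
  eventid    INT64,
  saletime   DATETIME,
  qtysold    INT64,
  pricepaid  NUMERIC,
  commission NUMERIC
);
```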
Expand the Explorer pane, which lists the dataset and table, and then click the table name sales.
Click on the Details and Preview tabs to see more information about the table.
Note that there is no data in the table yet.
This query uses an explicit table schema definition to load data into the sales table from a CSV file in Cloud Storage.
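A sketch of LOAD DATA with an explicit schema is shown below; the Cloud Storage URI and the column list are assumptions for illustration:

```sql
-- Load CSV data from Cloud Storage into the sales table,
-- supplying the schema explicitly. The URI is a placeholder.
LOAD DATA INTO ticket_sales.sales (
  eventid    INT64,
  saletime   DATETIME,
  qtysold    INT64,
  pricepaid  NUMERIC,
  commission NUMERIC
)
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://your-bucket/sales.csv']
);
```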
The Results pane displays a message that the LOAD statement ran successfully.
You can click Refresh (top right) to refresh the data in the Preview tab.
Click Check my progress to verify the objective.
You have now used SQL statements in BigQuery to create a dataset and a table, and load data into it. Practice creating another table and loading data for event information.
This query uses schema auto-detection to load data into the events table from a CSV file in Cloud Storage.
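With schema auto-detection, no column list is supplied and BigQuery infers the schema from the file. A sketch, with a placeholder URI:

```sql
-- Load CSV data into the events table, letting BigQuery
-- auto-detect the schema (no column list provided).
LOAD DATA INTO ticket_sales.events
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://your-bucket/events.csv']
);
```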
The Results pane displays a message that the LOAD statement ran successfully.
You can click Refresh (top right) to refresh the data in the Preview tab.
Click Check my progress to verify the objective.
Denormalization is a common strategy for increasing read performance for relational datasets that were previously normalized. In BigQuery, the recommended way to denormalize data is to use nested and repeated fields. You can use nested and repeated fields to maintain relationships in denormalized data, instead of completely flattening your data.
To learn more about nested and repeated fields in BigQuery, review the documentation titled Use nested and repeated fields.
In this task, you learn how to create and query nested and repeated fields in BigQuery.
While the SQL in step 2 is similar to the SQL in step 1, notice the addition of the ARRAY_AGG and STRUCT functions together with the GROUP BY clause.
Rather than repeating the data on the "one" side of the one-to-many relationship, the data on the "many" side is now in an array of structs.
You can also wrap the previous query in a CREATE TABLE statement to create a nested hierarchical table.
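The pattern can be sketched as follows; the table and column names are assumptions for illustration:

```sql
-- Build a nested, hierarchical table: each event row carries an
-- array of sale structs instead of repeating event data per sale.
CREATE TABLE ticket_sales.sales_nested AS
SELECT
  e.eventid,
  e.eventname,
  ARRAY_AGG(STRUCT(s.saletime, s.qtysold, s.pricepaid, s.commission)) AS sales
FROM ticket_sales.events AS e
JOIN ticket_sales.sales AS s
  ON e.eventid = s.eventid
GROUP BY e.eventid, e.eventname;
```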
The schema includes a nested and repeated field named sales, which contains the sale time, quantity sold, price paid, and commission for each event sale.
This new nested and repeated structure changes the way that you write queries.
What if you wanted to see the top commissions per event? That would require querying inside the array. To do that, you need to unnest (or flatten) the array.
The UNNEST operator flattens the sales array into individual rows so that the values inside it can be queried, and those results can then be aggregated back into an array.
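A sketch of such a query follows; the table and column names are assumptions:

```sql
-- Flatten the sales array with UNNEST to rank commissions.
-- Each array element becomes its own row in the result.
SELECT
  n.eventname,
  s.commission
FROM ticket_sales.sales_nested AS n,
  UNNEST(n.sales) AS s
ORDER BY s.commission DESC
LIMIT 10;
```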
To learn more about using UNNEST to flatten arrays, review the documentation titled UNNEST operator.
Click Check my progress to verify the objective.
In BigQuery, one method of reducing the number of bytes processed by a query is to divide a large table into smaller segments called partitions, and then include a filter in your queries to select only data from the appropriate partition. This process is known as partition pruning and can be used to reduce query costs.
To learn more about partitioned tables and partition pruning, review the documentation titled Introduction to partitioned tables and Query partitioned tables.
In this task, you learn how to create and query time-unit partitioned tables (on a DATETIME column) to minimize the number of bytes processed by queries.
Before you run a query, the BigQuery query validator provides an estimate of the number of bytes it will process. Note the estimate for this query (15.12 MB).
Notice that the number of bytes processed is the same as the query in step 1 (15.12 MB), even though the new query is only asking for sales from a specific date.
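The behavior can be sketched as follows; the column name and date value are assumptions for illustration:

```sql
-- A date filter on a non-partitioned table still scans the
-- whole table, so the bytes-processed estimate is unchanged.
SELECT eventid, saletime, qtysold
FROM ticket_sales.sales
WHERE DATE(saletime) = '2008-01-01';
```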
The query processes less data (18.98 KB) because it runs on the partitioned table. BigQuery uses partition pruning to process less data, which can result in lower cost and faster queries.
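Creating the partitioned table and querying it with a partition filter can be sketched as follows; the table and column names are assumptions:

```sql
-- Create a time-unit partitioned copy of the table,
-- partitioned by day on the DATETIME column.
CREATE TABLE ticket_sales.sales_partitioned
PARTITION BY DATETIME_TRUNC(saletime, DAY) AS
SELECT * FROM ticket_sales.sales;

-- The filter on the partitioning column lets BigQuery prune
-- partitions and scan only the matching day's data.
SELECT eventid, saletime, qtysold
FROM ticket_sales.sales_partitioned
WHERE DATETIME_TRUNC(saletime, DAY) = DATETIME '2008-01-01';
```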
Click Check my progress to verify the objective.
Another method of optimizing query performance in BigQuery is to cluster values within a table to sort and group data into logical storage blocks. Queries that filter or aggregate by the clustered columns only scan the relevant blocks based on the clustered columns, instead of the entire table or table partition. This process is known as block pruning and can make joins, searches, grouping, and sorting faster.
To learn more about clustered tables and block pruning, review the documentation titled Introduction to clustered tables and Querying clustered tables.
In this task, you learn how to create and query clustered tables to optimize query performance.
This query uses one of Google's public datasets that contains a lot of data (in this case, 1 billion rows). The query counts views by language in a table of Wikipedia data.
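A sketch of that query is shown below; the exact public table referenced in your lab may differ:

```sql
-- Count Wikipedia page views by language across ~1 billion rows.
-- The table name here is an assumption for illustration.
SELECT language, SUM(views) AS total_views
FROM `bigquery-samples.wikipedia_benchmark.Wiki1B`
GROUP BY language
ORDER BY total_views DESC;
```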
Note the number of bytes shuffled (619.68 KB).
This command can take up to several minutes to run.
Notice that the number of bytes shuffled is lower (47.98 KB) when the same query is run on the clustered table. Fewer bytes shuffled means faster execution time in BigQuery.
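Creating the clustered table and rerunning the query can be sketched as follows; the destination table name and the clustering column are assumptions:

```sql
-- Create a copy of the data clustered by language; rows with the
-- same language are co-located in storage blocks.
CREATE TABLE ticket_sales.wiki_clustered
CLUSTER BY language AS
SELECT * FROM `bigquery-samples.wikipedia_benchmark.Wiki1B`;

-- Grouping by the clustered column lets BigQuery scan only the
-- relevant blocks, reducing the bytes shuffled.
SELECT language, SUM(views) AS total_views
FROM ticket_sales.wiki_clustered
GROUP BY language
ORDER BY total_views DESC;
```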
Click Check my progress to verify the objective.
When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
- 1 star = Very dissatisfied
- 2 stars = Dissatisfied
- 3 stars = Neutral
- 4 stars = Satisfied
- 5 stars = Very satisfied
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.