
Google Cloud Fundamentals: Getting Started with BigQuery


Lab · 30 minutes · 5 Credits · Introductory
Note: This lab may incorporate AI tools to support your learning.

Overview

In this lab, you load a web server log into a BigQuery table. After loading the data, you query it using the BigQuery web user interface and the BigQuery CLI.

BigQuery lets you perform interactive analysis of petabyte-scale datasets and enables near-real-time analysis of massive data. It offers a familiar SQL 2011 query language and functions.

Data stored in BigQuery is highly durable: Google replicates your data by default, at no additional charge for the replicas. With BigQuery, you pay only for the resources you use. Storage is inexpensive, and queries are billed by the amount of data they process: when you submit a query, you pay for compute only for the duration of that query. You don't pay to keep a compute cluster up and running.
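Because billing depends on bytes processed, it is worth estimating a query's cost before running it. A minimal sketch using the bq CLI's dry-run mode, which validates a query and reports the bytes it would process without executing it (the table name here is illustrative; you create it later in this lab):

    # Report how many bytes this query would process, without running it
    bq query --use_legacy_sql=false --dry_run \
        'select count(*) from logdata.accesslog'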

Using BigQuery involves interacting with a number of Google Cloud Platform resources, including projects (covered elsewhere in this course), datasets, tables, and jobs. This lab introduces you to some of these resources, and this brief introduction summarizes their role in interacting with BigQuery.

Datasets: A dataset is a grouping mechanism that holds zero or more tables. A dataset is the lowest level unit of access control. Datasets are owned by GCP projects. Each dataset can be shared with individual users.

Tables: A table is a row-column structure that contains actual data. Each table has a schema that describes strongly typed columns of values. Each table belongs to a dataset.
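Datasets and tables map directly onto bq CLI commands, which you use in Task 4. A brief sketch (the dataset and table names are the ones created later in this lab):

    bq mk logdata                        # create a dataset in the current project
    bq ls logdata                        # list the tables in a dataset
    bq show --schema logdata.accesslog   # print a table's schema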

Objectives

In this lab, you learn how to perform the following tasks:

  • Load data from Cloud Storage into BigQuery.
  • Perform a query on the data in BigQuery.

Task 1. Sign in to the Google Cloud Console

For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.

  1. Sign in to Qwiklabs using an incognito window.

  2. Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
    There is no pause feature. You can restart if needed, but you have to start at the beginning.

  3. When ready, click Start lab.

  4. Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.

  5. Click Open Google Console.

  6. Click Use another account and copy/paste credentials for this lab into the prompts.
    If you use other credentials, you'll receive errors or incur charges.

  7. Accept the terms and skip the recovery resource page.

Make a note of whether your assigned region is closer to the United States or to Europe.

Task 2. Load data from Cloud Storage into BigQuery

  1. In the Console, on the Navigation menu (Navigation menu icon), click BigQuery, then click Done.

  2. Create a new dataset within your project: click the View actions icon next to your project ID in the Explorer section, then select Create dataset.

  3. In the Create Dataset dialog, for Dataset ID, type logdata.

  4. For Data location, select us (multiple regions in United States). Click CREATE DATASET.

  5. Create a new table in the logdata dataset to store the data from the CSV file.

  6. Expand your project ID, and click the View actions icon next to the logdata dataset. Then select Create table.

  7. On the Create Table page, in the Source section:

  • For Create table from, select Google Cloud Storage, and in the field, type cloud-training/gcpfci/access_log.csv.
  • Verify File format is set to CSV.
Note: When you have created a table previously, the Create from Previous Job option allows you to quickly use your settings to create similar tables.
  8. In the Destination section:
  • For Dataset name, leave logdata selected.
  • For Table name, type accesslog.
  • For Table type, leave Native table selected.
  9. In the Schema section, check Auto detect.

  10. Accept the remaining default values and click Create table.

    BigQuery creates a load job to create the table and upload data into the table (this may take a few seconds).

  11. (Optional) To track job progress, click Job History.

  12. When the load job is complete, click logdata > accesslog.

  13. On the table details page, click Details to view the table properties, and then click Preview to view the table data.

    Each row in this table logs a hit on a web server. The first field, string_field_0, is the IP address of the client. The fourth through ninth fields log the day, month, year, hour, minute, and second at which the hit occurred. In this activity, you will learn about the daily pattern of load on this web server.
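The console steps above have a command-line equivalent. A sketch of the same load using the bq CLI in Cloud Shell (assuming the logdata dataset already exists and the same bucket path):

    # Load the CSV from Cloud Storage into logdata.accesslog, auto-detecting the schema
    bq load --autodetect --source_format=CSV \
        logdata.accesslog \
        gs://cloud-training/gcpfci/access_log.csv

    bq ls -j -n 5   # list the five most recent jobs, including this load job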

Click Check my progress to verify the objective: Load data from Cloud Storage into BigQuery.

Task 3. Perform a query on the data using the BigQuery web UI

In this section of the lab, you use the BigQuery web UI to query the accesslog table you created previously.

  1. In the query EDITOR, type (or copy-and-paste) the following query. Because you told BigQuery to automatically detect the schema when you loaded the data, the hour of the day during which each web hit arrived is in a field called int64_field_6:

    select int64_field_6 as hour, count(*) as hitcount from logdata.accesslog group by hour order by hour

    Notice that the Query Validator tells you that the query syntax is valid (indicated by the green check mark) and reports how much data the query will process. The amount of data processed lets you estimate the query's cost using the Google Cloud pricing calculator.

  2. Click RUN and examine the results. At what time of day is the website busiest? When is it least busy?
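To drill into the busiest hour, you can group the same data by minute. A sketch of such a follow-up query (it assumes auto-detect named the minute column int64_field_7, consistent with the field order described in Task 2; 17 is just a placeholder for the busiest hour you found):

    select int64_field_7 as minute, count(*) as hitcount
    from logdata.accesslog
    where int64_field_6 = 17
    group by minute
    order by hitcount desc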

Task 4. Perform a query on the data using the bq command

In this section of the lab, you use the bq command in Cloud Shell to query the accesslog table you created previously.

  1. On the Google Cloud Platform Console, click Activate Cloud Shell (Activate Cloud Shell icon) then click Continue.

  2. At the Cloud Shell prompt, enter this command:

    bq query "select string_field_10 as request, count(*) as requestcount from logdata.accesslog group by request order by requestcount desc"

    The first time you use the bq command, it caches your Google Cloud Platform credentials, and then asks you to choose your default project. Choose the project that Qwiklabs assigned you to. Its name will look like qwiklabs-gcp- followed by a hexadecimal number.

    The bq command then performs the action requested on its command line. What URL offered by this web server was most popular? Which was least popular?
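By default, bq query runs in BigQuery's legacy SQL dialect; this particular query happens to be valid in both dialects. A sketch of the same query run explicitly with standard SQL and CSV-formatted output (both flags are documented bq options):

    bq query --use_legacy_sql=false --format=csv \
        'select string_field_10 as request, count(*) as requestcount
         from logdata.accesslog
         group by request
         order by requestcount desc'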

Congratulations!

In this lab, you loaded data stored in Cloud Storage into a table hosted by Google BigQuery. You then queried the data to discover patterns.

End your lab

When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.

You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.

The number of stars indicates the following:

  • 1 star = Very dissatisfied
  • 2 stars = Dissatisfied
  • 3 stars = Neutral
  • 4 stars = Satisfied
  • 5 stars = Very satisfied

You can close the dialog box if you don't want to provide feedback.

For feedback, suggestions, or corrections, please use the Support tab.

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.
