Exploratory Data Analysis using Bigquery and Colab Enterprise

Lab: 2 hours, 1 credit, Intermediate level

Overview

In this lab, you learn how to analyze a dataset stored in BigQuery using Colab Enterprise: you run queries and present the data using various statistical plotting techniques. The analysis will help you discover patterns in the data.

Learning objectives

  • Create a Colab Enterprise Notebook
  • Connect to BigQuery datasets
  • Perform statistical analysis on a Pandas DataFrame
  • Create Seaborn plots for Exploratory Data Analysis in Python
  • Write a SQL query to pick up specific fields from a BigQuery dataset
  • Use version history to see code changes
  • Share a Colab Enterprise notebook

Vertex AI is a unified platform for building, deploying, and managing machine learning (ML) applications.

Vertex AI Colab Enterprise is a powerful collaborative interactive tool created to explore, analyze, transform and visualize data and build machine learning models on Google Cloud. It offers the security and compliance features needed for enterprise organizations and integrates with other Google Cloud services like Vertex AI and BigQuery for an enhanced data science and machine learning workflow.

BigQuery is a powerful, fully managed, serverless data warehouse that allows you to analyze and manage large datasets with ease. BigQuery uses a familiar standard SQL dialect, making it easy for analysts and data scientists to use without needing to learn a new language.

Vertex AI offers two Notebook Solutions, Workbench and Colab Enterprise.

Colab

Colab Enterprise

The Colab Enterprise workspace consists of five major sections (as shown in the image below): (1) Notebook storage, (2) Notebook actions, (3) Runtimes and runtime templates, (4) Notebook editor, and (5) Notebook code cells. Notebook storage is where notebooks are kept, notebook actions are the operations you can perform on a notebook, runtimes let you run the notebook, the notebook editor is where you edit the notebook, and code cells let you enter code.

Colab Runtime

Set up your Qwiklabs environments

Qwiklabs setup

For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.

  1. Sign in to Qwiklabs using an incognito window.

  2. Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
    There is no pause feature. You can restart if needed, but you have to start at the beginning.

  3. When ready, click Start lab.

  4. Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.

  5. Click Open Google Console.

  6. Click Use another account and copy/paste credentials for this lab into the prompts.
    If you use other credentials, you'll receive errors or incur charges.

  7. Accept the terms and skip the recovery resource page.

Task 1. Set up your environment

  1. Enable the Vertex AI API

Navigate to the Vertex AI section of your Cloud Console and click ENABLE ALL RECOMMENDED APIS.

Task 2. Create a Colab Enterprise Notebook

  1. In the Vertex AI section, scroll down to Notebooks. Click Colab Enterprise.

select_colab

A "Welcome to Colab Enterprise" page appears. Click + CREATE A NEW NOTEBOOK at the bottom of the page.

screate_nb

  2. A new notebook appears with prepopulated cells. This is the "Getting Started" notebook.

get_started

To execute the cells, you need a runtime. Recall that runtimes are instances derived from runtime templates that allow users to run Colab notebooks. To create a runtime, you first need to create a runtime template.

  3. Click RUNTIME TEMPLATES. On the Runtime Templates page, click NEW TEMPLATE.

Colab Runtime

Runtime Basics

There are three steps. Step 2 and Step 3 are optional.

Step 1: Provide runtime basic information.

Colab Runtime

Compute Configure

Step 2: Configure Compute (Optional)

Colab Runtime

Networking and Security

Step 3: Networking and Security (Optional)

Colab Runtime

Task 3. Run Code in a Colab Enterprise Notebook

In this example, the code cell below “Getting Started” uses numpy to generate some random data and uses matplotlib to visualize it.

  1. Click Colab Enterprise, then click the notebook you created.

  2. Click the Run icon to run the cell.

run_icon

As the cell executes, a message pops up indicating that the runtime is active and initiating a connection, as shown in the image below.

Colab Runtime

  3. Click Open when the Open OAuth popup prompt appears, select your user name, and then click Allow.

  4. Check the cell you executed. A green check mark next to it indicates that the cell ran successfully.

Colab Output

  5. Make a change to the code. For example, change the title of the plot from "Sample Visualization" to "Colab Enterprise". Then run the cell again.

Colab Output2
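
The lab text does not reproduce the contents of the "Getting Started" cell. As a minimal sketch of a cell of that kind, assuming only that it generates random data with numpy and plots it with matplotlib, the following wraps the plot in a hypothetical helper, make_sample_plot, so that the title change from this task is a single argument. The random-walk data, figure size, and function name are illustrative assumptions, not the notebook's actual code.

```python
import numpy as np
import matplotlib.pyplot as plt


def make_sample_plot(title="Sample Visualization", n_points=100, seed=0):
    """Generate random data and draw it as a line plot with the given title."""
    rng = np.random.default_rng(seed)
    # Cumulative sum of normal noise gives a random walk to plot.
    y = rng.normal(size=n_points).cumsum()
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(np.arange(n_points), y)
    ax.set_title(title)
    return fig, ax, y


# Re-create the plot with the new title, as in the last step of this task.
fig, ax, y = make_sample_plot(title="Colab Enterprise")
print(ax.get_title())  # Colab Enterprise
```

In a notebook, the figure is displayed when the cell finishes running.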

Task 4. Show revision history

One of the most important features of software development is the ability to track version history.

  1. Go to the Notebook Storage section. Click the three dots beside the notebook you created.

actions

  2. Select Revision History (as shown in bubble #1 in the image below).

When revision history is selected, you see the changes side by side with a date stamp and color-coding to see the “old” in red and the “new” in green (as shown in bubble #2). There are three options to view revision history: the raw source, the inline differences, or the source output (as shown in bubble #3).

revision_history
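
To make the red/green comparison concrete, here is a small sketch that uses Python's standard difflib module to diff two versions of a cell. The cell contents are hypothetical (the lab does not show them), and this is only an analogy for what the revision history view displays, not how Colab Enterprise computes it. Lines starting with "-" correspond to the "old" (red) side and lines starting with "+" to the "new" (green) side.

```python
import difflib

# Hypothetical before/after contents of one code cell.
old_cell = [
    "plt.plot(x, y)",
    'plt.title("Sample Visualization")',
]
new_cell = [
    "plt.plot(x, y)",
    'plt.title("Colab Enterprise")',
]

# unified_diff yields header lines, then context, removed (-) and added (+) lines.
diff = list(
    difflib.unified_diff(
        old_cell, new_cell, fromfile="previous", tofile="current", lineterm=""
    )
)
for line in diff:
    print(line)
```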

Task 5. Add code to cells

To add code or text to a Notebook, simply click on either code or text in the menu bar above the Notebook Editor.

add_code

Now, you will add several blocks of code to the notebook. After you copy a block of code, run the code to see the output. Note: Some cells will have no output (such as when you import the libraries). When you are done, share the notebook.

  1. Go back to the Notebook storage section then click on your Notebook.
  2. Re-run the first code cell.
  3. Add a code cell.
  4. Copy the code below into the new cell.

Import libraries

import seaborn as sns
import pandas as pd
import numpy as np
from google.cloud import bigquery

bq = bigquery.Client()

  1. Click the Run icon to run the cell.
  2. There should be no output shown.

Insert the following code as cells to import the necessary modules and initialize a BigQuery client. The BigQuery client will be used to send and receive messages from the BigQuery API.

Import BigQuery Client

client = bigquery.Client()

query = """
SELECT *
FROM `bigquery-public-data.catalonian_mobile_coverage_eu.mobile_data_2015_2017`
LIMIT 1000
"""

job = client.query(query)
df = job.to_dataframe()

No output shown.

Download a BigQuery Table into a Pandas Dataframe

%%bigquery is an IPython cell magic used within Jupyter notebooks and other interactive environments to interact with BigQuery. It tells the notebook to treat the rest of the cell as a BigQuery query: BigQuery executes the query, and the results are returned to the notebook as a pandas DataFrame. If you give a variable name after %%bigquery (for example, "df" in the code below), the DataFrame is saved to that variable.

%%bigquery df
SELECT *
FROM `bigquery-public-data.catalonian_mobile_coverage_eu.mobile_data_2015_2017`

Output shown: abq_df_output

Show the first five rows of the Pandas Dataframe

df.head()

Output shown: adf.headoutput

Get information on the Pandas Dataframe

df.info()

Output shown: adf.infooutput

Get statistics on the Pandas Dataframe

df.describe()

Output shown: df.describeoutput
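
As a small offline illustration of what df.describe() reports, the sketch below runs it on a hand-made DataFrame. Only the column names "signal" and "status" come from the lab's query; the values are made up, and the variable names sample and stats are hypothetical. By default, describe() summarizes only the numeric columns.

```python
import pandas as pd

# Hypothetical stand-in data: the values are made up for illustration.
sample = pd.DataFrame(
    {
        "signal": [10, 20, 30, 40],
        "status": ["A", "B", "A", "B"],
    }
)

# describe() returns count, mean, std, min, quartiles, and max per numeric column.
stats = sample.describe()
print(stats)
print(stats.loc["mean", "signal"])  # 25.0
```

The text column "status" does not appear in the result; calling describe(include="object") would summarize it instead.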

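Plot a correlation heatmap using Seaborn.

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 5))
# numeric_only=True skips text columns, which recent pandas versions cannot correlate
sns.heatmap(df.corr(numeric_only=True), annot=True, vmin=0, vmax=1, cmap='viridis')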

Output shown: df.corroutput
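
The heatmap is drawn from a correlation matrix, which can only be computed over numeric columns. The sketch below shows that step on its own with a hand-made DataFrame, assuming a pandas version that supports the numeric_only argument of DataFrame.corr. The values and the "speed" column are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in data mixing numeric and text columns.
sample = pd.DataFrame(
    {
        "signal": [1, 2, 3, 4, 5],
        "speed": [2.0, 4.0, 6.0, 8.0, 10.0],
        "status": ["A", "B", "A", "B", "A"],
    }
)

# numeric_only=True leaves the text column "status" out of the matrix.
corr = sample.corr(numeric_only=True)
print(corr)
```

Here "speed" is an exact multiple of "signal", so their correlation is 1 (up to floating-point rounding). The resulting square matrix is what sns.heatmap receives.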

Write a SQL query to pick up specific fields from a BigQuery dataset

%%bigquery df2
SELECT signal, status
FROM `bigquery-public-data.catalonian_mobile_coverage_eu.mobile_data_2015_2017`

Output shown: abq_df_output
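
If you want to select different fields without retyping the query, one option is to assemble the SQL text in Python. The sketch below is a hypothetical helper, build_select, that is not part of the lab or of the BigQuery client library. It checks each column name against a simple identifier pattern so that arbitrary text cannot be spliced into the statement.

```python
import re

TABLE = "bigquery-public-data.catalonian_mobile_coverage_eu.mobile_data_2015_2017"

# Plain column names only: letters, digits, and underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_select(columns, table=TABLE, limit=None):
    """Return a SELECT statement for the given column names (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one column is required")
    for name in columns:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid column name: {name!r}")
    sql = f"SELECT {', '.join(columns)} FROM `{table}`"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


query = build_select(["signal", "status"], limit=1000)
print(query)
```

The resulting string can be passed to client.query(query) in the same way as the query in the earlier cell.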

Get the first five rows of the new fields.

df2.head()

Output shown: asql_output

Task 6. Share the Notebook

Next, share your notebook.

  1. Right-click the notebook and select “Share”. A share permissions window appears, where you can edit or delete permissions, or select "Add Principal" to grant new access.

share

  2. When you grant principals access to a resource, you also add roles to specify what actions the principals can take. Optionally, you can add conditions to grant access to principals only when specific criteria are met.

Principals are users, groups, domains, or service accounts. Roles are composed of sets of permissions and determine what the principal can do with this resource.
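
The relationship between principals, roles, and conditions can be pictured as a list of bindings, each granting one role to a set of principals. The sketch below is an in-memory model for illustration only: the Binding class and roles_for function are hypothetical, the email addresses are placeholders, and nothing here calls a Google Cloud API. The roles shown are the basic Viewer and Editor roles, not necessarily the ones you would grant on a notebook.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Binding:
    """One role granted to a set of principals, optionally under a condition."""

    role: str
    members: list = field(default_factory=list)
    condition: Optional[str] = None  # free-text placeholder, not a real expression


def roles_for(bindings, member):
    """Return the sorted roles a principal holds across all bindings."""
    return sorted({b.role for b in bindings if member in b.members})


bindings = [
    Binding(
        role="roles/viewer",
        members=["user:alice@example.com", "group:analysts@example.com"],
    ),
    Binding(role="roles/editor", members=["user:alice@example.com"]),
]

print(roles_for(bindings, "user:alice@example.com"))  # ['roles/editor', 'roles/viewer']
```

The prefixes "user:" and "group:" mark the principal type; domains and service accounts are identified in the same way.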

Congratulations!

In this lab you learned how to:

  • Create a Colab Enterprise Notebook
  • Connect to BigQuery datasets
  • Perform statistical analysis on a Pandas DataFrame
  • Create Seaborn plots for Exploratory Data Analysis in Python
  • Write a SQL query to pick up specific fields from a BigQuery dataset
  • Use version history to see code changes
  • Share a Colab Enterprise notebook

End your lab

When you have completed your lab, click End Lab. Qwiklabs removes the resources you’ve used and cleans the account for you.

You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.

The number of stars indicates the following:

  • 1 star = Very dissatisfied
  • 2 stars = Dissatisfied
  • 3 stars = Neutral
  • 4 stars = Satisfied
  • 5 stars = Very satisfied

You can close the dialog box if you don't want to provide feedback.

For feedback, suggestions, or corrections, please use the Support tab.

Manual Last Updated: December 15, 2023

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.