Exploratory Data Analysis using BigQuery and Colab Enterprise
Overview
In this lab, you learn how to analyze a dataset stored in BigQuery from Colab Enterprise: you run queries and present the results with several statistical plotting techniques. The analysis helps you discover patterns in the data.
Learning objectives
- Create a Colab Enterprise Notebook
- Connect to BigQuery datasets
- Perform statistical analysis on a Pandas Dataframe
- Create Seaborn plots for Exploratory Data Analysis in Python
- Write a SQL query to pick up specific fields from a BigQuery dataset
- Use version history to see code changes
- Share a Colab Enterprise notebook
Vertex AI is a unified platform for building, deploying, and managing machine learning (ML) applications.
Vertex AI Colab Enterprise is a powerful collaborative interactive tool created to explore, analyze, transform and visualize data and build machine learning models on Google Cloud. It offers the security and compliance features needed for enterprise organizations and integrates with other Google Cloud services like Vertex AI and BigQuery for an enhanced data science and machine learning workflow.
BigQuery is a powerful, fully managed, serverless data warehouse that allows you to analyze and manage large datasets with ease. BigQuery uses a familiar standard SQL dialect, making it easy for analysts and data scientists to use without needing to learn a new language.
Vertex AI offers two notebook solutions: Workbench and Colab Enterprise.
Colab Enterprise
The Colab Enterprise workspace consists of five major sections (as shown in the image below): (1) Notebook storage, (2) Notebook actions, (3) Runtimes and runtime templates, (4) Notebook editor, and (5) Notebook code cells. Notebook storage is where the notebooks are kept, notebook actions are the operations you can perform on a notebook, runtimes let you run the notebook, the notebook editor is where you edit the notebook, and code cells let you enter code.
Set up your Qwiklabs environments
Qwiklabs setup
For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.
- Sign in to Qwiklabs using an incognito window.
- Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time. There is no pause feature. You can restart if needed, but you have to start at the beginning.
- When ready, click Start lab.
- Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.
- Click Open Google Console.
- Click Use another account and copy/paste credentials for this lab into the prompts. If you use other credentials, you'll receive errors or incur charges.
- Accept the terms and skip the recovery resource page.
Task 1. Set up your environment
- Enable the Vertex AI API
Navigate to the Vertex AI section of your Cloud Console and click ENABLE ALL RECOMMENDED APIS.
Task 2. Create a Colab Enterprise Notebook
- In the Vertex AI section, scroll down to Notebooks. Click Colab Enterprise.
A "Welcome to Colab Enterprise" page then appears. Click the + CREATE A NEW NOTEBOOK at the bottom of the page.
- A new notebook opens.
In order to execute the cells you need to create a runtime. Recall that runtimes are instances derived from runtime templates that allow users to run Colab notebooks. To create a runtime, you need to first create a runtime template.
- Click RUNTIME TEMPLATES. On the Runtime Templates page, click NEW TEMPLATE.
There are three steps. Step 2 and Step 3 are optional.
Step 1: Provide runtime basic information and select region as
Step 2: Configure Compute (Optional)
Step 3: Networking and Security (Optional)
Task 3. Run Code in a Colab Enterprise Notebook
In this example, the code cell below “Getting Started” uses numpy to generate some random data and uses matplotlib to visualize it.
- Click Colab Enterprise, then click the notebook you created.
- Copy the code below and click the Run icon to run the cell.
import numpy as np
from matplotlib import pyplot as plt
ys = 200 + np.random.randn(100)
x = [x for x in range(len(ys))]
plt.plot(x, ys, '-')
plt.fill_between(x, ys, 195, where=(ys > 195), facecolor='g', alpha=0.6)
plt.title("Sample Visualization")
plt.show()
As you execute the cell, a message pops up indicating that the runtime is active and initiating a connection, as shown in the image below.
- Click Open for the Open OAuth popup, select your user name, and then click Allow.
- Check the cell you executed. A green check mark next to it indicates that the cell executed properly.
- Now, make a change to the code - for example, change the title of the plot from "Sample Visualization" to "Colab Enterprise". Then, execute the cell.
Task 4. Show revision history
One of the most important features of software development is the ability to track version history.
- Go to the Notebook Storage section. Click on the three dots beside the notebook you created.
- Select Revision History (as shown in bubble #1 in the image below).
When revision history is selected, you see the changes side by side with a date stamp and color-coding to see the “old” in red and the “new” in green (as shown in bubble #2). There are three options to view revision history: the raw source, the inline differences, or the source output (as shown in bubble #3).
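The side-by-side view is, in essence, a text diff between two saved versions of a cell. As a rough illustration only (this is not how Colab Enterprise renders its view, and the two revision lists below are hypothetical), Python's standard difflib module produces the same kind of removed and added markers for the title change you made in Task 3:

```python
import difflib

# Two hypothetical revisions of the same cell (the title change from Task 3).
old_revision = ['plt.title("Sample Visualization")', "plt.show()"]
new_revision = ['plt.title("Colab Enterprise")', "plt.show()"]

# unified_diff marks removed lines with "-" and added lines with "+",
# comparable to the red/green color-coding in the revision history view.
diff_lines = list(
    difflib.unified_diff(
        old_revision,
        new_revision,
        fromfile="previous revision",
        tofile="current revision",
        lineterm="",
    )
)

for line in diff_lines:
    print(line)
```

Lines that start with a space are unchanged in both revisions.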
Task 5. Add code to cells
To add code or text to a Notebook, simply click on either code or text in the menu bar above the Notebook Editor.
Now, you will add several blocks of code to the notebook. After you copy a block of code, run the code to see the output. Note - Some cells will have no output (such as when you import the libraries). When you are done, share the notebook.
- Go back to the Notebook storage section then click on your Notebook.
- Re-run the first code cell.
- Add a code cell.
- Copy the code below into the new cell.
Import libraries
- Click the Run icon to run the cell.
- There should be no output shown.
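The contents of the import cell are not reproduced here. As a hedged sketch, assuming the cell imports pandas, numpy, seaborn, and matplotlib (this library list is an assumption based on the learning objectives, and `missing_packages` is a hypothetical helper), you could first confirm that the runtime has those packages. Unlike the import cell itself, this check prints a line of output.

```python
import importlib.util

# Assumed library list, inferred from the learning objectives; adjust as needed.
REQUIRED_PACKAGES = ["pandas", "numpy", "seaborn", "matplotlib"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this runtime."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED_PACKAGES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All packages available.")

# With everything present, an import cell conventionally looks like:
#   import pandas as pd
#   import numpy as np
#   import seaborn as sns
#   import matplotlib.pyplot as plt
```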
Insert the following code as cells to import the necessary modules and initialize a BigQuery client. The BigQuery client will be used to send and receive messages from the BigQuery API.
Import BigQuery Client
No output shown.
Download a BigQuery Table into a Pandas Dataframe
%%bigquery is a cell magic command used in Jupyter notebooks and other interactive environments to interact with BigQuery. It tells the environment to switch to BigQuery mode, preparing it to accept and execute BigQuery queries. BigQuery executes the query, retrieves the data, and presents it within the notebook environment, often as a pandas DataFrame (for example, the "df" shown in the code).
Output shown:
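The query cell itself is not reproduced here. As a hedged sketch only: a `%%bigquery df` cell stores the query result in a DataFrame named `df`, and roughly the same result can be obtained from plain Python with the BigQuery client library. The table name is a placeholder, `make_client` and `query_to_dataframe` are hypothetical helpers, and the lab's actual query may differ.

```python
# Notebook form (the magic must be the first line of its own cell):
#
#   %%bigquery df
#   SELECT * FROM `your-project.your_dataset.your_table` LIMIT 1000
#
# Plain-Python form of the same idea, using the BigQuery client library.

# Placeholder table; replace it with the table used in your lab.
SAMPLE_SQL = "SELECT * FROM `your-project.your_dataset.your_table` LIMIT 1000"


def make_client(project_id=None):
    """Create a BigQuery client (imported lazily; needs google-cloud-bigquery)."""
    from google.cloud import bigquery

    return bigquery.Client(project=project_id)


def query_to_dataframe(client, sql):
    """Run a query and return the result as a pandas DataFrame."""
    query_job = client.query(sql)    # starts the query job
    return query_job.to_dataframe()  # waits for completion and downloads the rows
```

In the notebook, `df = query_to_dataframe(make_client(), SAMPLE_SQL)` would then play the role of the magic cell. Passing the client in as an argument keeps the helper easy to exercise without a live connection.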
Show the first five rows of the Pandas Dataframe
Output shown:
Get information on the Pandas Dataframe
Output shown:
Get statistics on the Pandas Dataframe
Output shown:
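The three inspection steps above (first rows, information, statistics) can be tried on a small synthetic DataFrame. This is a minimal sketch: the column names and values are invented for illustration and are not the lab's dataset.

```python
import pandas as pd

# Synthetic stand-in for the DataFrame downloaded from BigQuery.
df = pd.DataFrame(
    {
        "age": [25, 32, 47, 51, 38, 29],
        "hours_per_week": [40, 45, 50, 38, 40, 60],
        "occupation": ["clerk", "engineer", "manager", "clerk", "engineer", "sales"],
    }
)

first_rows = df.head()   # first five rows
print(first_rows)

df.info()                # column names, non-null counts, dtypes, memory usage

summary = df.describe()  # count, mean, std, min, quartiles, max (numeric columns)
print(summary)
```

By default, `describe()` summarizes only the numeric columns, so the text column `occupation` does not appear in the statistics.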
Plot a correlation using Seaborn.
Output shown:
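The plotting cell is not reproduced here. One common way to visualize correlation, shown as a sketch under stated assumptions, is to compute the pairwise correlation of the numeric columns and draw it as a Seaborn heatmap. The data is synthetic, `correlation_matrix` and `plot_correlation` are hypothetical helpers, and the lab's own plot may be a different chart type.

```python
import pandas as pd


def correlation_matrix(frame):
    """Pearson correlation between the numeric columns of a DataFrame."""
    return frame.select_dtypes(include="number").corr()


def plot_correlation(corr):
    """Draw the matrix as an annotated heatmap (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_title("Correlation matrix")
    plt.show()


# Synthetic data: y rises with x, z falls as x rises.
sample = pd.DataFrame(
    {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1], "label": list("abcd")}
)
corr = correlation_matrix(sample)
print(corr)
```

Calling `plot_correlation(corr)` in a notebook cell renders the heatmap. Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none.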
Write a SQL query to pick up specific fields from a BigQuery dataset
Output shown:
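Picking up specific fields means naming the columns in the SELECT list instead of using `SELECT *`. The sketch below builds such a query string. The table and column names are placeholders, and `build_select` is a hypothetical helper, not part of the lab or of the BigQuery client library.

```python
def build_select(table, fields, limit=None):
    """Build a SELECT statement that returns only the named fields."""
    if not fields:
        raise ValueError("fields must contain at least one column name")
    column_list = ",\n  ".join(fields)
    sql = f"SELECT\n  {column_list}\nFROM\n  `{table}`"
    if limit is not None:
        sql += f"\nLIMIT {int(limit)}"
    return sql


# Placeholder table and column names, for illustration only.
query = build_select(
    "your-project.your_dataset.your_table",
    ["age", "occupation", "hours_per_week"],
    limit=100,
)
print(query)
```

The resulting string can be pasted under a `%%bigquery` magic or passed to a BigQuery client. Because the helper interpolates names directly into SQL, it is only suitable for names you control.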
Get the first five rows of the new fields.
Output shown:
Task 6. Share the Notebook
Next, share your notebook.
- When you select “Share” by right-clicking on the notebook, a share permissions window appears, which allows you to edit or delete permissions, or select "Add Principal" to grant new access.
- When you grant principals access to a resource, you also add roles to specify what actions the principals can take. Optionally, you can add conditions to grant access to principals only when specific criteria are met.
Principals are users, groups, domains, or service accounts. Roles are composed of sets of permissions and determine what the principal can do with this resource.
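The relationship between principals, roles, and conditions can be pictured as one IAM policy binding. The dictionary below is a sketch of that shape, not output from this lab. The email addresses and expiry date are placeholders, the role name is an assumption about which role suits notebook users, and `principal_types` is a hypothetical helper.

```python
# Shape of one IAM policy binding: a role, its members, and an optional condition.
# The addresses, the role, and the expiry date are placeholders for illustration.
binding = {
    "role": "roles/aiplatform.colabEnterpriseUser",  # assumed role name
    "members": [
        "user:analyst@example.com",
        "group:data-team@example.com",
    ],
    "condition": {
        "title": "Temporary access",
        "expression": 'request.time < timestamp("2025-01-01T00:00:00Z")',
    },
}


def principal_types(members):
    """Return the sorted set of principal types (user, group, ...) in a binding."""
    return sorted({member.split(":", 1)[0] for member in members})


print(principal_types(binding["members"]))
```

Each member string carries its principal type as a prefix, and the condition restricts when the role applies.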
Congratulations!
In this lab you learned how to:
- Create a Colab Enterprise Notebook
- Connect to BigQuery datasets
- Perform statistical analysis on a Pandas Dataframe
- Create Seaborn plots for Exploratory Data Analysis in Python
- Write a SQL query to pick up specific fields from a BigQuery dataset
- Use version history to see code changes
- Share a Colab Enterprise notebook
End your lab
When you have completed your lab, click End Lab. Qwiklabs removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
- 1 star = Very dissatisfied
- 2 stars = Dissatisfied
- 3 stars = Neutral
- 4 stars = Satisfied
- 5 stars = Very satisfied
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Manual Last Updated: August 26, 2024
Lab Last Tested: August 26, 2024
Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.