
Before you begin
- Labs create a Google Cloud project and resources for a fixed time
- Labs have a time limit and no pause feature. If you end the lab, you'll have to restart from the beginning.
- On the top left of your screen, click Start lab to begin
In this lab, you own a fleet of New York City taxi cabs and want to monitor how well your business is doing in real time. You build a streaming data pipeline to capture taxi revenue, passenger count, ride status, and much more, and then visualize the results in a management dashboard.
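In this lab, you learn how to:
- Create a BigQuery dataset and table
- Set up a streaming Dataflow pipeline that reads files from Cloud Storage and writes to BigQuery
- Analyze the streaming data and calculate aggregations in BigQuery
- Visualize the results in Looker Studio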
For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.
Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is the Lab Details panel with the following:
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).
The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Arrange the tabs in separate windows, side-by-side.
If necessary, copy the Username below and paste it into the Sign in dialog.
You can also find the Username in the Lab Details panel.
Click Next.
Copy the Password below and paste it into the Welcome dialog.
You can also find the Password in the Lab Details panel.
Click Next.
Click through the subsequent pages:
After a few moments, the Google Cloud console opens in this tab.
Google Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5 GB home directory and runs on Google Cloud.
Google Cloud Shell provides command-line access to your Google Cloud resources.
In Cloud console, on the top right toolbar, click the Open Cloud Shell button.
Click Continue.
It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID. For example:
gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.
In this task, you create the taxirides dataset. You can create it with either Cloud Shell or the Google Cloud console.
In this lab, you use an extract of the NYC Taxi & Limousine Commission’s open dataset. A small, comma-separated data file simulates periodic updates of taxi data.
BigQuery is a serverless data warehouse. Tables in BigQuery are organized into datasets. In this lab, taxi data flows from the standalone file via Dataflow into BigQuery. With this configuration, any new data file deposited into the source Cloud Storage bucket is automatically processed for loading.
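To make the file-to-table flow more concrete, here is a minimal sketch of how one comma-separated line could be turned into a row before it is loaded. The column names and the sample line are hypothetical and chosen only for illustration; the lab's real schema is the one you paste in when you create the table.

```python
import csv
import io

# Hypothetical column names, for illustration only (not the lab's schema).
COLUMNS = ["ride_id", "timestamp", "passenger_count", "meter_reading", "ride_status"]


def parse_line(line):
    """Turn one comma-separated line into a dict keyed by column name."""
    values = next(csv.reader(io.StringIO(line)))
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    # Convert the numeric columns so they can be summed later.
    row["passenger_count"] = int(row["passenger_count"])
    row["meter_reading"] = float(row["meter_reading"])
    return row


# Made-up sample line.
sample = "ride-001,2024-01-01 12:00:30 UTC,2,11.50,dropoff"
print(parse_line(sample))
```

In the lab itself, Dataflow performs this step for you; the sketch only shows the shape of the transformation.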
Use one of the following options to create a new BigQuery dataset:
Option 1: Cloud Shell
Create the taxirides dataset.
Create the taxirides.realtime table (empty schema that you will stream into later).
Option 2: Google Cloud console
In the Google Cloud console, in the Navigation menu, click BigQuery.
If you see the Welcome dialog, click Done.
Click View actions next to your Project ID, and then click Create dataset.
In Dataset ID, type taxirides.
In Data location, select:
then click Create Dataset.
In the Explorer pane, click expand node to reveal the new taxirides dataset.
Click View actions next to the taxirides dataset, and then click Open.
Click Create Table.
In Table, type realtime.
For the schema, click Edit as text and paste in the following:
In Partition and cluster settings, select timestamp.
Click Create Table.
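Selecting timestamp in Partition and cluster settings partitions the table on that column. As a rough mental model, and assuming daily partitions (the granularity is an assumption here, so check the table's settings), rows end up grouped by the UTC date of their timestamp. The row fields below are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_by_day(rows):
    """Group rows by the UTC day (YYYYMMDD) of their 'timestamp' value."""
    partitions = defaultdict(list)
    for row in rows:
        day = row["timestamp"].astimezone(timezone.utc).strftime("%Y%m%d")
        partitions[day].append(row)
    return dict(partitions)


# Made-up rows on either side of midnight UTC.
rows = [
    {"ride_id": "a", "timestamp": datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc)},
    {"ride_id": "b", "timestamp": datetime(2024, 1, 2, 0, 1, tzinfo=timezone.utc)},
]
for day, members in sorted(partition_by_day(rows).items()):
    print(day, [r["ride_id"] for r in members])
```

A query that filters on the partitioning column only needs to read the matching groups, which is the main reason to partition a table that receives time-stamped data.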
In this task, you move the required files to your Project.
Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.
A Cloud Storage bucket was created for you during lab start up.
In this task, you set up a streaming data pipeline to read files from your Cloud Storage bucket and write data to BigQuery.
Dataflow is a serverless service for running batch and streaming data-processing pipelines.
In the Cloud console, in the Navigation menu, click View all Products > Analytics > Dataflow.
In the top menu bar, click Create Job From Template.
Type streaming-taxi-pipeline as the Job name for your Dataflow job.
In Regional endpoint, select
Click Required Parameters.
In Temporary location, used for writing temporary files, paste or type:
In Max workers, type 2.
In Number of workers, type 1.
Uncheck Use default machine type.
Under General purpose, choose the following:
Series: E2
Machine type: e2-medium (2 vCPU, 4 GB memory)
A new streaming job has started! You can now see a visual representation of the data pipeline. It will take 3 to 5 minutes for data to begin moving into BigQuery.
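The worker and machine settings entered in this task can be summarized as a small record. The sketch below is not a Dataflow API; the class and field names are hypothetical, and the check that the starting worker count does not exceed the maximum is an assumption about sensible autoscaling bounds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSettings:
    """Hypothetical container for the values typed into the job form."""
    job_name: str
    num_workers: int
    max_workers: int
    machine_type: str

    def __post_init__(self):
        # Assumed sanity checks, not rules taken from the lab.
        if self.num_workers < 1:
            raise ValueError("num_workers must be at least 1")
        if self.num_workers > self.max_workers:
            raise ValueError("num_workers cannot exceed max_workers")


# Values from the steps in this task.
settings = JobSettings(
    job_name="streaming-taxi-pipeline",
    num_workers=1,
    max_workers=2,
    machine_type="e2-medium",
)
print(settings)
```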
In this task, you analyze the data as it is streaming.
In the Cloud console, in the Navigation menu, click BigQuery.
If the Welcome dialog appears, click Done.
In the Query Editor, type the following, and then click Run:
Your output will look similar to the following:
In this task, you calculate aggregations on the stream for reporting.
In the Query Editor, clear the current query.
Copy and paste the following query, and then click Run.
The result shows key metrics by the minute for every taxi drop-off.
Click Save > Save query.
In the Save query dialog, in the Name field, type My Saved Query.
In Region, ensure that the region matches the Qwiklabs Lab Region.
Click Save.
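A per-minute roll-up of the kind described in this task can be approximated in plain Python. This is only a sketch: the output names (minute, total_rides, total_passengers, total_revenue) match the fields used later in the dashboard, while the input field names and sample rides are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_minute(rides):
    """Sum rides, passengers, and revenue for each minute, oldest first."""
    totals = defaultdict(
        lambda: {"total_rides": 0, "total_passengers": 0, "total_revenue": 0.0}
    )
    for ride in rides:
        # Truncate the timestamp to the start of its minute.
        minute = ride["timestamp"].replace(second=0, microsecond=0)
        bucket = totals[minute]
        bucket["total_rides"] += 1
        bucket["total_passengers"] += ride["passenger_count"]
        bucket["total_revenue"] += ride["meter_reading"]
    return [{"minute": m, **totals[m]} for m in sorted(totals)]


# Made-up drop-off events with hypothetical field names.
rides = [
    {"timestamp": datetime(2024, 1, 1, 12, 0, 5), "passenger_count": 1, "meter_reading": 10.0},
    {"timestamp": datetime(2024, 1, 1, 12, 0, 40), "passenger_count": 2, "meter_reading": 5.5},
    {"timestamp": datetime(2024, 1, 1, 12, 1, 10), "passenger_count": 3, "meter_reading": 7.25},
]
for row in aggregate_by_minute(rides):
    print(row)
```

In BigQuery, the equivalent work is expressed as a grouped query over the streaming table, so the results update as new rows arrive.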
In this task, you stop the Dataflow job to free up resources for your project.
In the Cloud console, in the Navigation menu, click View all Products > Analytics > Dataflow.
Click the streaming-taxi-pipeline, or the new job name.
Click Stop, and then select Cancel > Stop Job.
In this task, you create a real-time dashboard to visualize the data.
In the Cloud console, in the Navigation menu, click BigQuery.
In the Explorer pane, expand your Project ID.
Expand Queries, and then click My Saved Query.
Your query is loaded into the query editor.
Click Run.
In the Query results section, click Open in > Looker Studio.
Looker Studio opens. Click Get started.
In the Looker Studio window, click your bar chart.
The Chart pane appears.
Click Add a chart, and then select Combo chart.
In the Setup pane, in Date Range Dimension, hover over minute (Date) and click X to remove it.
In the Data pane, click dashboard_sort and drag it to Setup > Date Range Dimension > Add dimension.
In Setup > Dimension, click minute, and then select dashboard_sort.
In Setup > Metric, click dashboard_sort, and then select total_rides.
In Setup > Metric, click Record Count, and then select total_passengers.
In Setup > Metric, click Add metric, and then select total_revenue.
In Setup > Sort, click total_rides, and then select dashboard_sort.
In Setup > Sort, click Ascending.
Your chart should look similar to this:
When you're happy with your dashboard, click Save and share to save this data source.
If prompted to complete your account setup, type your country and company details, agree to the terms and conditions, and then click Continue.
If prompted which updates you want to receive, answer no to all, then click Continue.
If prompted with the Review data access before saving window, click Acknowledge and save.
If prompted to choose an account, select your Student Account.
Whenever anyone visits your dashboard, it will be up-to-date with the latest transactions. You can try it yourself by clicking More options, and then Refresh data.
In this task, you create a time series chart.
Click this Looker Studio link to open Looker Studio in a new browser tab.
On the Reports page, in the Start with a Template section, click the [+] Blank Report template.
A new, empty report opens with the Add data to report window.
From the list of Google Connectors, select the BigQuery tile.
Click Custom Query, and then select your Project ID. It should appear in the format qwiklabs-gcp-xxxxxxx.
In Enter Custom Query, paste the following query:
Click Add > Add To Report.
A new untitled report appears. It may take up to a minute for the screen to finish refreshing.
In the Data pane, click Add a Field > Add calculated field.
Click All Fields in the left corner.
Change the timestamp field type to Date & Time > Date Hour Minute (YYYYMMDDhhmm).
In the change timestamp dialog, click Continue, and then click Done.
In the top menu, click Add a chart.
Choose Time series chart.
Position the chart in the blank space in the bottom-left corner.
In Setup > Dimension, click timestamp (Date), and then select timestamp.
In Setup > Dimension, click timestamp, and then select calendar.
In Data Type, select Date & Time > Date Hour Minute.
Click outside the dialog to close it. You do not need to add a name.
In Setup > Metric, click Record Count, and then select meter reading.
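The Date Hour Minute (YYYYMMDDhhmm) type chosen for the timestamp field truncates each value to the minute. As a small illustration of that format, using a made-up timestamp and a hypothetical helper name:

```python
from datetime import datetime


def to_date_hour_minute(ts):
    """Format a datetime as YYYYMMDDhhmm, dropping the seconds."""
    return ts.strftime("%Y%m%d%H%M")


print(to_date_hour_minute(datetime(2024, 1, 1, 12, 5, 42)))  # 202401011205
```

Readings that fall within the same minute share one formatted value, which is why the time series chart plots one point per minute.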
In this lab, you used Dataflow to stream data through a pipeline into BigQuery.
When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.