In this lab, you own a fleet of New York City taxi cabs and are looking to monitor how well your business is doing in real-time. You build a streaming data pipeline to capture taxi revenue, passenger count, ride status, and much more, and then visualize the results in a management dashboard.
Objectives
In this lab you learn how to:
Create a Dataflow job from a template
Stream a Dataflow pipeline into BigQuery
Monitor a Dataflow pipeline in BigQuery
Analyze results with SQL
Visualize key metrics in Looker Studio
Setup and requirements
For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.
Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method.
On the left is the Lab Details panel with the following:
The Open Google Cloud console button
Time remaining
The temporary credentials that you must use for this lab
Other information, if needed, to step through this lab
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).
The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Arrange the tabs in separate windows, side-by-side.
Note: If you see the Choose an account dialog, click Use Another Account.
If necessary, copy the Username below and paste it into the Sign in dialog.
{{{user_0.username | "Username"}}}
You can also find the Username in the Lab Details panel.
Click Next.
Copy the Password below and paste it into the Welcome dialog.
{{{user_0.password | "Password"}}}
You can also find the Password in the Lab Details panel.
Click Next.
Important: You must use the credentials the lab provides you. Do not use your Google Cloud account credentials.
Note: Using your own Google Cloud account for this lab may incur extra charges.
Click through the subsequent pages:
Accept the terms and conditions.
Do not add recovery options or two-factor authentication (because this is a temporary account).
Do not sign up for free trials.
After a few moments, the Google Cloud console opens in this tab.
Note: To view a menu with a list of Google Cloud products and services, click the Navigation menu at the top-left, or type the service or product name in the Search field.
Activate Google Cloud Shell
Google Cloud Shell is a virtual machine loaded with development tools. It offers a persistent 5 GB home directory and runs on Google Cloud.
Google Cloud Shell provides command-line access to your Google Cloud resources.
In Cloud console, on the top right toolbar, click the Open Cloud Shell button.
Click Continue.
It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID.
gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.
You can list the active account name with this command:
gcloud auth list
You can list the project ID with this command:
gcloud config list project
Output:
[core]
project = qwiklabs-gcp-44776a13dea667a6
Note: Full documentation of gcloud is available in the gcloud CLI overview guide.
Task 1. Create a BigQuery dataset
In this task, you create the taxirides dataset. There are two ways to create it: using Cloud Shell or the Google Cloud console.
In this lab, you use an extract of the NYC Taxi & Limousine Commission's open dataset. A small comma-separated data file is used to simulate periodic updates of taxi data.
BigQuery is a serverless data warehouse. Tables in BigQuery are organized into datasets. In this lab, taxi data will flow from the standalone file via Dataflow to be stored in BigQuery. With this configuration, any new datafile deposited into the source Cloud Storage bucket would automatically be processed for loading.
Use one of the following options to create a new BigQuery dataset:
Option 1: The command-line tool
In Cloud Shell, run the following command to create the taxirides dataset.
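The dataset-creation command itself is not reproduced above; the following is a minimal sketch of what it could look like with the bq tool. The schema is partly an assumption: ride_id, timestamp, latitude, longitude, meter_reading, ride_status, and passenger_count all appear in later queries, while point_idx and meter_increment are illustrative only.

```shell
# Sketch only: create the taxirides dataset, then a realtime table
# partitioned on the timestamp column. Adjust the schema to match
# the lab's actual data file.
bq mk taxirides

bq mk \
  --time_partitioning_field timestamp \
  --schema ride_id:string,point_idx:integer,latitude:float,longitude:float,\
timestamp:timestamp,meter_reading:float,meter_increment:float,\
ride_status:string,passenger_count:integer \
  -t taxirides.realtime
```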
Option 2: The BigQuery console
Alternatively, create the taxirides dataset and the realtime table in the BigQuery console. In Partition and cluster settings, select timestamp, and then click Create table.
Task 2. Copy required lab artifacts
In this task, you copy the required files to your project.
Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.
A Cloud Storage bucket was created for you during lab start up.
In Cloud Shell, run the following commands to copy the files needed for the Dataflow job.
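The copy commands themselves are not reproduced above; the following is a sketch of their general shape. SOURCE_BUCKET and PROJECT_ID are placeholders, not the lab's actual values — substitute the source path and bucket name the lab gives you.

```shell
# Sketch only: copy the lab's data file and schema into the Cloud
# Storage bucket created for your project at lab start-up.
# gs://SOURCE_BUCKET/... and PROJECT_ID are placeholders.
gcloud storage cp gs://SOURCE_BUCKET/taxi_data.csv \
  gs://PROJECT_ID-bucket/tmp/
gcloud storage cp gs://SOURCE_BUCKET/schema.json \
  gs://PROJECT_ID-bucket/tmp/
```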
A new streaming job has started. You can now see a visual representation of the data pipeline in the Dataflow console. It takes 3 to 5 minutes for data to begin moving into BigQuery.
Note: If the Dataflow job fails the first time, re-create the job from the template with a new job name, and then run it again.
Task 4. Analyze the taxi data using BigQuery
In this task, you analyze the data as it is streaming.
In the Cloud console, in the Navigation menu, click BigQuery.
If the Welcome dialog appears, click Done.
In the Query Editor, type the following, and then click Run:
SELECT * FROM taxirides.realtime LIMIT 10
Note: If no records are returned, wait another minute and re-run the query (Dataflow takes 3 to 5 minutes to set up the stream).
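While you wait, you can also check from Cloud Shell whether any rows have arrived yet; a sketch using the bq CLI, assuming the taxirides.realtime table from Task 1:

```shell
# Sketch: count rows in the streaming table from Cloud Shell.
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS row_count FROM taxirides.realtime'
```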
Your output will look similar to the following:
Task 5. Perform aggregations on the stream for reporting
In this task, you calculate aggregations on the stream for reporting.
In the Query Editor, clear the current query.
Copy and paste the following query, and then click Run.
WITH streaming_data AS (
  SELECT
    timestamp,
    TIMESTAMP_TRUNC(timestamp, HOUR, 'UTC') AS hour,
    TIMESTAMP_TRUNC(timestamp, MINUTE, 'UTC') AS minute,
    TIMESTAMP_TRUNC(timestamp, SECOND, 'UTC') AS second,
    ride_id,
    latitude,
    longitude,
    meter_reading,
    ride_status,
    passenger_count
  FROM
    taxirides.realtime
  ORDER BY timestamp DESC
  LIMIT 1000
)
# calculate aggregations on stream for reporting:
SELECT
  ROW_NUMBER() OVER() AS dashboard_sort,
  minute,
  COUNT(DISTINCT ride_id) AS total_rides,
  SUM(meter_reading) AS total_revenue,
  SUM(passenger_count) AS total_passengers
FROM streaming_data
GROUP BY minute, timestamp
Note: Ensure Dataflow is registering data in BigQuery before proceeding to the next task.
The result shows key metrics by the minute for every taxi drop-off.
Click Save > Save query.
In the Save query dialog, in the Name field, type My Saved Query.
In Region, ensure that the region matches the Qwiklabs Lab Region.
Click Save.
Task 6. Stop the Dataflow Job
In this task, you stop the Dataflow job to free up resources for your project.
In the Cloud console, in the Navigation menu, click View all Products > Analytics > Dataflow.
Click streaming-taxi-pipeline (or the new job name, if you used one).
Click Stop, and then select Cancel > Stop Job.
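If you prefer the command line, the same job can be cancelled from Cloud Shell; a sketch, where REGION and JOB_ID are placeholders for your lab's Dataflow region and the job's ID:

```shell
# Sketch: find the active job's ID, then cancel it.
gcloud dataflow jobs list --region=REGION --status=active
gcloud dataflow jobs cancel JOB_ID --region=REGION
```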
Task 7. Create a real-time dashboard
In this task, you create a real-time dashboard to visualize the data.
In the Cloud console, in the Navigation menu, click BigQuery.
In the Explorer Pane, expand your Project ID.
Expand Queries, and then click My Saved Query.
Your query is loaded into the query editor.
Click Run.
In the Query results section, click Open in > Looker Studio.
Looker Studio opens. Click Get started.
In the Looker Studio window, click your bar chart.
The Chart pane appears.
Click Add a chart, and then select Combo chart.
In the Setup pane, in Date Range Dimension, hover over minute (Date) and click X to remove it.
In the Data pane, click dashboard_sort and drag it to Setup > Date Range Dimension > Add dimension.
In Setup > Dimension, click minute, and then select dashboard_sort.
In Setup > Metric, click dashboard_sort, and then select total_rides.
In Setup > Metric, click Record Count, and then select total_passengers.
In Setup > Metric, click Add metric, and then select total_revenue.
In Setup > Sort, click total_rides, and then select dashboard_sort.
In Setup > Sort, click Ascending.
Your chart should look similar to this:
Note: Looker Studio does not currently support visualizing timestamp data at minute-level granularity, which is why we created our own dashboard_sort dimension.
When you're happy with your dashboard, click Save and share to save this data source.
If prompted to complete your account setup, type your country and company details, agree to the terms and conditions, and then click Continue.
If prompted which updates you want to receive, answer no to all, then click Continue.
If prompted with the Review data access before saving window, click Acknowledge and save.
If prompted to choose an account select your Student Account.
Whenever anyone visits your dashboard, it will be up-to-date with the latest transactions. You can try it yourself by clicking More options, and then Refresh data.
Task 8. Create a time series dashboard
In this task, you create a time series chart.
Click this Looker Studio link to open Looker Studio in a new browser tab.
On the Reports page, in the Start with a Template section, click the [+] Blank Report template.
A new, empty report opens with the Add data to report window.
From the list of Google Connectors, select the BigQuery tile.
Click Custom Query, and then select your Project ID. It should appear in the following format: qwiklabs-gcp-xxxxxxx.
In Enter Custom Query, paste the following query:
SELECT
*
FROM
taxirides.realtime
WHERE
ride_status='enroute'
Click Add > Add To Report.
A new untitled report appears. It may take up to a minute for the screen to finish refreshing.
Create a time series chart
In the Data pane, click Add a Field > Add calculated field.
Click All Fields in the left pane.
Change the timestamp field type to Date & Time > Date Hour Minute (YYYYMMDDhhmm).
In the change timestamp dialog, click Continue, and then click Done.
In the top menu, click Add a chart.
Choose Time series chart.
Position the chart in the bottom left corner - in the blank space.
In Setup > Dimension, click timestamp (Date), and then select timestamp.
In Setup > Dimension, click timestamp, and then click the calendar icon.
In Data Type, select Date & Time > Date Hour Minute.
Click outside the dialog to close it. You do not need to add a name.
In Setup > Metric, click Record Count, and then select meter_reading.
Congratulations!
In this lab, you used Dataflow to stream data through a pipeline into BigQuery.
End your lab
When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
1 star = Very dissatisfied
2 stars = Dissatisfied
3 stars = Neutral
4 stars = Satisfied
5 stars = Very satisfied
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Copyright 2024 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.