Google Introduces Zero-ETL Approach to Analytics on Bigtable Data Using BigQuery

Google recently announced the general availability of Bigtable federated queries with BigQuery, allowing customers to query data residing in Bigtable via BigQuery faster and without moving or copying the data. The feature is available in all Google Cloud regions with increased federated query concurrency limits, closing the longstanding gap between operational data and analytics, according to the company.

BigQuery is Google Cloud's serverless, multi-cloud data warehouse that simplifies analytics by bringing data from various sources together, while Cloud Bigtable is Google Cloud's fully managed NoSQL database for time-sensitive transactional and analytical workloads. Bigtable suits use cases such as real-time fraud detection, recommendations, personalization, and time series.

Previously, customers had to use ETL tools such as Dataflow or self-developed Python tools to copy data from Bigtable into BigQuery. Now they can query the data directly with BigQuery SQL, because federated queries let BigQuery access data stored in Bigtable.

To query Bigtable data, users can create an external table for a Cloud Bigtable data source by providing the Cloud Bigtable URI – which can be obtained through the Cloud Bigtable console. The URI contains the following:

  • project_id is the project containing the Cloud Bigtable instance
  • instance_id is the Cloud Bigtable instance ID
  • (Optional) app_profile is the app profile ID to use
  • table_name is the name of the table for querying

 
Source: https://cloud.google.com/blog/products/data-analytics/bigtable-bigquery-federation-brings-hot--cold-data-closer
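
As a rough illustration of how the URI components listed above fit together, the following Python sketch assembles a Bigtable URI and renders a `CREATE EXTERNAL TABLE` statement around it. The article does not spell out the URI layout or the DDL options; the `https://googleapis.com/bigtable/...` path, the `CLOUD_BIGTABLE` format name, and the `bigtable_options` keys are assumptions based on Google Cloud's public documentation and should be checked against it. The helper names and the sample project, instance, dataset, and column family values are hypothetical.

```python
import json
from typing import List, Optional


def build_bigtable_uri(project_id: str, instance_id: str, table_name: str,
                       app_profile: Optional[str] = None) -> str:
    """Join the URI components into a single Cloud Bigtable URI.

    Assumed layout (verify against the Google Cloud documentation):
    https://googleapis.com/bigtable/projects/<project_id>/instances/<instance_id>
        [/appProfiles/<app_profile>]/tables/<table_name>
    """
    parts = [
        "https://googleapis.com/bigtable",
        "projects", project_id,
        "instances", instance_id,
    ]
    if app_profile:
        # The app profile segment is optional and only added when supplied.
        parts += ["appProfiles", app_profile]
    parts += ["tables", table_name]
    return "/".join(parts)


def render_external_table_ddl(dataset: str, table: str, uri: str,
                              column_families: List[str]) -> str:
    """Render a BigQuery DDL string for an external table over Bigtable.

    The option names below are assumptions, not taken from the article.
    """
    options = {
        "columnFamilies": [
            {"familyId": family, "type": "STRING", "encoding": "TEXT"}
            for family in column_families
        ],
        "readRowkeyAsString": True,
    }
    return (
        f"CREATE EXTERNAL TABLE `{dataset}.{table}`\n"
        "OPTIONS (\n"
        "  format = 'CLOUD_BIGTABLE',\n"
        f"  uris = ['{uri}'],\n"
        f"  bigtable_options = '''{json.dumps(options)}'''\n"
        ");"
    )


if __name__ == "__main__":
    # Hypothetical identifiers, used only for illustration.
    uri = build_bigtable_uri("my-project", "my-instance", "events")
    print(render_external_table_ddl("analytics", "events_external", uri, ["stats"]))
```

The rendered statement could then be pasted into the BigQuery console or submitted through a BigQuery client, after which the external table is queried with ordinary BigQuery SQL.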

Once the external table is created, users can query Bigtable like any other table in BigQuery. They can also take advantage of BigQuery features such as JDBC/ODBC drivers and connectors for popular business intelligence and data visualization tools like Data Studio, Looker, and Tableau, as well as AutoML Tables for training machine learning models and BigQuery’s Spark connector for loading data into their model development environments.

Christian Laurer, a big data enthusiast, explains the benefits of the new approach with Bigtable’s federated queries in a Medium article:

Using the new approach, you can overcome some shortcomings of the traditional ETL approach. Such as:

•    More data freshness (up-to-date insights for your business, no hours or even days old data)
•    Not paying twice for the storage of the same data (customers normally store Terabytes or even more in Bigtable)
•    Less monitoring and maintaining of the ETL pipeline

Lastly, more details on Bigtable’s federated queries with BigQuery are available on the documentation page. Querying data in Cloud Bigtable is available in all supported Cloud Bigtable zones.
