Configuring direct connection to Google BigQuery

Updated by Ramya Priya

BigQuery Connector

Google Cloud has released an official version of the Spark BigQuery connector that doesn’t require a GCS bucket to read the data into Tellius. 

Instead of using Google Cloud Storage as an intermediary, the data is streamed in parallel from BigQuery over gRPC using the BigQuery Storage API. Check out https://github.com/GoogleCloudDataproc/spark-bigquery-connector

When a dataset is created from a table, materialized view, or query, the data streams from BigQuery into Tellius without leaving any files or tables in between.

Loading from table

To load the data from a table, the following details are required from the user:

  • Project ID
  • Dataset name
  • Table name
  • Credentials file
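As a rough sketch, the details above map onto the spark-bigquery-connector's read options. All project, dataset, table, and file names below are illustrative placeholders, not values from this article:

```python
# Sketch: assemble the read options for loading a BigQuery table through the
# spark-bigquery-connector (assumes the connector jar is on the classpath).

def table_read_options(project_id: str, dataset: str, table: str,
                       credentials_file: str) -> dict:
    """Build the option map passed to spark.read.format('bigquery')."""
    return {
        # Fully qualified table ID: <project>.<dataset>.<table>
        "table": f"{project_id}.{dataset}.{table}",
        # Path to the service-account JSON key file
        "credentialsFile": credentials_file,
    }

opts = table_read_options("my-project", "sales_ds", "orders",
                          "/secrets/tellius-sa.json")
# Inside a Spark session this would be consumed as:
#   df = spark.read.format("bigquery").options(**opts).load()
print(opts["table"])
```
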

Loading from View

By default, BigQuery does not support reading data directly from Views. The View must first be materialized by saving its output to a separate table, and the data is then loaded from that table.

The intermediate table is a temporary table with a Time-To-Live (TTL) of 24 hours, and it will be deleted once the TTL ends. The TTL can be set to less than 24 hours. For this integration, it is recommended to create a separate materialization dataset to which Tellius has full write access.

The following details are required from the user:

  • Project ID
  • Dataset name
  • Table name
  • Materialization dataset
  • Credentials file
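Under the same assumptions as the table example (placeholder names, spark-bigquery-connector on the classpath), the extra materialization settings for Views might look like this:

```python
# Sketch: read options for loading a BigQuery View. The connector must be told
# to allow Views and where to materialize them; all names are placeholders.

def view_read_options(project_id: str, dataset: str, view: str,
                      materialization_dataset: str,
                      credentials_file: str) -> dict:
    """Option map for reading a View via a temporary materialized table."""
    return {
        "table": f"{project_id}.{dataset}.{view}",
        # Permit reading from Views (disabled by default in the connector)
        "viewsEnabled": "true",
        # Dataset (with write access) where the temporary table is created
        "materializationDataset": materialization_dataset,
        # TTL of the temporary table, in minutes (default 1440 = 24 hours)
        "materializationExpirationTimeInMinutes": "1440",
        "credentialsFile": credentials_file,
    }

opts = view_read_options("my-project", "sales_ds", "orders_view",
                         "tellius_materialization", "/secrets/tellius-sa.json")
print(opts["materializationDataset"])
```
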

Loading from SQL Query

Loading data from an SQL query is similar to loading data from a View. The results of the query are saved in an intermediate, temporary table, from which the data is loaded.
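A hedged sketch of the corresponding connector options, again with placeholder names: instead of a table ID, the SQL text is passed via the connector's "query" option, together with the same materialization settings used for Views.

```python
# Sketch: read options for loading the results of an SQL query. As with Views,
# the connector materializes the results into a temporary table first.

def query_read_options(sql: str, materialization_dataset: str,
                       credentials_file: str) -> dict:
    """Option map for spark.read.format('bigquery') with a SQL query."""
    return {
        "query": sql,
        "viewsEnabled": "true",  # required when the "query" option is used
        "materializationDataset": materialization_dataset,
        "credentialsFile": credentials_file,
    }

opts = query_read_options(
    "SELECT id, total FROM sales_ds.orders WHERE total > 100",
    "tellius_materialization",
    "/secrets/tellius-sa.json",
)
print(opts["query"])
```
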

Live Querying

The BigQuery connector cannot be used for Live datasets, since Tellius runs SQL queries in real time. After each query is processed, the results must be written to a separate table, from which they are then read.

As a result, several temporary tables are created, which degrades the performance of live querying and increases the overall cost to the user.

Solution

There are two alternatives for using the BigQuery connector with live datasets.

JDBC
  • Use a JDBC driver to process the query and stream the results into Tellius.
  • It requires only minimal effort to integrate a JDBC driver into Tellius live querying.
  • However, Google does not provide an official JDBC driver for BigQuery. There are a few third-party JDBC drivers available (listed below), but their licensing is not straightforward and requires explicit approval.

https://www.magnitude.com/drivers/bigquery-odbc-jdbc

https://www.cdata.com/drivers/bigquery/jdbc/

BigQuery Java 

BigQuery provides an official Java client library, which can be used to run queries on BigQuery and read the results into Tellius.

https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-java

Pre-requisites for configuring BigQuery in Google Cloud

Service Account Creation

  1. In the Google Cloud console, open the IAM & Admin section in the left pane.
  2. Choose Service accounts and select the required project.
  3. Click + Create service account in the top pane.
  4. Enter the name, ID, and description of the service account.
  5. Copy the service account email address (which is of the format <account_name>@<project_name>.iam.gserviceaccount.com).
  6. Click Done to create the service account.

Project Level Permissions

The following roles need to be included at the project level:

  • BigQuery Read Session User

It is required to read the data in parallel via the BigQuery Storage API.

  • BigQuery Job User

It is required to load data from Views or execute live-mode SQL queries directly in BigQuery. It is also needed to run the jobs that materialize Views into a temporary table, from which the data is then read.

  1. Select the required project.
  2. Under Actions, click on Manage permissions.
  3. Click on Grant access.
  4. Under Add principals, paste the service account email copied from the previous section.
  5. Under Assign roles, choose the roles: BigQuery Read Session User and BigQuery Job User.
  6. Click on Save.
  7. A "Policy updated" toast will be displayed.

Dataset Level Permissions

  • The BigQuery Data Viewer role should be assigned to the service account on the dataset from which Tellius will read the data.
  • If Views are read into Tellius, the BigQuery Data Editor role also needs to be assigned to the service account, either on that same dataset or on a new dataset created solely for Tellius to hold its temporary tables.
  1. Click on the dataset and select Options → Share.
  2. Under Add principals, paste the service account email address copied from the first section.
  3. Under Assign roles, provide the required role.
  4. Click on Save.
