This past year has seen a major evolution in the features and functionality of Google’s BigQuery, and last week, Google released some new and exciting capabilities (summarized below).
Recently, our data science team at Cardinal Path has seen a lot of interest in Google BigQuery, as well as in Google’s other Cloud products. While much of this interest has been driven by increasing adoption among developers, a big motivation comes from business users who are starting to understand the benefits that BigQuery brings to their organizations, including the potential for significant cost savings.
BigQuery is an integrated application that can receive, process, and send data to many enterprise applications at a lower cost than many comparable database tools, so it’s easy to see why it is gaining traction. As an aggregator, BigQuery can provide a single view of the customer, and its processing engine enables deep data mining.
With this latest release, Google added several features to BigQuery that build on its strengths as a data warehouse for large-scale data analytics.
Some highlights from this release:
- Ability to query CSV and JSON data directly from Google Cloud Storage. You can now skip the step of importing data into BigQuery and simply query data held in Cloud Storage in place. In addition, Google has added the ability to cancel a running job.
- Deprecation of the EACH keyword in JOIN and GROUP BY clauses. I’m always reiterating the importance of remembering EACH when joining large or skewed tables, or when grouping over many distinct values. Many organizations require their developers to write JOINs across many datasets in order to answer critical business questions. With this release, BigQuery automatically optimizes the join and grouping strategy for best performance, saving you from worrying about operations on the back end so you can focus on writing clean, crisp queries.
- BigQuery Slots. Developers or administrators can guarantee resources regardless of demand on the overall multi-tenant pool, which provides important stability in resource acquisition. This is particularly important during peak times when critical insights are needed for decision making (e.g., the holiday season for a major retailer doing hourly or per-minute reporting).
- Query pricing tiers. With differential pricing per terabyte, organizations can purchase high-performance computing resources for more intensive queries, and budget more efficiently for their unique needs.
- Higher limits: the concurrent rate limit has been increased to 50 queries, and the daily query limit to 100,000 queries. This release also adds streaming buffer statistics for tables that are being actively modified by streaming inserts. These statistics give users better information about table size and about when a table can be copied or extracted, and they help with debugging, a critical need when building production-level applications. For organizations that are starting to use BigQuery as their centralized data warehouse, the higher query limits will help answer some of the most challenging business questions.
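To make the Cloud Storage feature concrete, here is a minimal sketch of a `jobs.insert` request body that queries a CSV file in Cloud Storage without loading it into BigQuery first. The bucket, table alias, and schema are hypothetical; the field names (`tableDefinitions`, `sourceUris`, `sourceFormat`) follow the BigQuery REST API.

```python
import json

# Hypothetical request body for the BigQuery jobs.insert API: the query
# references "gcs_scores", which is defined inline as an external CSV
# source in Cloud Storage rather than a loaded BigQuery table.
request_body = {
    "configuration": {
        "query": {
            "query": "SELECT name, score FROM gcs_scores WHERE score > 90",
            "tableDefinitions": {
                "gcs_scores": {
                    "sourceFormat": "CSV",
                    "sourceUris": ["gs://my-bucket/scores-*.csv"],
                    "schema": {
                        "fields": [
                            {"name": "name", "type": "STRING"},
                            {"name": "score", "type": "INTEGER"},
                        ]
                    },
                }
            },
        }
    }
}

print(json.dumps(request_body, indent=2))
```

Sending this body (with your project’s credentials) runs the query directly against the files matched by the `gs://` wildcard.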
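The EACH change is easiest to see side by side. The sketch below uses hypothetical table names in BigQuery’s legacy SQL syntax: the old query spells out JOIN EACH and GROUP EACH BY, while the new equivalent simply drops the keyword and lets BigQuery choose the strategy.

```python
# A legacy-SQL query written the old way, with explicit EACH hints
# (project, dataset, and table names are made up for illustration).
legacy_query = (
    "SELECT a.user_id, COUNT(*) AS hits "
    "FROM [project:dataset.big_table_a] a "
    "JOIN EACH [project:dataset.big_table_b] b ON a.user_id = b.user_id "
    "GROUP EACH BY a.user_id"
)

# After this release, the same query is written without EACH; BigQuery
# picks the join and grouping strategy automatically. Dropping the
# keyword is the entire migration.
optimized_query = legacy_query.replace(" EACH", "")

print(optimized_query)
```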
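As a sketch of the new streaming buffer statistics, here is what reading them from a `tables.get` response might look like. The numeric values are made up for illustration; the field names (`streamingBuffer`, `estimatedRows`, `estimatedBytes`, `oldestEntryTime`) come from the BigQuery Table resource.

```python
# Hypothetical fragment of a tables.get response for a table receiving
# streaming inserts. Rows in the streaming buffer are queryable but not
# yet reflected in numRows, and block copy/extract until flushed.
table_resource = {
    "numRows": "1200000",
    "streamingBuffer": {
        "estimatedRows": "5300",
        "estimatedBytes": "848000",
        "oldestEntryTime": "1430000000000",  # milliseconds since epoch
    },
}

buf = table_resource.get("streamingBuffer")
if buf:
    print(
        f"~{buf['estimatedRows']} rows (~{buf['estimatedBytes']} bytes) "
        "still in the streaming buffer"
    )
```

Checking for the `streamingBuffer` key before copying or extracting a table is one way this information helps when debugging a production pipeline.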
Overall, these are some positive additions that will help organizations leverage Google BigQuery as an integral part of their analytics success.