This past week, Google announced some exciting changes and feature enhancements to its cloud platform that have captured my attention. They include significant improvements to BigQuery’s streaming capabilities, new options for Google Compute Engine, and better code management for App Engine. It’s all pretty exciting news.
For those of you who are unfamiliar with Google’s BigQuery, it’s a cloud database that allows developers and analysts to run super-fast, SQL-like queries against terabytes of data in seconds. I love using it, but in the past I have run into challenges because of its limited real-time data streaming capabilities.
I was originally schooled in working with Amazon’s big data platform, specifically Amazon Kinesis. Kinesis is a fully managed service that supports the real-time processing of streaming data, and it can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources. As a data scientist, I found that being able to harness this type of power opened the door to many potential applications, including real-time analysis of website data, marketing and financial information, logging and graphing, and even metering.
I’ll always be a Kinesis fan, but as a developer actively building on Google’s cloud infrastructure, I’m glad to see real streaming power come to BigQuery. According to the recent announcements, BigQuery’s streaming capacity will grow from 1,000 rows per second, per table to up to 100,000 rows per second, per table. That is a monumental, hundredfold increase that moves Google customers closer to truly real-time business decision making. It also has the potential to unlock concepts such as stream mining, an area of advanced data mining that should continue to heat up over the next decade.
In addition to adding new features to BigQuery, Google announced that it is enhancing complementary areas of its cloud offering that will also benefit customers looking to stream large volumes of data. For example, the Google Compute Engine team announced that it will add a number of new virtual machine options, including Red Hat Enterprise Linux and Windows Server 2008. Google also announced other changes, including updates to its developer dashboard (such as automated unit testing), a managed VM service for App Engine, and more.
These changes will give pause to many IT managers who need this type of capability and who operate primarily on AWS. In the meantime, I can’t wait to get my hands dirty pushing the limits of the Google stack as these changes roll into production.
Nathaniel Payne is a senior data architect with Cardinal Path and a qualified Google Compute Engine & BigQuery developer.