13 Aug Google adds big data services to cloud platform
Google announced the general availability of two big data products formerly in beta: Google Cloud Dataflow and Google Cloud Pub/Sub. The two tools complete Google’s plan to bring its entire suite of internal big data tools into general availability.
Cloud Dataflow is a Google service for streaming big data on Google Compute Engine and App Engine without incurring the operational overhead of managing a large server cluster. Cloud Pub/Sub integrates applications and services with real-time analysis of data streams.
The two products join BigQuery, Google’s existing SQL-based system for analyzing large data streams and data sets.
Adding Cloud Dataflow and Cloud Pub/Sub puts Google on a more equal footing with Amazon Web Services, which has proven light on its feet when it comes to introducing new cloud services. Google Cloud Dataflow has a rough counterpart in Amazon’s existing Data Pipeline, Google Cloud Pub/Sub in Amazon Kinesis, and Google BigQuery in Amazon DynamoDB. Amazon also has a Hadoop-type service, Elastic MapReduce.
Google’s announcement said the two new services are based on a decade of investment in data handling, including MapReduce for simplified data processing on large clusters, FlumeJava’s parallel data pipelines, and MillWheel’s fault-tolerant processing of large data streams.