Google is offering a Java SDK to integrate with the Google Cloud Dataflow managed service for analyzing live streaming data as part of its effort to broaden support for the platform.
Because it is released as open source, the SDK provides a basis for adapting Dataflow to other languages and execution environments, said Sam McVeety, a Google software engineer, in a recent bulletin. “We’ve learned a lot about how to turn data into intelligence as the original FlumeJava programming models (basis for Cloud Dataflow) have continued to evolve internally at Google.”
Google hopes to expand the Dataflow service as well as spur innovation in combining stream and batch processing models. “As the proliferation of data grows, so do programming languages and patterns,” said McVeety. “We are currently building a Python 3 version of the SDK to give developers even more choice and to make dataflow accessible to more applications. Reusable programming patterns are a key enabler of developer efficiency. The Cloud Dataflow SDK introduces a unified model for batch and stream data processing.”
As for other execution environments, McVeety said that modern development, particularly in the cloud, is about heterogeneous service and composition. “As Storm, Spark, and the greater Hadoop family continue to mature, developers are challenged with bifurcated programming models. We hope to relieve developer fatigue and enable choice in deployment platforms by supporting execution and service portability.”
Google Cloud Dataflow was introduced in June as a step toward providing a managed service model for data processing. Still in an alpha stage of release and restricted to “whitelisted” users (newcomers must apply for access to the service), Cloud Dataflow is intended to let users focus on analysis without having to maintain the underlying data pipeline and processing infrastructure. An InfoWorld analysis of Cloud Dataflow concluded that it is probably not a Hadoop killer but rather a way for Google Cloud users to enrich applications.