Big data gems in machine data

In recent years, solution providers have been exuberant about the promise of mining big data for insights. But the reality, as Bain & Company observes, is that big promises of customer impact and value creation have rested on substantial investments in technology and expertise.

Even with open source solutions available, making analytics work still requires substantial investment: scaling infrastructure, hiring data scientists and engaging professional services.

Still, more often than not, projects have failed to produce patterns and insights that are actionable or that drive the business meaningfully.

“Management teams frequently don’t see enough immediate financial impact to justify additional investments,” writes David Court, a director at McKinsey, in the McKinsey Quarterly.

“They need to shift priorities from small-scale exercises to focusing on critical business areas and driving the use of analytics across the organization.”

Assets in hand

For a start, organizations are already collecting massive amounts of useful machine-generated data: system logs and configurations, application programming interfaces, message queues, change events, the output of diagnostic commands, call detail records, sensor data from industrial systems and more.

This machine data – one of the fastest growing and most complex areas of big data – contains critical insights that can deliver value across IT and the business.

Machine data has fuelled a digital universe growing 40% a year into the next decade, expanding to include the Internet of Things, according to IDC.

Most unstructured data is machine-generated. The variety, velocity, volume and variability of machine data mean that a new approach is required to turn it into business value.

“Business intelligence has held so much promise over the years,” says Michael Connor, senior platform architect at The Coca-Cola Company.

“But the tools have really been limited by the lack of data that they can bring in. [And the reason we don’t use all of our data to make decisions on a daily basis] is that we have a lot of little data islands around our company. We have to create a data democracy, to free all these data to get the power out of them. We have to create a data lake.”

Coke’s vibrant lake

Unlike a data warehouse, the data lake is a rich repository of unstructured data from both internal systems and external partners.

“At the end of 2013, we [moved our entire] consumer space to Amazon Web Services and we set out to automate every single thing we did when we moved to the cloud,” Connor explains.

“As services come online in real time, we automatically install Splunk forwarders [for reliable data collection from multiple sources] and our security software. All data that comes through [the machine] begins to flow into our data lake, which is the Splunk Enterprise index system running in the Amazon Elastic Compute Cloud.”

By breaking down data silos and turning data islands into a data lake, Splunk helps organizations like Coca-Cola collect data from anywhere, search and analyze it, and gain real-time operational intelligence. The platform can be applied across multiple use cases.

Like Google's index and search, Splunk is schema-less, enabling organizations to ask any question of data from any source and visualize the results through real-time alerts and dashboards.
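To illustrate the schema-less idea: rather than forcing machine data into a fixed schema at ingest time, fields can be extracted from raw events at search time. The sketch below is a hypothetical Python illustration of that pattern (not Splunk itself); the sample log lines and field names are invented for the example.

```python
import re

# Hypothetical raw machine-data events -- no schema imposed at ingest time.
raw_events = [
    '2024-05-01T10:02:11Z action=purchase channel=vending site=campus amount=1.50',
    '2024-05-01T10:02:14Z action=login user=alice status=success',
]

# key=value pairs are pulled out at search time, whatever keys happen to exist.
KV_PATTERN = re.compile(r'(\w+)=(\S+)')

def extract_fields(event):
    """Extract key=value fields from one raw event string."""
    return dict(KV_PATTERN.findall(event))

# Ask a question of the data without any predefined schema:
purchases = [fields for fields in map(extract_fields, raw_events)
             if fields.get('action') == 'purchase']
```

Because extraction happens at query time, a new question (say, filtering on `channel`) needs no re-ingestion or schema migration, only a different filter.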

With a wealth of data from loyalty programs, vending, social media and promotions, as well as fraud and security data flowing into the data lake, Connor’s team began to build dashboards to visualize the real-time data.

“One of our vending dashboards allows us to use drop-downs to compare two channels side-by-side, [such as offices and malls],” he says. “We began to see consumption patterns over channels. We noticed big spikes in vending on college campuses right before The Walking Dead when people were having viewing parties. We’re getting to know our customers in ways that we never had before. It took us just three hours to build 15 panels on this dashboard.”

Yahoo’s Hunk for Hadoop

With Splunk, all machine data, such as application logs, web logs, network logs and mobile data, is indexed once, and that single data set can then support the alerts and dashboards of different business units and use cases.
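The index-once, use-many-times idea can be sketched in a few lines of Python. This is a toy illustration with invented sample events, not Splunk's implementation: one shared set of indexed events serves both an IT operations view and a business view, with no re-ingestion.

```python
from collections import Counter

# One indexed set of machine-data events (invented sample data),
# shared by different teams instead of being ingested twice.
indexed_events = [
    {'source': 'weblog',  'status': 500, 'path': '/checkout'},
    {'source': 'weblog',  'status': 200, 'path': '/home'},
    {'source': 'authlog', 'status': 401, 'path': '/login'},
    {'source': 'weblog',  'status': 200, 'path': '/checkout'},
]

# IT operations view: alert on server errors.
ops_alerts = [e for e in indexed_events if e['status'] >= 500]

# Business view: page popularity, computed from the same index.
page_counts = Counter(e['path'] for e in indexed_events)
```

Each team asks its own questions of the shared index; adding a third use case means writing another query, not building another pipeline.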

A great endorsement of Splunk’s capabilities in this area has been Yahoo’s implementation of the Hunk analytics platform to rapidly explore, analyze and visualize data in Hadoop and NoSQL data stores.

Yahoo, birthplace of the Hadoop framework for distributed processing of large data sets, stores more than 600 petabytes of data.

It also analyzes more than 150 terabytes of machine data per day, applying Splunk Enterprise to IT operations, applications delivery, security, business analytics and other use cases.

“Hunk gives us deep visibility into our massive Hadoop data stores to help us continuously optimize operational performance,” says Ian Flint, monitoring architect at Yahoo. “Insights we gain from Hunk help us save millions of dollars per year in hardware provisioning. Splunk Enterprise helps us to maximize revenue by giving product and business teams better insight into our customers, the user experience and any looming issues.”

Ubisoft’s service creed

Like Yahoo, many organizations are using Splunk Enterprise to analyze and visualize both real-time and historical data about customer preferences, user experience, click rates, performance and IT workflow issues, advertising and marketing campaign popularity, product feature usage, and more.

For example, Ubisoft, makers of games such as Assassin’s Creed and Watch Dogs, uses Splunk Enterprise to better understand usage patterns and optimize its online services to keep game developers and players happy.

“If our services go down, then all the online features of the games go down as well,” says Martin Lavoie, Technology Group deputy director at Ubisoft. “We use Splunk for management and alerting so if things go wrong, we can take action and fix the problem in a timely fashion to make sure our services are always available.”

For Lavoie, the ‘wow’ moment came when he saw the dashboard. “We were totally working in the blind before so when we could really see the light and see how all our services were being used, we could start optimizing those services,” he says. “We couldn’t live without it now.”

More importantly, Splunk Enterprise allows organizations to start small, mining machine data for insights at minimal cost or by trying a free download. Its ease of use lets organizations get started without the high cost of a data analytics team, and then scale from megabytes to terabytes of data.

This makes big data accessible, usable and valuable to businesses of all sizes, from non-profit organizations with limited budgets to large enterprises.

“Big data is overwhelming and the number of vendors in the landscape is increasing,” says Connor. “For me, Splunk has been most powerful as a one-stop shop. I haven’t seen another technology today that can collect, forward, index, analyze, visualize, dashboard an event and report all in a single place with a really beautiful intuitive dashboard.”

This reinforces Court’s belief that automating processes and decision-making is becoming much easier.

“Technology improvements are allowing a much broader capture of real-time data – for example, through sensors – while facilitating real-time, large-scale data processing and analysis. These advances are opening new pathways to automation and machine learning that were previously available only to leading technology firms.”

By making adoption easy across data sources, use cases and consumption models, Splunk has created a big data democracy that influences the business and creates value without requiring big budgets or costly expertise.

This is a QuestexAsia feature commissioned by Splunk.