Splunk Conference Recap: The Key to Big Data is Machine Learning

by Jonathan Allen on Oct 10, 2014. Estimated reading time: 3 minutes

Splunk’s user conference has drawn to a close. After three days with over 160 sessions ranging from security and operations to business intelligence to even the Internet of Things, the same central theme kept appearing over and over again: the key to Big Data is machine learning.

Storage is no longer an issue. From specialized storage hardware running Hadoop-compatible nodes to commodity hard drives clustered over hundreds of machines, there is no doubt that we have the ability to handle any kind of storage problem. On the other side, analysis and visualization tools such as Splunk are well established. If you know what you are looking for, these tools can quickly get you the answers you need.

But what should you be looking for? For the vast majority of vendors on the floor, the answer to that question is machine learning. It doesn’t matter if you are talking about network traffic, user behavior, or consumer trends; the way to gain real insights into what you are monitoring is to find the patterns and correlations in the data. And while a human operator can stumble across these by trial and error, the vendors believe that a computer can be trained to find them much faster and without bias.

Of course, that isn’t to say humans are obsolete. Someone has to verify that the correlations are not just coincidence and figure out a way to act upon the information. And that’s where the aforementioned visualization tools come into play.
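To make the "not just coincidence" check concrete, here is a minimal sketch of one common way to test a correlation: a permutation test. It is not something any vendor at the conference described; the function names `pearson` and `permutation_p_value` and the sample numbers are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often a correlation at least this strong appears by chance.

    The ys are shuffled repeatedly, which breaks any real relationship
    with the xs, and we count how often the shuffled correlation is at
    least as strong as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


# Hypothetical data: two metrics sampled over 20 time windows.
metric_a = list(range(20))
metric_b = [2 * x + 1 for x in metric_a]
p_value = permutation_p_value(metric_a, metric_b)
```

A small `p_value` suggests the pattern is unlikely to be a fluke, but a person still has to decide whether it means anything and what to do about it.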

Primary Use Cases for Big Data and Machine Learning

While the potential for big data is nearly limitless, it is inevitable that one or two industries are going to lead the charge. Ask me again in a year and I may say something different, but for now my prediction is that either security or operations will be in the forefront.

Every company larger than a cash-only coffee stand needs to think about information security. Even if they have no intellectual property to speak of, they all deal with sensitive information such as credit card numbers. Having ways to reliably detect and stop a breach while it is happening is critical to the long-term success of a company. Security products based on machine learning promise to provide this capability, with ease of use approaching turnkey levels.

In a similar vein, operations analysis is going to be popular. Right now you can buy tools that monitor your network, decode the packets, and show you exactly how a given REST call flows through your middle tier servers all the way to the database or file system, and then compare it to how it was behaving a week, month, or year ago. This isn’t a future concept; this is something you can buy off-the-shelf today and have running within a week.
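The "compare it to how it was behaving a week ago" idea can be sketched in a few lines. This is only an illustration of a simple baseline comparison, not how any particular product works; the function names, the threshold of 3.0, and the latency figures are all assumptions.

```python
from statistics import mean, stdev


def latency_z_score(baseline_ms, current_ms):
    """How many baseline standard deviations the current mean has drifted."""
    mu = mean(baseline_ms)
    sigma = stdev(baseline_ms)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is infinite drift.
        return 0.0 if mean(current_ms) == mu else float("inf")
    return (mean(current_ms) - mu) / sigma


def is_anomalous(baseline_ms, current_ms, threshold=3.0):
    """Flag the current window if it drifts beyond the (assumed) threshold."""
    return abs(latency_z_score(baseline_ms, current_ms)) > threshold


# Hypothetical response times (ms) for one REST call.
last_week = [100, 102, 98, 101, 99, 100, 103, 97]
today = [120, 118, 122]
slow_today = is_anomalous(last_week, today)
```

Commercial tools presumably account for seasonality, traffic volume, and many more signals; the point here is only that the comparison against a historical baseline is easy to state precisely.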

Other areas of research will continue, but not at such a rapid rate. Fraud detection is incredibly important, but most companies are going to rely on their financial institutions to design and implement the necessary controls. I don’t expect to see many commercial, off-the-shelf products in this area.

Business intelligence is another area that will see a lot of money spent on research. But the algorithms needed by Coca-Cola and Pepsi to determine the next popular flavor will look nothing like what GM and Ford are using to predict how many vehicles to make of each size. So again, commercial products are probably going to be mostly limited to basic analytics and visualization for the time being.

Other Conference Thoughts

All in all, Splunk put on a great conference. Everything was well organized and there were sessions for everyone from the complete beginner to the most advanced data-mining engineer. My only complaint is that the sessions weren’t recorded. With so much content, one is bound to miss an important session or two due to conflicts.

Even if you are not interested in Splunk itself, this is an important conference for anyone interested in Big Data, machine learning, and related topics.


Machine learning is powerful and useful, but let's not hype it by peter lin

I've spent 10+ years studying machine learning and figuring out ways to use these technologies. There are several big obstacles that aren't obvious to people who are new to machine learning.

The first one is choosing which features to train the system on. This is really an art and requires both technical and subject matter expertise. Figuring out which features to use isn't static either. Since reality changes, the features used to train machine learning models have to adapt over time. Machines are terrible at figuring out how to evolve, so you still need an expert to help them. Historically, this problem has made machine learning difficult to implement and manage over time.

The second major challenge is filtering out noise. Noise could be the result of bad data, missing data, or lies. Given that big data is about datasets over 50TB, scrubbing the data isn't trivial. To get good training results, you have to have a solid, clean dataset. Without that, the results won't be useful and will probably stay in the R&D department.

The third major problem is finding people who know and understand machine learning technology. Historically, experts in this field have been scarce, and keeping them is challenging. Many of the companies that started in the '80s and '90s with AI technologies know this issue firsthand. Assuming a company finds someone to build the system, maintaining it is difficult and expensive. These types of systems aren't webpages. You can't just hand one off to a junior developer and say "ok, maintain this app now."

Those in the field of AI research still remember "the AI winter" when research funds dried up and businesses turned away. Hopefully this time people will learn not to over-hype machine learning.

Re: Machine learning is powerful and useful, but let's not hype it by Faisal Waris

Agree with Peter.

Machine learning is an analytical exercise as much as it is a programming exercise.

Many models may be built just to answer a specific question and then thrown away, while others can live on in production systems for 'scoring' purposes. Even production models have to be adjusted and tweaked with new data to keep them relevant over time. As an analyst (or data scientist) you are always exploring answers to new questions; nothing stays the same.

Raw data is rarely suitable as-is for input into ML algorithms. Most often you have to heavily transform the data to get it to a point where ML can be applied. This process is an art in itself.

Re: Machine learning is powerful and useful, but let's not hype it by Anil Shafeeque

Agree on the data engineering part: unless we have the right data, ML will be a failure. Regarding the machine learning part, with the introduction of ML services like Azure ML, machine learning is more democratized and easier to start with. These services help companies operationalize ML very easily, and I think they will only improve. So with strong data engineering and exploration skills, and with the help of ML services, we can do a fairly good job.
