At the opening keynote for Splunk .conf2014 we heard about GE Capital’s developer culture, Red Hat’s internal IT focus, and Coca-Cola’s “Data Lake” theory of information management. Here are raw notes from the floor.
For those of you who don’t know what Splunk is, think back to Microsoft’s ODBC in the late 1990s. Back then, virtually any data source, from simple CSV files to high-end mainframes, could be examined and queried using an ODBC-compatible driver. Fast forward to today and ODBC is no longer a panacea. While still useful for tabular data, it is inappropriate for pulling data from REST APIs, NoSQL databases, and event- or message-based systems. Splunk sees itself as filling that role, acting as the universal data aggregator.
A major emphasis for Splunk is the concept of “schema on read”. Rather than imposing a predetermined schema when data is ingested, they emphasize the ability to define the schema for interpreting a data source at the same time you create the query that reads it.
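To make “schema on read” concrete, here is a minimal Python sketch of the idea: the raw events are stored as-is, and the field definitions live in the query rather than in the data store. The log format, field names, and regular expression are invented for illustration; this is not Splunk’s actual extraction engine.

```python
import re

# Raw, schema-less log lines, stored exactly as they arrived.
raw_events = [
    "2014-10-07 09:15:02 action=purchase user=alice amount=4.50",
    "2014-10-07 09:15:09 action=refund user=bob amount=2.00",
]

# Schema on read: the schema is declared here, at query time,
# not when the data was written.
pattern = re.compile(
    r"action=(?P<action>\w+) user=(?P<user>\w+) amount=(?P<amount>[\d.]+)"
)

for line in raw_events:
    match = pattern.search(line)
    if match:
        event = match.groupdict()
        print(event["user"], event["action"], float(event["amount"]))
```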
Major Components
The Splunk engine is the platform that their other offerings are built upon. This provides the universal data aggregator that I spoke of in the intro. On top of that are four major components that we’ll talk about later in depth.
- Splunk>Enterprise
- Splunk>Cloud
- Hunk
- Splunk>Mint
Areas of Investment
- Analytics for Everyone: Reducing or eliminating the need for software developers to write reports.
- Reducing TCO: Specifically data storage and processing costs.
- Splunk Anywhere: The ability to run Splunk both on-premises and in the cloud.
- Solutions
- Splunk App Development: Create a developer platform support team with an emphasis on best practices, certifications, and application testing.
Snehal Antani, GE Capital
The major shift at GE Capital was a move toward three goals:
- Developer Velocity: How quickly new features can be delivered.
- Failed Customer Interactions: Rather than looking at raw server downtime, look at how that downtime affected real users.
- Compliance Response Time: How quickly information requests from compliance officers can be honored.
In order to achieve these goals they focused on a few key changes in their IT culture:
- No Squirrels: Avoid technology debates (e.g. Puppet vs. Chef, WebLogic vs. WebSphere) that distract from the bigger questions about what customer features need to be created.
- Rather than PDFs and PowerPoints, design patterns are expressed in terms of runnable templates.
- Continuous Delivery: Over 3,500 automated deploys per month. Mistakes, both in requirements and development, are caught early.
- Transparency: Don’t allow project managers to hide problems behind PowerPoints and PDFs. Progress is reported using demonstrations, not status reports.
- Every design pattern is pre-instrumented so that QA and IT Operations can see what’s happening in the application from day one.
Lee Congdon, Red Hat
Red Hat’s middleware business is growing three times faster than its Linux business, and their cloud business is growing even faster. To support this, Red Hat’s Information Technology group established three goals 18 months ago:
- Exploit the cloud (using Red Hat cloud products)
- Make Red Hat data driven
- Make associates more productive
Mr. Congdon credits Splunk with providing these capabilities:
- Provides an integrated view of security-related events across multiple systems. This allowed for proactive measures, not just reactive, after-the-fact research.
- Captures and manages system status inside and outside their data centers.
- Presents the performance of redhat.com in real time. This allowed them to uncover and diagnose a header-corruption error on their website in a matter of hours; in the past that could have taken weeks.
Long term, Red Hat wants to invest in predictive analytics. He says that Splunk will be one of many tools they are looking at for this next step.
Red Hat is working with Splunk to develop a new set of tools for working with “cold data”. Cold data refers to archived data that is rarely needed and is stored in bulk storage rather than traditional, easily queryable databases.
Michael Connor, Coca-Cola
Most companies don’t use all of their data; even using 30% of the available data is remarkable. In a typical project, up to 90% of the cost is merely getting information from the isolated “data cartels” within their organization.
Without data, you are just another person with an opinion.
In order to make informed decisions, data must be readily accessible. Rather than a series of data warehouses, companies should be focused on “data lakes”. A data lake is a deep pool of unstructured data, both internal and external, that can be accessed immediately.
Coca-Cola uses Amazon Web Services (AWS) to provide auto-scaling, with Python scripts managing the machines. Data from these servers is automatically piped to their data lake, which is implemented as a series of Splunk servers running in a separate AWS availability zone. At Coca-Cola it is impossible to stand up a production server that isn’t connected to their data lake.
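Coca-Cola did not show their scripts, but a rule that no production server can exist outside the data lake implies some automated enforcement. Below is a hypothetical sketch using boto3 (the AWS SDK for Python); the region, the tag name, and the idea of tagging forwarder status are all my own assumptions, used only to illustrate the kind of check involved.

```python
import boto3  # AWS SDK for Python

# Hypothetical compliance check: flag running EC2 instances that lack
# a (made-up) "forwarder=installed" tag marking data-lake connectivity.
ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)

for reservation in response["Reservations"]:
    for instance in reservation["Instances"]:
        tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
        if tags.get("forwarder") != "installed":
            print("Not wired to the data lake:", instance["InstanceId"])
```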
They built a security dashboard consisting of 15 panels in about 3 hours. This was done using a combination of custom components and third-party Splunk add-ons.
Design recommendation: visually compelling information displays get people’s attention. Even if such a display isn’t the most effective means of presenting the data, it engages people and encourages them to look at the other representations.
Every time a Coca-Cola Freestyle soda machine is used, about a hundred bytes representing the customer’s formula are sent to Splunk for aggregation.
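For a sense of scale, the sketch below ships a single pour event of roughly that size to a raw TCP data input, which is one of the standard ways Splunk ingests events. The host, port, and field names are invented; the Freestyle machines’ actual wire format was not disclosed.

```python
import socket

# Hypothetical pour event; field names and values are made up.
event = b"machine=fs-0042 drink=cherry-vanilla shots=2 ts=2014-10-07T09:15:02Z\n"
print(len(event), "bytes")  # on the order of the hundred bytes cited

# Send the raw event to a (hypothetical) Splunk TCP data input.
with socket.create_connection(("splunk.example.com", 5514), timeout=5) as conn:
    conn.sendall(event)
```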
Gerald Kanapathy, Splunk
Splunk 6.2 will be released this month.
Mission Critical Enterprise: Splunk is currently being used to process tens of terabytes of data each day, but deployments at that scale are hard to set up. To bring this under control, version 6.2 adds a new distributed management tool so that an entire Splunk deployment can be monitored from one place.
In version 6.1, Splunk added multi-site clustering. Version 6.2 will introduce search head clustering.
Performance has improved 2.5X since version 5 on the same hardware.
Divanny Lamas, Splunk
With Splunk’s emphasis on unstructured data, regular expressions are heavily used. Unfortunately, most end users don’t understand how to read regular expressions, let alone write them. This means developer assistance is needed to perform basic searches. Splunk’s new Field Extractor addresses this issue. It is a point-and-click interface for generating the regular expressions, which are then exposed via the search interface.
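Splunk did not explain how the Field Extractor derives its patterns, but the toy sketch below captures the general idea: given a sample event and the substring the user highlighted, emit a named-group regular expression that can be reused in later searches. The anchoring heuristic here is entirely my own simplification.

```python
import re

def build_pattern(sample: str, highlighted: str, field: str) -> str:
    """Anchor on a few characters preceding the highlighted value and
    capture the value itself as a named group."""
    start = sample.index(highlighted)
    prefix = re.escape(sample[max(0, start - 8):start])
    return f"{prefix}(?P<{field}>\\S+)"

sample = "2014-10-07 09:15:02 status=404 path=/checkout"
pattern = build_pattern(sample, "404", "status")
print(pattern)                                  # e.g. " status=(?P<status>\S+)"
print(re.search(pattern, sample).groupdict())   # {'status': '404'}
```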
Instant Pivot isn’t just about pivot tables; it can be used to create a variety of visualizations.
Pre-built panels allow dashboard components to be published and shared with other users in the organization.
Event Pattern Detection: Splunk’s analytics tools now include a pattern-recognition feature. Pointed at a set of data, it automatically analyzes the data and looks for patterns the user may wish to explore.
Operational Intelligence Anywhere allows for embedding live reports into any web-based application. With the next version of Splunk, these live reports can be extended to mobile applications.
Hunk
Hunk is now available on Amazon Web Services in a SaaS model.
Mainframe Support
- Universal Forwarder for z/Linux
- Syncsort Ironstream for z/OS
New feature set: Splunk App for Stream
Praveen Rangnath, Splunk in the Cloud
Splunk Cloud is their Amazon-based SaaS offering. Hosted on Amazon Web Services, it is billed by the hour. Data processing packages ranging from 5 GB/day to 5 TB/day are currently offered in the US and Canada.
Splunk Cloud is now offering a 100% uptime SLA; if there is any downtime at all they will “owe you money”. They think they can offer this because they have:
- Multiple AWS Availability Zones
- High availability across indexers and search heads
- Dedicated Cloud Environments
Splunk is using their own software to monitor Splunk Cloud’s performance and availability.
Clint Sharp, Splunk Operations Analytics
In the past, monitoring was just a simple up/down display. These days most organizations need seven to ten monitoring tools, covering application performance, hypervisors, data storage, and other systems. To bring this all together, Splunk’s new vision is something they call the “Common Information Model” or CIM.
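The CIM details weren’t shown on stage, but the core idea, mapping each tool’s own field names onto one shared vocabulary so events from different monitoring tools can be queried together, can be sketched in a few lines of Python. Every field name below is invented for illustration.

```python
# Hypothetical per-tool mappings onto a common vocabulary.
FIELD_MAP = {
    "apm":        {"resp_ms": "response_time", "host_name": "host"},
    "hypervisor": {"vm_host": "host", "latency": "response_time"},
}

def normalize(source: str, event: dict) -> dict:
    """Rename a tool-specific event's fields to the common model,
    passing through any fields that have no mapping."""
    mapping = FIELD_MAP.get(source, {})
    return {mapping.get(key, key): value for key, value in event.items()}

# Two events from different tools now share one schema.
print(normalize("apm", {"resp_ms": 42, "host_name": "web01"}))
print(normalize("hypervisor", {"vm_host": "esx07", "latency": 3}))
```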
Clint Sharp, Splunk Mint
Mint is Splunk’s mobile intelligence product. Their entry-level product, Mint Express, is currently handling 20K requests per second. Their new product, Mint Enterprise, combines the capabilities of Splunk with Mint.
Haiyan Song, Splunk for Enterprise Security
Important statistics for security breaches:
- 100% of breaches involved the use of valid credentials
- 229 days on average to detect a major breach
- 67% of breaches are reported by a customer or third party
If you want to know more about anything you read in InfoQ’s Splunk conference coverage, contact our on-site reporter at jonathan@infoq.com.