Optimizing queries in Splunk’s Search Processing Language is similar to optimizing queries in SQL. The two core tenets are the same: change the physics and reduce the amount of work done. Added to that are two precepts that apply to any distributed query.
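As a minimal sketch of the second tenet (the index, sourcetype, and field names here are hypothetical, not from the article), moving filter terms from a piped search command into the base search lets the indexers discard events early instead of shipping them all to the search head:

    Slower: index=* | search sourcetype=access_combined status=500 | stats count
    Faster: index=web sourcetype=access_combined status=500 | stats count

The same principle applies in SQL: a WHERE clause that can use an index beats filtering rows after they have been fetched.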
A surprisingly common theme at the Splunk Conference is the architectural question, “Should I push, pull, or search in place?”
If you can handle all of the data you need on a single machine, there is no reason to use big data techniques. So clustering is pretty much assumed for any installation larger than a basic proof of concept. In Splunk Enterprise, the most common type of cluster you’ll be dealing with is the indexer cluster.
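For illustration, a peer node joins an indexer cluster through a short server.conf stanza; the hostname, key, and port below are placeholders rather than values from the article:

    [clustering]
    mode = slave
    master_uri = https://cluster-master.example.com:8089
    pass4SymmKey = changeme

The master node uses mode = master along with replication_factor and search_factor to control how many copies of each data bucket the cluster keeps and how many of those copies are searchable.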
When working with Hadoop, with or without Hunk, there are a number of ways you can accidentally kill performance. While some of the fixes require more hardware, sometimes the problems can be solved simply by changing the way you name your files.
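As a hedged example of the naming point (the paths are illustrative): encoding event time in the directory structure lets a query over a narrow time range skip most files entirely, and bundling events into a few large files avoids the overhead Hadoop incurs on many small ones.

    /logs/web/2015/09/28/00/access-00.log.gz    time-partitioned, few large files
    /logs/web/access-1443397201-host42.log      flat and fragmented; every query scans everything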
Splunk is jumping into the service-monitoring sector with a new visualization tool called IT Service Intelligence.
Splunk can now store archived indexes on Hadoop. At the cost of performance, this offers a 75% reduction in storage costs without losing the ability to search the data. And with the new adapters, Hadoop tools such as Hive and Pig can process the Splunk-formatted data.
Splunk opened their big data conference with an emphasis on “making machine data accessible, usable, and valuable to everyone”. This is a shift from their original focus: indexing arbitrary big data sources. Reasonably happy with their ability to process data, they want to ensure that developers, IT staff, and normal people have a way to actually use all of the data their company is collecting.
Symantec’s Thawte unit admits that flawed internal practices allowed multiple SSL certificates for Google domains to be issued without authorization.
On August 12, Google announced that Cloud Dataflow, its big data processing service, has reached general availability. The managed service allows customers to build pipelines that manipulate data before it is consumed by downstream big data systems. Cloud Dataflow supports both streaming and batch processing in a unified programming model.
IBM has announced LinuxONE, a Linux-only hardware portfolio that runs SUSE, Red Hat, or Ubuntu distributions and adds support for open-source tools such as Docker and Chef. The offering is targeted at both large enterprises and mid-size businesses.
Bazel, the build system that Google open sourced six months ago, has reached the first beta milestone as planned, adding support for several languages and technologies.
After an informative presentation by Armon Dadgar at QCon New York that explored security requirements within modern production systems, InfoQ sat down with Dadgar and asked questions about HashiCorp’s Vault, an open source tool for managing secrets at scale.
Oracle shocked the Java world this week by announcing the dismissal of some of its top Java evangelists, including Cameron Purdy and Simon Ritter.
Airbnb recently open-sourced Airflow, its own data workflow management framework, which it uses internally to build, monitor, and adjust data pipelines. Airflow’s creator, Maxime Beauchemin, and Siddharth Anand, Agari’s data architect and one of the framework’s early adopters, discuss Airflow, where it can be useful, and future plans.
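To give a flavor of what an Airflow pipeline looks like, here is a minimal sketch of a two-task DAG; the DAG name, task commands, and schedule are invented for illustration:

    # Minimal Airflow DAG sketch; dag_id, tasks, and schedule are illustrative.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id="example_pipeline",
        start_date=datetime(2015, 1, 1),
        schedule_interval="@daily",  # run once per day
    )

    extract = BashOperator(task_id="extract", bash_command="echo extract", dag=dag)
    load = BashOperator(task_id="load", bash_command="echo load", dag=dag)

    extract.set_downstream(load)  # load runs only after extract succeeds

Each run of the DAG is scheduled, retried, and visualized by Airflow’s built-in web interface, which is where the "monitor and adjust" part of the framework comes in.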
OpenBSD has long lacked support for hosting virtual machines on the x86/x64 platforms. OpenBSD developer Mike Larkin aims to change that with a new project to bring a native hypervisor to the operating system.