Twitter uses replicated logs for high-performance data collection and analysis of its systems, and developed DistributedLog for this purpose. Twitter has also developed a distributed key-value database, Manhattan, which can trade consistency for lower read latency under an eventually consistent data model. We examine Twitter's design and tradeoffs for DistributedLog.
Amazon has announced QuickSight at the AWS re:Invent conference. QuickSight is a complete business intelligence solution that helps customers gain insights from the data they have stored in AWS.
At Salesforce’s recent Dreamforce conference, the company announced an upcoming IoT platform that will ingest real-time data and turn it into actionable tasks across its suite of cloud-based services.
At the inaugural HashiConf conference, held in Portland, USA, HashiCorp announced the release of a new distributed scheduler platform named ‘Nomad’ that is capable of scheduling containers, VMs and standalone applications; and a new application delivery tool named ‘Otto’ that builds upon the existing Vagrant tool by enabling the management of remote application deployments.
Force12.io have released a prototype ‘microscaling’ container demonstration running on the Apache Mesos cluster manager, which they claim starts and stops ‘priority 1’ and ‘priority 2’ containers more rapidly than traditional autoscaling approaches when given a simulated demand for the differing workloads. InfoQ discussed the goals and methodology of this approach with Force12.io’s Ross Fairbanks.
Based on their experience with arbitrarily shutting down servers or simulating the shutdown of an entire data center in production, Netflix has proposed a number of principles of chaos engineering.
Hortonworks has quietly made available the DataFlow platform, which is based on Apache NiFi and attempts to solve the processing needs of the Internet of Anything (IoAT).
Pivotal announced a complete re-design of Spring XD, its big data offering, during last week’s SpringOne2GX conference, with a corresponding re-brand from Spring XD to Spring Cloud Data Flow. The new product is focussed on orchestration.
The DBA’s primary job is to ensure that the business’s information is always available, with performance coming in a close second. We’ve already talked about optimizing distributed queries in Splunk and map-reduce queries in Hunk. In this report we expand upon that with more information that a DBA needs to know about Splunk databases.
Optimizing queries in Splunk’s Search Processing Language is similar to optimizing queries in SQL. The two core tenets are the same: change the physics and reduce the amount of work done. Added to that are two precepts that apply to any distributed query.
A surprisingly common theme at the Splunk Conference is the architectural question, “Should I push, pull, or search in place?”
If you can handle all of the data you need to work with on one machine, there is no reason to use big data techniques. So clustering is pretty much assumed for any installation larger than a basic proof of concept. In Splunk Enterprise, the most common type of cluster you’ll be dealing with is the Indexer Cluster.
When working with Hadoop, with or without Hunk, there are a number of ways you can accidentally kill performance. While some of the fixes require more hardware, sometimes the problems can be solved simply by changing the way you name your files.
Splunk is jumping into the service-monitoring sector with a new visualization called IT Service Intelligence.