Big Data Content on InfoQ

News

AI, ML & Data Engineering

DistributedLog at Twitter for High Performance Logging

Twitter uses replicated logs for high-performance data collection and analysis of its systems, and developed DistributedLog for this purpose. Twitter has also developed Manhattan, a distributed key-value database that can trade consistency for latency in reads, following the eventually consistent data model. We examine Twitter's design and tradeoffs for DistributedLog.

Alex Giamas
on Oct 20, 2015
AI, ML & Data Engineering

Amazon Announces QuickSight - Business Intelligence for Big Data on AWS

Amazon has announced QuickSight at the AWS re:Invent conference. QuickSight is a complete Business Intelligence solution to help customers gain insights from the data they have stored in AWS.

Matt Kapilevich
on Oct 09, 2015
Cloud

Salesforce Enters IoT Market

At Salesforce’s recent Dreamforce conference, the company announced an upcoming IoT platform that will ingest real-time data and turn it into actionable tasks across its suite of cloud-based services.

Kent Weare
on Oct 01, 2015
Architecture & Design

Hortonworks Addresses the IoAT with DataFlow Based on NiFi

Hortonworks has quietly made available the DataFlow platform, which is based on Apache NiFi and attempts to solve the processing needs of the Internet of Anything (IoAT).

Abel Avram
on Sep 25, 2015
AI, ML & Data Engineering

SpringXD being Re-architected and Re-branded to Spring Cloud Data Flow

Pivotal announced a complete re-design of Spring XD, its big data offering, during last week’s SpringOne2GX conference, with a corresponding re-brand from Spring XD to Spring Cloud Data Flow. The new product is focussed on orchestration.

Charles Humble
on Sep 25, 2015
AI, ML & Data Engineering

Splunk for DBAs

The DBA’s primary job is to ensure that the business’s information is always available, with performance coming in a close second. We’ve already talked about optimizing distributed queries in Splunk and map-reduce queries in Hunk. In this report we expand upon that with more information that a DBA needs to know about Splunk databases.

Jonathan Allen
on Sep 24, 2015
AI, ML & Data Engineering

Optimizing Distributed Queries in Splunk

Optimizing queries in Splunk’s Search Processing Language is similar to optimizing queries in SQL. The two core tenets are the same: change the physics and reduce the amount of work done. Added to that are two precepts that apply to any distributed query.

Jonathan Allen
on Sep 23, 2015
Architecture & Design

Big Data Architecture: Push, Pull, or Search in Place?

A surprisingly common theme at the Splunk Conference is the architectural question, “Should I push, pull, or search in place?”

Jonathan Allen
on Sep 23, 2015
AI, ML & Data Engineering

Architecture, Tuning, and Troubleshooting a Splunk Indexer Cluster

If you could handle all of the data you need to work with on one machine, then there is no reason to use big data techniques. So clustering is pretty much assumed for any installation larger than a basic proof of concept. In Splunk Enterprise, the most common type of cluster you’ll be dealing with is the Indexer Cluster.

Jonathan Allen
on Sep 23, 2015
AI, ML & Data Engineering

Hunk/Hadoop: Performance Best Practices

When working with Hadoop, with or without Hunk, there are a number of ways you can accidentally kill performance. While some of the fixes require more hardware, sometimes the problems can be solved simply by changing the way you name your files.

Jonathan Allen
on Sep 23, 2015
DevOps

Introducing Splunk IT Service Intelligence

Splunk is jumping into the service-monitoring sector with a new visualization called IT Service Intelligence.

Jonathan Allen
on Sep 22, 2015
AI, ML & Data Engineering

Using Hunk+Hadoop as a Backend for Splunk

Splunk can now store archived indexes on Hadoop. At the cost of performance, this offers a 75% reduction in storage costs without losing the ability to search the data. And with the new adapters, Hadoop tools such as Hive and Pig can process the Splunk-formatted data.

Jonathan Allen
on Sep 22, 2015
AI, ML & Data Engineering

Splunk .conf 2015 Keynote

Splunk opened their big data conference with an emphasis on “making machine data accessible, usable, and valuable to everyone”. This is a shift from their original focus: indexing arbitrary big data sources. Reasonably happy with their ability to process data, they want to ensure that developers, IT staff, and normal people have a way to actually use all of the data their company is collecting.

Jonathan Allen
on Sep 22, 2015
Cloud

Google's Cloud Dataflow Enters General Availability

On August 12, Google announced that its big data processing service has reached general availability. This managed service allows customers to build pipelines that manipulate data before it is processed by big data solutions. Cloud Dataflow supports both streaming and batch programming in a unified model.

Kent Weare
on Sep 17, 2015
AI, ML & Data Engineering

Data Workflow Management Using Airbnb's Airflow

Airbnb recently open-sourced Airflow, its own data workflow management framework. Airflow is being used internally at Airbnb to build, monitor and adjust data pipelines. Airflow’s creator, Maxime Beauchemin, and Siddharth Anand, Agari’s Data Architect and one of the framework’s early adopters, discuss Airflow, where it can be of use, and future plans.

Alex Giamas
on Sep 08, 2015
