Big Data Content on InfoQ

News

Architecture & Design

Big Data Architecture: Push, Pull, or Search in Place?

A surprisingly common theme at the Splunk Conference is the architectural question, “Should I push, pull, or search in place?”

Jonathan Allen
on Sep 23, 2015
AI, ML & Data Engineering

Architecture, Tuning, and Troubleshooting a Splunk Indexer Cluster

If you could handle all of the data you need to work with on one machine, there would be no reason to use big data techniques. So clustering is pretty much assumed for any installation larger than a basic proof of concept. In Splunk Enterprise, the most common type of cluster you’ll be dealing with is the Indexer Cluster.

Jonathan Allen
on Sep 23, 2015
AI, ML & Data Engineering

Hunk/Hadoop: Performance Best Practices

When working with Hadoop, with or without Hunk, there are a number of ways you can accidentally kill performance. While some of the fixes require more hardware, sometimes the problems can be solved simply by changing the way you name your files.

Jonathan Allen
on Sep 23, 2015
DevOps

Introducing Splunk IT Service Intelligence

Splunk is jumping into the service-monitoring sector with a new visualization called IT Service Intelligence.

Jonathan Allen
on Sep 22, 2015
AI, ML & Data Engineering

Using Hunk+Hadoop as a Backend for Splunk

Splunk can now store archived indexes on Hadoop. At the cost of performance, this offers a 75% reduction in storage costs without losing the ability to search the data. And with the new adapters, Hadoop tools such as Hive and Pig can process the Splunk-formatted data.

Jonathan Allen
on Sep 22, 2015
AI, ML & Data Engineering

Splunk .conf 2015 Keynote

Splunk opened their big data conference with an emphasis on “making machine data accessible, usable, and valuable to everyone”. This is a shift from their original focus: indexing arbitrary big data sources. Reasonably happy with their ability to process data, they want to ensure that developers, IT staff, and normal people have a way to actually use all of the data their company is collecting.

Jonathan Allen
on Sep 22, 2015
Cloud

Google's Cloud Dataflow Enters General Availability

On August 12, Google announced that its big data processing service has reached general availability. This managed service allows customers to build pipelines that manipulate data before it is processed by big data solutions. Cloud Dataflow supports both streaming and batch programming in a unified model.

Kent Weare
on Sep 17, 2015
AI, ML & Data Engineering

Data Workflow Management Using Airbnb's Airflow

Airbnb recently open-sourced Airflow, its own data workflow management framework. Airflow is being used internally at Airbnb to build, monitor, and adjust data pipelines. Airflow’s creator, Maxime Beauchemin, and Siddharth Anand, Agari’s Data Architect and one of the framework’s early adopters, discuss Airflow, where it can be of use, and future plans.

Alex Giamas
on Sep 08, 2015
Cloud

Microsoft Releases Azure Data Factory

Any cloud provider that believes in data gravity is trying to make it easier to collect and store data in its facilities. To make data movement between cloud and on-premises endpoints easier, Microsoft recently announced the general availability of Azure Data Factory (ADF).

Richard Seroter
on Aug 25, 2015
Data Quality at Prezi

For an organization to be data-driven, it's not enough to just dump mountains of data. That data needs to be accurate and meaningful. Julianna Göbölös-Szabó, data engineer at Prezi, shared how the company improved the quality of its log data. The solution involved moving from unstructured to structured data with a lightweight, contract-based approach to nudge all teams in the right direction.

João Miranda
on Jul 18, 2015
Basho Data Platform Supports In-Memory Analytics, Caching, Search and Integration with NoSQL

Basho Data Platform supports integration with NoSQL databases like Redis, in-memory analytics, caching, and search. Basho Technologies, the company behind the Riak NoSQL database, announced in May the availability of the data platform, which can be used to deploy and manage Big Data, IoT, and hybrid cloud applications.

Srini Penchikala
on Jul 05, 2015
Leveraging Data Science to Improve Monitoring

At the recent devopsdays Amsterdam 2015, Patrick Roelke contended that monitoring still has lots of issues. Roelke believes that data science can help by eliminating static thresholds and coalescing information from various data sources into a single metric. The talk included a quick overview of monitoring tools that leverage data science: Kale, Bosun and AnomalyDetection.

João Miranda
on Jun 30, 2015
Software Defined Data Mart In The Enterprise Using Metanautix Quest

Metanautix recently announced the newest version of its product, Quest. Quest allows enterprises to build software-defined data marts that can run on virtualized servers. Designed from the ground up with security and auditability in mind, Quest can deal with Big Data workloads and encapsulate them into different logical views, ready for consumption by different departments in the enterprise.

Alex Giamas
on Jun 29, 2015
Developments in IT Project Management

The demand for IT project managers is increasing. Agile methodologies support collaboration with distributed teams for creative problem solving. The Internet of Things, cloud, big data, and cyber security will continue to dominate the IT landscape. Project managers have to pioneer IoT initiatives, be prepared for the influx of data, and ensure that deliverables from their projects are secure.

Ben Linders
on Jun 25, 2015
Twitter Has Replaced Storm with Heron

Twitter has replaced Storm with Heron, which provides up to 14 times more throughput and up to 10 times lower latency on a word count topology, and has helped the company reduce the hardware it needs to a third.

Abel Avram
on Jun 12, 2015
