MongoDB Hits 3.2 and Becomes Enterprise Ready

by Alex Giamas on  Nov 25, 2015

MongoDB recently announced the newest version of its eponymous NoSQL database. Building on the features introduced in the 3.0 release, 3.2 expands and solidifies MongoDB’s push into the corporate world.

IBM Commits to Advance Apache Spark

by Alex Giamas on  Nov 20, 2015

Early last month in Las Vegas, at IBM Insight 2015, IBM announced a major commitment to the Apache Spark project. That IBM refers to it as “potentially the most significant open source project of the next decade” says a lot about how important the company believes Apache Spark is. With IDC reporting that 80% of cloud applications in the future will be data intensive, Apache Spark can unlock previously...

DMTK, a Machine Learning Toolkit from Microsoft

by Abel Avram on  Nov 13, 2015

At about the same time that Google announced it was open sourcing TensorFlow, Microsoft pushed DMTK, a Distributed Machine Learning Toolkit, to GitHub. While Google has released a one-machine version of TensorFlow, DMTK runs on a cluster of machines.

TensorFlow: Google Open Sources Their Machine Learning Tool

by Abel Avram on  Nov 09, 2015

TensorFlow is a machine learning library created by the Brain Team researchers at Google and now open sourced under the Apache License 2.0. TensorFlow is detailed in the whitepaper TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems. The source code can be found on Google Git.

Teradata Announces New Software for Real-Time Analysis of Internet of Things Data

by Kevin Farnham on  Nov 06, 2015

At its 2015 Partners User Group Conference, Teradata announced two new software capabilities for real-time ingestion and analysis of massive streams of IoT data. While the Teradata Listener software enables "listening" to multiple, diverse IoT data streams in real time, the new Teradata Aster Analytics on Hadoop software provides scalable analysis of massive IoT data streams.

DistributedLog at Twitter for High Performance Logging

by Alex Giamas on  Oct 20, 2015

Twitter is using replicated logs for high performance data collection and analysis of its systems. DistributedLog is the system developed at Twitter for this purpose. Twitter has developed a distributed key-value database, Manhattan. Manhattan can trade consistency for latency in reads following the eventually consistent data model. We examine Twitter's design and tradeoffs for DistributedLog.

Amazon Announces QuickSight - Business Intelligence for Big Data on AWS

by Matt Kapilevich on  Oct 09, 2015

Amazon has announced QuickSight at the AWS re:Invent conference. QuickSight is a complete Business Intelligence solution to help customers gain insights from the data they have stored in AWS.

Salesforce Enters IoT Market

by Kent Weare on  Oct 01, 2015

At Salesforce’s recent Dreamforce conference, the company announced an upcoming IoT platform that will ingest real-time data and turn it into actionable tasks across its suite of cloud-based services.

Hortonworks Addresses the IoAT with DataFlow Based on NiFi

by Abel Avram on  Sep 25, 2015

Hortonworks has quietly made available the DataFlow platform, which is based on Apache NiFi and attempts to solve the processing needs of the IoAT.

Spring XD Being Re-architected and Re-branded as Spring Cloud Data Flow

by Charles Humble on  Sep 25, 2015

Pivotal announced a complete re-design of Spring XD, its big data offering, during last week’s SpringOne2GX conference, with a corresponding re-brand from Spring XD to Spring Cloud Data Flow. The new product is focussed on orchestration.

Splunk for DBAs

by Jonathan Allen on  Sep 24, 2015

The DBA’s primary job is to ensure that the business’s information is always available, with performance coming in a close second. We’ve already talked about optimizing distributed queries in Splunk and map-reduce queries in Hunk. In this report we expand upon that with more information that a DBA needs to know about Splunk databases.

Optimizing Distributed Queries in Splunk

by Jonathan Allen on  Sep 23, 2015

Optimizing queries in Splunk’s Search Processing Language is similar to optimizing queries in SQL. The two core tenets are the same: change the physics and reduce the amount of work done. Added to that are two precepts that apply to any distributed query.

Big Data Architecture: Push, Pull, or Search in Place?

by Jonathan Allen on  Sep 23, 2015

A surprisingly common theme at the Splunk Conference is the architectural question, “Should I push, pull, or search in place?”

Architecture, Tuning, and Troubleshooting a Splunk Indexer Cluster

by Jonathan Allen on  Sep 23, 2015

If you could handle all of the data you need to work with on one machine, then there is no reason to use big data techniques. So clustering is pretty much assumed for any installation larger than a basic proof of concept. In Splunk Enterprise, the most common type of cluster you’ll be dealing with is the Indexer Cluster.

Hunk/Hadoop: Performance Best Practices

by Jonathan Allen on  Sep 23, 2015

When working with Hadoop, with or without Hunk, there are a number of ways you can accidentally kill performance. While some of the fixes require more hardware, sometimes the problems can be solved simply by changing the way you name your files.

Marketing and all content copyright © 2006-2015 C4Media Inc. hosted at Contegix, the best ISP we've ever worked with.