
The Evolving Panorama of Data

Posted by Rebecca Parsons on Jan 17, 2013

Rebecca Parsons proposes taking a different look at data, using different approaches and tools, then looks at some of the ways social data is used these days.

Scaling Scalability: Evolving Twitter Analytics

Posted by Dmitriy Ryaboy  on  Jan 13, 2013

Dmitriy Ryaboy shares some of the lessons learned scaling Twitter’s analytics infrastructure: Data loves a schema, Make data sources discoverable, and Make costs visible.

Big Data, Small Computers

Posted by Cliff Click  on  Dec 20, 2012

Cliff Click discusses RAIN, H2O, the JMM, parallel computation, and fork/join in the context of performing big data analysis on large amounts of commodity hardware.

View Server: Delivering Real-Time Analytics for Customer Service

Posted by Richard Tibbetts  on  Sep 28, 2012

Richard Tibbetts presents a three-tier architecture that stages and analyzes data in real time, stores the results, and delivers them to clients as a service accessible through a variety of interfaces.

NetApp Case Study

Posted by Kumar Palaniapan and Scott Fleming on Jun 01, 2012

Kumar Palaniapan and Scott Fleming present how NetApp deals with big data using Hadoop, HBase, Flume, and Solr, collecting and analyzing terabytes of log data with Think Big Analytics.

Data, Be Like Water

Posted by Paul Sanford  on  May 24, 2012

Paul Sanford presents the transformations data undergoes throughout its life cycle, and how they can be handled better with Splunk, an engine for monitoring and analyzing machine-generated data.

Machine Learning on Big Data for Personalized Internet Advertising

Posted by Michael Recce on May 18, 2012

Michael Recce discusses how advertising works and what algorithms Quantcast uses to analyze large amounts of data in order to find out what people are interested in.

GridGain vs. Hadoop. Why Elephants Can't Fly

Posted by Dmitriy Setrakyan on May 16, 2012

Dmitriy Setrakyan introduces GridGain, comparing it with Hadoop and outlining the cases where it is a better fit, accompanied by a live demo showing how to set up a GridGain job.

Distributed Data Analysis with Hadoop and R

Posted by Jonathan Seidman and Ramesh Venkataramaiah on Mar 09, 2012

Jonathan Seidman and Ramesh Venkataramaiah present how they run R on Hadoop in order to perform distributed analysis on large data sets, including some alternatives to their solution.

NoSQL at Twitter

Posted by Kevin Weil on Dec 23, 2010

Kevin Weil presents how Twitter does data analysis using Scribe for logging, base analysis with Pig/Hadoop, and specialized data analysis with HBase, Cassandra, and FlockDB.

Machine Learning: A Love Story

Posted by Hilary Mason on Nov 09, 2010

Hilary Mason presents the history of machine learning, covering some of the most significant developments of the last two decades, especially the fundamental math and algorithmic tools employed. She also shows how bit.ly uses machine learning to discover various statistical information about users.

Facebook’s Petabyte Scale Data Warehouse using Hive and Hadoop

Posted by Ashish Thusoo and Namit Jain on Feb 21, 2010

Ashish Thusoo and Namit Jain explain how Facebook manages to deal with 12 TB of compressed new data every day with Hive's help. Hive is an open source data warehousing framework built on Hadoop, allowing developers to perform analysis against large datasets using SQL.

InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.