Ron Bodkin on Big Data and Analytics
Ron Bodkin discusses big data architecture, real-time analytics, batch processing, map-reduce, and data science.
In a press release dated December 13, SAP announced at the SAP Influencer Summit in Boston that “leading software vendors are adopting the open SAP HANA platform for their existing products and building completely new applications.” Among them are companies such as T-Mobile and TIBCO.
Microsoft is making available a cloud service called Social Analytics for users who want to analyze Twitter, Facebook, Blogger, YouTube, and other sources to gain insight into trends on the social web.
According to a recent news article, the Massachusetts Institute of Technology has introduced a technology for automatically remembering connections between objects. The system determines how objects in a large software project interact, so it can tell developers who join the project later which objects they will need to implement certain types of functions.
Imagine ad hoc data mining queries against a single table with 1 TB of data and 1.44 billion rows coming back in roughly a second. This is the scenario Microsoft intends to support using 32-core machines and its new column-based storage engine.
Kevin Weil presents how Twitter does data analysis using Scribe for logging, base analysis with Pig/Hadoop, and specialized data analysis with HBase, Cassandra, and FlockDB.
Hilary Mason presents the history of machine learning, covering some of the most significant developments of the last two decades, especially the fundamental math and algorithmic tools employed. She also shows how bit.ly uses machine learning to discover statistical information about its users.

Ashish Thusoo and Namit Jain explain how Facebook manages to deal with 12 TB of new compressed data every day with Hive’s help. Hive is an open source data warehousing framework built on Hadoop that allows developers to analyze large datasets using SQL.

Ilya Grigorik discusses his company's PostRank algorithm for tracking reader engagement with content. Also: his experience scaling MySQL, Tokyo Cabinet, Ruby HTTP libs, Solr, Amazon EC2 and more.