InfoQ

Ron Bodkin

Ron founded Think Big Analytics to help customers leverage new data processing technologies like Hadoop, NoSQL databases and R for statistical analysis. Ron works with customers to develop solutions that leverage unstructured data and new techniques. Previously Ron was the VP of Engineering for Quantcast. Each day Quantcast uses map-reduce to load 10 billion events and produce more than a petabyte of data for production reporting, ad hoc analysis, data mining and machine learning. Prior to that Ron was a founder of enterprise consulting companies C-bridge and New Aspects.

All of Ron Bodkin's Content on InfoQ


Latest featured content by Ron Bodkin

Hadoop and NoSQL in a Big Data Environment

Topics
NoSQL,
Data Access,
Design Pattern,
Agile,
Big Data,
Database Design,
Performance & Scalability,
Data Warehousing

Ron Bodkin of Think Big Analytics discusses early adoption of Hadoop, NoSQL and big data technologies. He discusses common patterns and explains how developers can write low-level primitives to optimize MapReduce functions. Other topics include Hive, Pig, multi-tenancy, and security.

Ron Bodkin on Big Data and Analytics

Topics
Map-Reduce,
Machine Learning,
Operations,
Big Data,
Architecture

Ron Bodkin discusses big data architecture, real-time analytics, batch processing, map-reduce, and data science.

Large Scale Map-Reduce Data Processing at Quantcast

Topics
Architecture,
Big Data

Ron Bodkin presents the architecture Quantcast uses to process hundreds of terabytes of data daily with Hadoop on dedicated systems, covering the applications, the types of data processed, and the infrastructure used.

News by Ron Bodkin

eBay readies next generation search built with Hadoop and HBase

Topics
Search,
NoSQL,
Big Data

eBay presented a keynote at Hadoop World, describing the architecture of its completely rebuilt search engine, Cassini, slated to go live in 2012. It indexes all the content and user metadata to produce better rankings and refreshes indexes hourly. It is built using Hadoop for hourly index updates and HBase to provide random access to item information.

CassandraSF2011: Progress and Futures

Topics
NoSQL,
Java,
Big Data,
Apache,
Architecture

Jonathan Ellis keynoted at Cassandra SF 2011. Ellis reviewed accomplishments including better support for multi-data center deployments, optimized read performance, integrated caching, and improved client APIs, including a SQL-like language, CQL. Looking forward, Ellis emphasized polish: efficient database repair, storage compression, optimized performance and an expanded CQL language.

Cassandra Indexing Guidelines from CassandraSF2011

Topics
NoSQL,
Java,
Architecture

Ed Anuff reviewed Cassandra's built-in secondary indexes, noting that they don't work well for high-cardinality values, require at least one equality comparison, and return unsorted results. Anuff presented patterns for alternative indexing, including wide rows and tables that use Cassandra 0.8.1's new composite comparator operators to overcome these limitations.

MapR Releases Commercial Distributions based on Hadoop

Topics
Map-Reduce,
Big Data,
Announcements,
Architecture

MapR Technologies released a big data toolkit based on Apache Hadoop, with its own distributed storage alternative to HDFS. The software is commercial, with both a free edition, M3, and a paid edition, M5. M5 includes snapshots and mirroring for data, Job Tracker recovery, and commercial support. MapR's M5 edition will form the basis of EMC Greenplum's upcoming HD Enterprise Edition.

Yahoo Hadoop Spinout Hortonworks Announces Plans

Topics
Map-Reduce,
Open Source,
Big Data,
Announcements,
Apache,
Architecture

Yahoo spun out its core Hadoop team, forming a new company, Hortonworks. CEO Eric Baldeschwieler presented their vision of easing adoption of Hadoop and making core engineering improvements for availability, performance, and manageability. Hortonworks will sell support, training, and certification, primarily indirectly through partners.

Hadoop Futures at Structure Big Data: DataStax Brisk, EMC, and MapR

Topics
NoSQL,
Architecture

DataStax described Brisk, their new Hadoop distribution that stores data in Cassandra; EMC published an ad that promised big news about Hadoop and Greenplum on May 9th; and GigaOm claimed that MapR Technologies is building a proprietary version of Hadoop. DataStax told InfoQ there are production Cassandra clusters of 700 nodes, storing hundreds of terabytes and doing 200,000 writes per second.