Ron Bodkin
Ron founded Think Big Analytics to help customers leverage new data processing technologies like Hadoop, NoSQL databases and R for statistical analysis. Ron works with customers to develop solutions that leverage unstructured data and new techniques. Previously Ron was the VP of Engineering for Quantcast. Each day Quantcast uses map-reduce to load 10 billion events and produce more than a petabyte of data for production reporting, ad hoc analysis, data mining and machine learning. Prior to that Ron was a founder of enterprise consulting companies C-bridge and New Aspects.
All of Ron Bodkin's Content on InfoQ
Latest featured content by Ron Bodkin

- Topics
- NoSQL,
- Data Access,
- Design Pattern,
- Agile,
- Big Data,
- Database Design,
- Performance & Scalability,
- Data Warehousing
Ron Bodkin of Think Big Analytics discusses early adoption of Hadoop, NoSQL and big data technologies. He discusses common patterns and explains how developers can write low-level primitives to optimize MapReduce functions. Other topics include Hive, Pig, multi-tenancy, and security.

- Topics
- Map-Reduce,
- Machine Learning,
- Operations,
- Big Data,
- Architecture
Ron Bodkin discusses big data architecture, real-time analytics, batch processing, map-reduce, and data science.

- Topics
- Architecture,
- Big Data
Ron Bodkin presents the architecture Quantcast uses to process hundreds of terabytes of data daily with Hadoop on dedicated systems, covering the applications, the types of data processed, and the infrastructure used.
News by Ron Bodkin
- Topics
- Search,
- NoSQL,
- Big Data
eBay presented a keynote at Hadoop World, describing the architecture of its completely rebuilt search engine, Cassini, slated to go live in 2012. It indexes all the content and user metadata to produce better rankings and refreshes indexes hourly. It is built using Hadoop for hourly index updates and HBase to provide random access to item information.
- Topics
- NoSQL,
- Java,
- Big Data,
- Apache,
- Architecture
Jonathan Ellis keynoted at Cassandra SF 2011. Ellis reviewed accomplishments including better support for multi-data center deployments, optimized read performance, integrated caching, and improved client APIs, including a SQL-like language, CQL. Looking forward, Ellis emphasized polish: efficient database repair, storage compression, optimized performance and an expanded CQL.
- Topics
- NoSQL,
- Java,
- Architecture
Ed Anuff reviewed Cassandra's built-in secondary indexes, noting that they don't work well for high-cardinality values, require at least one equality comparison, and return unsorted results. To overcome these limitations, Anuff presented patterns for alternative indexing, including wide rows and tables that use Cassandra 0.8.1's new composite comparator operators.
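The wide-row pattern mentioned above can be illustrated with a small in-memory model. This is a minimal sketch, not Cassandra's API and not necessarily the design Anuff presented: the class `WideRowIndex` and its methods are hypothetical names, and a sorted Python list stands in for a row whose column names are composites of (indexed value, entity key). Because the columns stay sorted by that composite name, the model supports range queries without an equality comparison and returns results in value order.

```python
import bisect


class WideRowIndex:
    """Hypothetical in-memory model of one wide row used as an index.

    Each column name is a composite (indexed_value, entity_key). Columns are
    kept sorted by name, the way a sorted column family would store them.
    """

    def __init__(self):
        self._columns = []  # sorted list of (indexed_value, entity_key)

    def add(self, indexed_value, entity_key):
        """Insert a column, keeping the row sorted and free of duplicates."""
        entry = (indexed_value, entity_key)
        pos = bisect.bisect_left(self._columns, entry)
        if pos == len(self._columns) or self._columns[pos] != entry:
            self._columns.insert(pos, entry)

    def range(self, low, high):
        """Return entity keys with low <= indexed_value <= high, in value order."""
        # (low,) sorts before every (low, key), so this finds the first match.
        start = bisect.bisect_left(self._columns, (low,))
        end = start
        while end < len(self._columns) and self._columns[end][0] <= high:
            end += 1
        return [key for _, key in self._columns[start:end]]


index = WideRowIndex()
index.add(1985, "user:carol")
index.add(1972, "user:alice")
index.add(1990, "user:bob")
index.add(1985, "user:dave")
print(index.range(1980, 1990))  # ['user:carol', 'user:dave', 'user:bob']
```

Putting the entity key into the column name keeps entries with the same indexed value distinct, which is why the approach is not limited by how many distinct values the indexed field takes.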
- Topics
- Map-Reduce,
- Big Data,
- Announcements,
- Architecture
MapR Technologies released a big data toolkit based on Apache Hadoop, with its own distributed storage alternative to HDFS. The software is commercial, with both a free edition, M3, and a paid edition, M5. M5 includes snapshots and mirroring for data, JobTracker recovery, and commercial support. MapR's M5 edition will form the basis of EMC Greenplum's upcoming HD Enterprise Edition.
- Topics
- Map-Reduce,
- Open Source,
- Big Data,
- Announcements,
- Apache,
- Architecture
Yahoo spun out its core Hadoop team, forming a new company, Hortonworks. CEO Eric Baldeschwieler presented their vision of easing adoption of Hadoop and making core engineering improvements for availability, performance, and manageability. Hortonworks will sell support, training, and certification, primarily indirectly through partners.
- Topics
- NoSQL,
- Architecture
DataStax described Brisk, their new Hadoop distribution that stores data in Cassandra; EMC published an ad that promised big news about Hadoop and Greenplum on May 9th; and GigaOm claimed that MapR Technologies is building a proprietary version of Hadoop. DataStax told InfoQ there are production Cassandra clusters of 700 nodes, storing hundreds of terabytes and doing 200,000 writes per second.