Precog has recently announced a Big Data warehousing and analysis service that handles data capture, storage, transformation, analysis and visualization, along with the infrastructure on which it runs, while exposing access points throughout the service via RESTful APIs so that developers and data scientists can control the entire process.
The release of Apache HBase on Amazon EMR both extends the reach of EMR by adding a significant new piece of technology and makes HBase easier to use by automating many setup and maintenance activities.
After six years of gestation, the Big Data framework Apache Hadoop 1.0.0 was recently released. Core features in the release include Kerberos authentication, support for Apache HBase and a RESTful API to HDFS. InfoQ spoke with Arun Murthy, VP of Apache Hadoop, about the new release.
eBay presented a keynote at Hadoop World, describing the architecture of its completely rebuilt search engine, Cassini, slated to go live in 2012. It indexes all the content and user metadata to produce better rankings and refreshes indexes hourly. It is built using Hadoop for hourly index updates and HBase to provide random access to item information.
Apache has announced the release of Cassandra 1.0.0, the first major milestone of the distributed column-based data store, which adds data compression and several performance improvements and optimizations.
Last week, Ed Anuff, founder of Usergrid, announced that the first source code release is available on GitHub. Usergrid is a comprehensive platform stack for mobile and rich client applications. It can be deployed as a highly scalable Cloud service, is built in Java and runs on top of Cassandra.
Jonathan Ellis keynoted at Cassandra SF 2011. Ellis reviewed accomplishments including better support for multi-data center deployments, optimized read performance, integrated caching and improved client APIs, including a SQL-like language, CQL. Looking forward, Ellis emphasized polish: efficient database repair, storage compression, optimized performance and an expanded CQL language.
Ed Anuff reviewed Cassandra's built-in secondary indexes, noting that they don't work well for high cardinality values, require at least one equality comparison and return unsorted results. Anuff presented patterns for alternative indexing including wide rows and tables that use Cassandra 0.8.1's new composite comparator operators to overcome these limitations.
Yahoo spun out its core Hadoop team, forming a new company, Hortonworks. CEO Eric Baldeschwieler presented the company's vision of easing adoption of Hadoop and making core engineering improvements for availability, performance, and manageability. Hortonworks will sell support, training, and certification, primarily indirectly through partners.
JasperSoft announces reporting support for Hadoop and leading NoSQL databases.
The Hadoop Summit of 2010 included presentations from a number of large-scale users of Hadoop and related technologies. Notably, Facebook presented a keynote and detailed information about its use of Hive for analytics. Mike Schroepfer, Facebook's VP of Engineering, delivered the keynote, describing the scale of the company's data processing with Hadoop.