Data preparation is an important aspect of data processing and analytics use cases. Business analysts and data scientists spend about 80% of their time gathering and preparing data rather than analyzing it or developing machine learning models. Kelly Stirman spoke last week at the Enterprise Data World 2017 Conference about data preparation best practices.
Causal Consistency models offer an alternative to Eventual Consistency for distributed systems; both models should be weighed against your system's requirements and risk tolerance.
VMware releases SQLFire 1.0, a distributed SQL database geared towards high availability and horizontal scalability, which offers table replication, table partitioning and parallel execution of queries.
JBoss releases Hibernate 4.0, which comes with multi-tenancy support, a standard mechanism for writing Hibernate extensions, initial refactorings towards OSGi and several other cleanups.
The Windows Azure team announced major updates including support for Node.js, better scalability for SQL Azure through Federation and higher individual database size limits (up to 150 GB), a limited preview for Hadoop and more.
The Hadoop Summit of 2010 included presentations from a number of large-scale users of Hadoop and related technologies. Notably, Facebook presented a keynote and detailed information about its use of Hive for analytics. Mike Schroepfer, Facebook's VP of Engineering, delivered the keynote, describing the scale of the company's data processing with Hadoop.
In this databases roundup we take a look at DataFabric, FiveRuns' recently open-sourced data sharding plug-in for ActiveRecord. Also: a look at speeding up Postgres data access using the asynchronous client API and Ruby 1.9's Fibers.