Twitter has open sourced their MapReduce streaming framework, called Summingbird. Available under the Apache 2 license, Summingbird is a large-scale data processing system enabling developers to uniformly execute code in either batch-mode (Hadoop/MapReduce-based) or stream-mode (Storm-based) or a combination thereof, called hybrid mode.
2013 has been rich in announcements for new programs, degrees and grants for aspiring data scientists and Big Data practitioners.
The MapReduce paradigm is not always ideal when dealing with large computationally intensive algorithms. A small team of entrepreneurs is building a product called ParallelX to solve that bottleneck by harnessing the power of GPUs to give Hadoop jobs a significant boost.
EC2 users can now automate the deployment of Apache Mesos, an open-source tool to share cluster resources between multiple data processing frameworks, at scale through a new web service called Elastic Mesos provided by Big Data startup Mesosphere.
A new Apache incubator project, Tez, generalizes the MapReduce paradigm to execute a complex DAG (directed acyclic graph) of tasks.
Big Data analytics startup QuantCell Research has announced the release of the first public beta of what they are positioning as their "Big Data" spreadsheet.
ThoughtWorks's latest "Technology Radar" focuses on mobile, accessible analytics, simple architectures, reproducible environments, and data persistence done right.
Microsoft recently revealed new pricing structure for Windows Azure Storage along with several improvements.
LinkedIn engineering releases SenseiDB 1.0.0, a NoSQL database focused on high update rates and complex semi-structured search queries, already used in production by LinkedIn in its search related pages (e.g. People/Company search)
In his new article “MapReduce Patterns, Algorithms, and Use Cases”, Ilya Katsov gives a systematic view of the different MapReduce patterns, algorithms and techniques that can be found on the web or in scientific articles along with several practical use case studies.
After six years of gestation, Big data framework Apache Hadoop 1.0.0 was recently released. Core features in the release include Kerberos Authentication, support for Apache HBase and RESTful API to HDFS. InfoQ spoke with Arun Murthy, VP of Apache Hadoop, about the new release.
Corporations are increasingly using social media to learn more about what their customers are saying about their products. This presents unique challenges as unstructured content needs analytic techniques to interpret the sentiment embodied in the blog posts. InfoQ caught up with Subramanian Kartik to learn more about the blog sentiment analysis project his team worked on.
HPCC Systems, which is part of LexisNexis, is launching this week its Thor Data Refinery Cluster on the Amazon EC2. HPCC Systems is an enterprise-grade, open source Big Data analytics technology platform capable of ingesting vast amounts of data, transforming, linking and indexing that data, with parallel processing power spread across the nodes.
Recently Steve Jones, from Cap Gemini, questioned whether NoSQL/Big Data is the panacea that some vendors would have us believe. He suggests that in some cases in-memory RDBMS may well be the optimal solution and that approaches such as Map Reduce could be too difficult to understand for typical IT departments. He concludes with a suggestion some sometimes Big Data may be a Big Con.
Hortonworks, a company created in June 2011 by Yahoo! and Benchmark Capital, has announced the Technical Preview Program of Data Platform based on Hadoop. The company employs many of the core Hadoop contributors and intends to provide support and training.