Hadoop 2 is now Generally Available, with YARN bringing the ability to build data-processing applications that run natively in Hadoop. We spoke to Rohit Bakhshi, product manager at Hortonworks, about YARN and what it means for Hadoop users.
Big Data analytics startup QuantCell Research has announced the first public beta of what it is positioning as its "Big Data" spreadsheet.
In his new whitepaper, Best Practices for Amazon EMR, Parviz Deyhim outlines best practices for using Amazon EMR, including moving data to AWS; strategies for collecting, compressing and aggregating data; and common architectural patterns for setting up and configuring Amazon EMR clusters for processing.
Concurrent, Inc., the enterprise Big Data application platform company, today announced Pattern, a machine-learning project based on PMML (Predictive Model Markup Language), an industry standard that allows analytics frameworks such as SAS, R, MicroStrategy and Oracle to export predictive models so they can run on Hadoop clusters.
A new open-source contribution from Microsoft uses the Windows Azure Service Bus to provide scale-out support for real-time Node.js applications. This module, called socket.io-servicebus, connects multiple servers running the popular Socket.IO module. The contribution is yet another example of Microsoft embracing Node.js and integrating it with Microsoft products and services.
The recently released Windows Azure updates include support for the Hadoop service, HTML5/JS, CORS and PhoneGap, as well as deployment integration with Mercurial, Dropbox, CodePlex and Bitbucket.
DataStax Enterprise 3.0 was announced last month with several enterprise security features for clusters running Cassandra, Hadoop and Solr. InfoQ caught up with Robin Schumacher, VP of Products at DataStax, to learn more.
Concurrent, Inc., the enterprise Big Data application platform company, today announced Lingual, an open source project enabling fast and simple Big Data application development on Apache Hadoop using SQL.
EMC Greenplum has announced Pivotal HD, a new Hadoop distribution that includes a fully SQL-compliant MPP database running on HDFS, which the company claims is “hundreds of times faster than Hive”.
Hortonworks’ new Stinger initiative joins Apache Drill and Cloudera Impala in the competition to deliver the best real-time Hadoop implementation.
Oracle’s key-value database, known simply as “Oracle NoSQL Database”, has reached version 2.0. Oracle NoSQL Database is essentially a distributed frontend for Berkeley DB, but it offers much more than that: support for SQL queries, both absolute and eventual consistency, and the option to reduce storage space using Avro schemas set it apart.
At its heart, SQL is a domain-specific language designed to let non-professional programmers query databases and write ad hoc reports. When a company moves from a relational database to a NoSQL offering, the need for ad hoc reporting doesn’t go away; it just becomes harder. Simba’s ODBC drivers shift that power back into the hands of the users.
A few months ago, Microsoft announced HDInsight, Microsoft’s Hadoop distribution for managing, analysing and making sense of large volumes of data. InfoQ connected with Val Fontama, Senior Product Marketing Manager for SQL Server, to learn more about how the Enterprise Big Data @ Microsoft story is panning out.
Netflix has released Hystrix, a library designed to control points of access to remote systems, services and third-party libraries, providing greater tolerance of latency and failure. Hystrix features thread and semaphore isolation with fallbacks and circuit breakers, request caching and request collapsing, and monitoring and configuration.
In his new blog post, Hortonworks Vice President of Corporate Strategy Shaun Connolly discusses the importance of the Apache Ambari incubator project and the main milestones it achieved in 2012: simplified cluster provisioning, pre-configured key operational metrics, job execution visualization, a RESTful API and an intuitive UI.