Spark recently graduated from the Apache Incubator. Spark claims speedups of up to 100x over Apache Hadoop on in-memory datasets, falling back gracefully to a 10x improvement for on-disk workloads. Written in Scala, it can run SQL queries and be used directly from R. It also provides machine learning and graph database capabilities, along with other features discussed further in the article.
2013 has been rich in announcements for new programs, degrees and grants for aspiring data scientists and Big Data practitioners.
Neural networks have long been an interesting field of research for exploring concepts in machine learning, a branch of artificial intelligence. Dr James McCaffrey of Microsoft Research recently gave an engaging introductory talk on neural networks, including working demo code, for those looking to learn more about them.
Concurrent, Inc., the enterprise Big Data application platform company, today announced Pattern, a machine learning project based on an industry standard called PMML, which allows analytics frameworks such as SAS, R, MicroStrategy, Oracle, etc., to export predictive models and run them on Hadoop clusters.
ThoughtWorks's latest "Technology Radar" focuses on mobile, accessible analytics, simple architectures, reproducible environments, and data persistence done right.
Corporations are increasingly using social media to learn more about what their customers are saying about their products. This presents unique challenges as unstructured content needs analytic techniques to interpret the sentiment embodied in the blog posts. InfoQ caught up with Subramanian Kartik to learn more about the blog sentiment analysis project his team worked on.
In a recent news article, the Massachusetts Institute of Technology introduced a technology for automatically remembering connections between objects. The system determines how objects in a large software project interact, so it can tell newcomers which objects they will need to design certain types of functions.
Ravi Kannan from Microsoft Research has been named the winner of the 2011 Knuth Prize, awarded by ACM SIGACT (Special Interest Group on Algorithms and Computation Theory). According to the press announcement, he receives the prize for his work on influential algorithmic techniques aimed at solving long-standing computational problems.
InfoQ interviewed David Smith, VP of Community at Revolution Analytics, at the Strata big data conference. Revolution provides commercial extensions for the open source R statistics package and announced the R Enterprise v4.2 Suite, along with tools to help SAS users migrate to R.
The need for machine-learning techniques like clustering, collaborative filtering, and categorization has steadily increased over the last decade, along with the number of solutions that need quick and efficient algorithms to transform vast amounts of raw data into relevant information. Apache Mahout 0.3 was announced in March, adding more functionality, stability and performance.