Spark recently graduated from the Apache Incubator. It claims speedups of up to 100x over Apache Hadoop on in-memory datasets, falling back gracefully to a 10x speedup for on-disk workloads. Built on Scala, it can run SQL queries and be used directly from R. It also provides machine learning, graph capabilities and other features discussed further in the article.
Hadoop is definitely the platform of choice for Big Data analysis and computation. As data volume, variety and velocity increase, however, Hadoop as a batch processing framework cannot cope with the requirement for real-time analytics. Spark, Storm and the Lambda Architecture can help bridge the gap between batch and event-based processing.
Arun Kejariwal, from Twitter, talked at Velocity Conf London last month about forecasting algorithms used at Twitter to proactively predict system resource needs as well as business metrics such as number of users or tweets. Given the dynamic nature of their data stream, they found that a refined ARIMA model works well once data is cleansed, including removal of outliers.
Jeff Magnusson from the Netflix team gave a presentation at the QCon SF 2013 conference about their Data Platform as a Service. Following up on this presentation, we look at the technology stack and how it helps Netflix tackle important business decisions.
The lean startup is a “scientific approach to creating and managing startups”, as Eric Ries describes in the lean startup principles. It uses “hard things” like validated learning with experiments and data. But what about the “soft things” like intuition, guts, feelings, passion, inspiration and fun? Do they also matter when you are developing new products?
ThoughtWorks recently released a new installment of their technology radar, highlighting techniques enabling infrastructure as code, perimeterless enterprises, applying proven practices to areas without them, and lightweight analytics.
DNN Social enables customers to interact with the site via blogs, discussion forums and FAQs, and includes features such as gamification, analytics, ideation and an activity stream, which enable site administrators to gauge the effectiveness of interaction.
ThoughtWorks's latest "Technology Radar" focuses on mobile, accessible analytics, simple architectures, reproducible environments, and data persistence done right.
Precog has recently announced a Big Data warehousing and analysis service that takes care of data capture, storage, transformation, analysis and visualization, as well as the infrastructure on which it all runs. It leaves various access points open throughout the service via RESTful APIs, enabling developers and data scientists to control the entire process.
Oracle Big Data Appliance and Big Data Connectors support integration with Hadoop, Cloudera Manager and Oracle NoSQL Database. Last month Oracle announced the availability of Big Data Appliance and Connectors as well as a partnership with Cloudera. They also recently announced Advanced Analytics for Big Data, which integrates the R statistical programming language into Oracle Database 11g.
In a press release dated 13 December, SAP announced at the SAP Influencer Summit in Boston that “leading software vendors are adopting the open SAP HANA platform for their existing products and building completely new applications.” Among them are companies such as T-Mobile and TIBCO.
Microsoft is making available a cloud service called Social Analytics for users interested in analyzing Twitter, Facebook, Blogger, YouTube, etc. in order to gain insight into trends on the social web.
In a recent news article, the Massachusetts Institute of Technology introduced a technology for automatically remembering connections between objects. The system determines how objects in a large software project interact, so it can tell latecomers which objects they will need in order to design certain types of functions.
Imagine ad hoc data mining queries against a single table with 1 TB of data and 1.44 billion rows coming back in roughly a second. This is the scenario Microsoft intends to support using 32-core machines and their new column-based storage engine.
InfoQ interviewed David Smith, VP of Community for Revolution Analytics, at the Strata big data conference. Revolution provides commercial extensions for the open source R statistics package and announced the R Enterprise v4.2 Suite, along with tools to help SAS users migrate to R.