Early last month in Las Vegas, at IBM Insight 2015, IBM announced a major commitment to the Apache Spark project. That IBM refers to it as “potentially the most significant open source project of the next decade” says a lot about how important the company believes Apache Spark is. With IDC reporting that 80% of cloud applications in the future will be data intensive, Apache Spark can unlock previously...
IBM has announced a new web portal called developerWorks Open, bringing together the various projects it is open sourcing. The projects cover many domains, including Analytics, Cloud, IoT, Mobile, Security, Social, Watson and others. So far, IBM has open sourced about 30 projects; it plans to increase the number to 50 by the end of the year, and more may follow.
New Relic has released a set of new features for its Software Analytics Platform. Service Maps is a real-time visual map focused on services. Together with a tool for Docker monitoring, a dashboard for NoSQL databases and a unified alerts platform, it is part of the company's effort to reduce complexity in modern software architecture.
The NASA Center for Climate Simulation (NCCS) is using Apache Hadoop for high-performance data analytics. Glenn Tamkin, from the NASA team, recently spoke at ApacheCon and shared the details of the platform they built for climate data analysis with Hadoop.
Adatao recently announced the general availability of its Data Intelligence platform, which aims to make data analysis and predictive analytics available to everyone in large organizations. Adatao secured an investment of $13 million last year from a group of investors including Bloomberg Beta, Lightspeed Venture Partners and Andreessen Horowitz.
Pinterest, the company behind the visual bookmarking tool that helps you discover and save creative ideas, is using real-time data analytics for data-driven decision making. It’s experimenting with MemSQL and Spark for real-time user engagement analysis across the globe.
The latest versions of the big data analytics tools Splunk Enterprise and Hunk support instant pivot, enhanced event pattern detection, and prebuilt dashboard panels. Splunk Inc., provider of the software platform for operational intelligence, recently announced the general availability (GA) of version 6.2 of Splunk Enterprise and Hunk: Splunk Analytics for Hadoop and NoSQL Data Stores.
LinkedIn recently open sourced Cubert, its High Performance Computation Engine for Complex Big Data Analytics. Cubert is a framework written with analysts and data scientists in mind. Developed completely in Java and expressed as a scripting language, Cubert is designed for the complex joins and aggregations that frequently arise in the reporting world.
Microsoft recently announced new machine learning capabilities for the Microsoft Azure platform. Developers can also create their own web services and publish them to Azure Marketplace. Microsoft also announced the availability of Apache Storm for Azure. Azure Stream Analytics, Data Factory and Event Hubs for Azure were all announced by Microsoft in the past few weeks. In this article we explore more about
The use of data generated by drones is going to open a new frontier in data storage and processing. Kirk Borne, professor at George Mason University, recently talked about the challenge of processing, storing and transferring this data.
In “Raising the game - The IBM Business Tech Trends Study” (PDF), IBM evaluates the current adoption landscape of four key technologies in the enterprise: Big Data & Analytics, Cloud, Mobile and Social, comparing today’s adoption with 2012’s and Pacesetters against Dabblers.
Last month, Ayasdi announced a partnership with Cloudera, the biggest distributor of Apache Hadoop. The partnership will ensure the compatibility of Ayasdi's solution with Cloudera Enterprise 5, the latest version of Cloudera’s big data platform based on Apache Hadoop.
Spark recently graduated from the Apache incubator. Spark claims speed improvements of up to 100x over Apache Hadoop for in-memory datasets, gracefully falling back to a 10x improvement for on-disk processing. Based on Scala, it can run SQL queries and be used directly in R. It provides machine learning, graph capabilities and other features discussed further in the article.
Hadoop is definitely the platform of choice for Big Data analysis and computation. But as data volume, variety and velocity increase, Hadoop, as a batch processing framework, cannot cope with the requirement for real-time analytics. Spark, Storm and the Lambda Architecture can help bridge the gap between batch and event-based processing.
Arun Kejariwal, from Twitter, talked at Velocity Conf London last month about forecasting algorithms used at Twitter to proactively predict system resource needs as well as business metrics such as number of users or tweets. Given the dynamic nature of their data stream, they found that a refined ARIMA model works well once data is cleansed, including removal of outliers.