Lambda architecture has been a popular solution that combines batch and stream processing. Kartik Paramasivam at LinkedIn wrote about how his team addressed stream processing and Lambda architecture challenges using Apache Samza for data processing. The challenges described are the late arrival of events and the processing of duplicated messages.
Reactive microservices, data center scale operating system (DCOS), and staging reactive data pipelines were the highlighted topics at the Reactive Summit 2016 conference held this week. The InfoQ team attended the conference, and this post summarizes the first day's events.
Jamie Grier recently spoke at the OSCON 2016 conference about data streaming architecture using Apache Flink. He talked about the building blocks of data streaming applications and stateful stream processing, with code examples of Flink applications and monitoring.
Hadoop and other big data technologies revolutionized the way organizations run data analytics, but organizations still face high operating costs when using these technologies for on-premise data processing. Ashish Thusoo recently spoke at the Enterprise Data World conference about a Hadoop-as-a-service offering that helps organizations bridge these gaps.
Late last month Google released an alpha version of its TensorFlow (TF) integrated cloud machine learning service, in response to a growing need to run the TensorFlow library at scale on the Google Cloud Platform (GCP). Google describes several new feature sets aimed at making TF usage scale by integrating several pieces of GCP, such as Dataproc, a managed Hadoop and Spark service.
Recently, at the 2016 Build event in San Francisco, Microsoft announced a change to its Power BI offering. The update gives customers and ISVs the ability to embed Power BI reports within their own applications. Microsoft is calling this service Power BI Embedded, and it is currently in preview.
Funnel analysis examines a sequence of events to help improve user engagement on a website or a mobile application. The Data Science team at Twitter uses this concept to learn how users interact with user interfaces during sign-up or tweeting, with the goal of improving user engagement with Twitter.
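As a rough illustration of the idea (not Twitter's implementation), a funnel can be computed by counting how many users reach each step of an ordered event sequence. The sketch below assumes events arrive as time-ordered `(user_id, event_name)` pairs; the function name, step names, and sample data are hypothetical.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count how many users reached each funnel step, in order.

    events: iterable of (user_id, event_name) pairs, assumed sorted by time.
    steps:  ordered list of event names that define the funnel.
    """
    progress = defaultdict(int)  # user_id -> number of funnel steps completed
    for user_id, event_name in events:
        reached = progress[user_id]
        # Advance a user only when the event is the next expected step.
        if reached < len(steps) and event_name == steps[reached]:
            progress[user_id] = reached + 1
    # A user counts toward step i if they completed more than i steps.
    return [sum(1 for n in progress.values() if n > i) for i in range(len(steps))]


# Hypothetical sign-up funnel; the event names are illustrative only.
steps = ["visit_signup", "enter_details", "confirm_account"]
events = [
    ("u1", "visit_signup"), ("u2", "visit_signup"), ("u3", "visit_signup"),
    ("u1", "enter_details"), ("u2", "enter_details"),
    ("u1", "confirm_account"),
]
print(funnel_counts(events, steps))  # [3, 2, 1]
```

The drop from one count to the next shows where users abandon the flow, which is the step worth investigating first.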
IBM has announced four new data services: Analytics Exchange, Compose Enterprise, Graph, and Predictive Analytics. IBM’s new data services are meant to enable users to analyze their own data or get access to datasets provided by IBM. While some of the services run on Bluemix, for others the data can be deployed on other clouds, including private ones.
Net Promoter Score (NPS) is a customer loyalty metric used to determine the likelihood that a customer will return to a company's website or use its service again. Airbnb uses NPS extensively to measure customer loyalty, treating it as a more effective indicator of whether a customer will return to book again or recommend the company to friends.
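For reference, the conventional NPS calculation asks respondents to rate, on a 0 to 10 scale, how likely they are to recommend the company, then subtracts the percentage of detractors (scores 0 to 6) from the percentage of promoters (scores 9 to 10). The sketch below shows that standard formula only; it is not a description of Airbnb's analysis, and the function name and sample scores are made up for illustration.

```python
def net_promoter_score(scores):
    """Return the NPS (from -100 to 100) for a list of 0-10 survey scores."""
    if not scores:
        raise ValueError("at least one survey score is required")
    promoters = sum(1 for s in scores if s >= 9)   # scores of 9 or 10
    detractors = sum(1 for s in scores if s <= 6)  # scores of 0 through 6
    # Passives (7 or 8) count toward the total but toward neither group.
    return 100.0 * (promoters - detractors) / len(scores)


# Illustrative responses: 4 promoters, 2 passives, 2 detractors.
sample_scores = [10, 9, 8, 7, 6, 3, 10, 9]
print(net_promoter_score(sample_scores))  # 25.0
```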
Yahoo! has benchmarked three of the main stream processing frameworks: Apache Flink, Spark, and Storm.
IBM has inaugurated its IoT Global Headquarters and will use Watson technology to analyze and interpret IoT data.
Last month in Las Vegas, at IBM Insight 2015, IBM announced a major commitment to the Apache Spark project. IBM referred to it as “potentially the most significant open source project of the next decade”, which says a lot about how important the company believes Apache Spark is. With IDC reporting that 80% of cloud applications in the future will be data intensive, Apache Spark can unlock previously...
IBM has announced a new web portal called developerWorks Open, bringing together various projects they are open sourcing. The projects cover many domains including Analytics, Cloud, IoT, Mobile, Security, Social, Watson and others. So far, IBM has open sourced about 30 projects, and they plan to increase the number up to 50 by the end of the year, and others may come in the future.
New Relic has released a set of new features for its Software Analytics Platform. Service Maps is a real-time visual map focused on services. Together with a tool for Docker monitoring, a dashboard for NoSQL databases, and a unified alerts platform, it is part of the company's effort to reduce complexity in modern software architecture.
The NASA Center for Climate Simulation (NCCS) is using Apache Hadoop for high-performance data analytics. Glenn Tamkin from the NASA team recently spoke at the ApacheCon conference and shared details of the platform they built for climate data analysis with Hadoop.