Data Analysis Content on InfoQ
-
How Statistical Forecasting Can Help You Trust Your Data and Drive Business Agility
Statistical forecasting is a highly effective way to improve delivery predictions and avoid some traditional estimation problems. In a case study presented at AgileByExample 2018, Piotr Leszczynski said it can also help you understand and trust your data more, and drive improvements in business agility.
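The summary does not say which forecasting technique the case study used. One widely used approach is a Monte Carlo simulation over historical throughput. The sketch below is illustrative only and is not presented as Leszczynski's method. It assumes a hypothetical history of items completed per day and a backlog size. The names `simulate_days_to_finish` and `percentile` are hypothetical helpers.

```python
import math
import random
from typing import List, Sequence


def simulate_days_to_finish(history: Sequence[int], backlog: int,
                            trials: int = 10_000, seed: int = 42) -> List[int]:
    """Monte Carlo forecast of the days needed to finish `backlog` items.

    Each simulated day draws a throughput value at random (with replacement)
    from the historical daily throughput samples in `history`.
    """
    if not history or max(history) <= 0:
        raise ValueError("history needs at least one day with completed items")
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        remaining, days = backlog, 0
        while remaining > 0:
            remaining -= rng.choice(history)  # resample a past day's throughput
            days += 1
        outcomes.append(days)
    return sorted(outcomes)


def percentile(sorted_values: Sequence[int], p: int) -> int:
    """Nearest-rank percentile of an already-sorted sequence (1 <= p <= 100)."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


if __name__ == "__main__":
    # Hypothetical throughput history: items finished on each of the last 10 days.
    history = [0, 2, 3, 1, 4, 2, 0, 5, 3, 2]
    outcomes = simulate_days_to_finish(history, backlog=30)
    print("50% of runs finish within", percentile(outcomes, 50), "days")
    print("85% of runs finish within", percentile(outcomes, 85), "days")
```

Reporting a percentile such as the 85th gives a forecast with an explicit confidence level instead of a single-point estimate.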
-
Investigating Near Misses to Prevent Disasters: QCon London Q&A
Investigating near misses by gathering data from the field and exploring anything that looks wrong or is a bit odd can help to prevent disasters, said Ed Holland, software development manager at Metaswitch Networks. At QCon London 2019 he gave a talk about avoiding being in the news by investigating near misses.
-
Making Machine Learning Adoptable for Clinicians
Dr. Alexander Scarlat explains the core tenets of machine learning in his 12-part series "Machine Learning Primer for Clinicians." Scarlat covers the defining aspects of machine learning, followed by examples that illustrate how the performance of machine learning models is measured. The series uses animated charts in place of the math to help readers understand the concepts.
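The summary does not list the specific metrics the series covers. As an illustration only, the sketch below computes several standard binary-classification metrics that clinicians often encounter as sensitivity, specificity, and positive predictive value. The function name `binary_metrics` and the 0/1 label encoding (1 = condition present) are assumptions.

```python
from typing import Dict, Sequence


def _ratio(num: int, den: int) -> float:
    # Return NaN rather than raising when a metric is undefined (e.g. no positives).
    return num / den if den else float("nan")


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Common performance metrics for a binary classifier (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # recall: share of true cases flagged
        "specificity": _ratio(tn, tn + fp),  # share of non-cases correctly cleared
        "precision": _ratio(tp, tp + fp),    # positive predictive value
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


if __name__ == "__main__":
    # Hypothetical labels for 8 patients: 3 with the condition, 5 without.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    model = [1, 1, 0, 1, 0, 0, 0, 0]
    print(binary_metrics(truth, model))
```

Reporting sensitivity and specificity separately, rather than accuracy alone, shows which kind of mistake a model makes. This matters when missing a case and raising a false alarm carry very different clinical costs.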
-
Amazon Offers Sustainability Datasets for Analysis
Amazon Web Services Open Data (AWSOD) and Amazon Sustainability (AS) are working together to make sustainability datasets available on the AWS Simple Storage Service (S3). They are also removing the undifferentiated heavy lifting by pre-processing the datasets for optimal retrieval. Sustainability datasets commonly come from satellites, geological studies, weather radars, agricultural studies, and similar sources.
-
JetBrains Introduces Datalore 1.0, an Intelligent Web Application for Data Analysis
JetBrains recently introduced Datalore 1.0, an intelligent web application for data analysis and visualization in Python. Datalore 1.0 brings an improved smart code editor, user-controlled code execution, a professional subscription, and more.
-
Dataiku's Latest Release Integrates Deep-Learning for Computer Vision
The latest release of Data Science Studio (DSS), from collaborative data science platform vendor Dataiku, includes pre-trained deep learning models for image processing. The DSS platform covers each step of a data science project, from data sourcing and visualization to production deployment. Its machine learning module supports standard libraries, and it integrates with Hadoop and multiple Spark engines.
-
How 3rd Party Tools Nearly Killed Performance (and Culture) at Adidas
How the IT department of the shoe and clothing giant tamed an out-of-control proliferation of third-party tools on its global websites, which was killing performance. The proliferation also led to a blame culture between business and IT. A new third-party governance process focusing on performance data and user experience validation was key to stopping the bleeding.
-
Mathieu Ripert on Instacart's Machine Learning Optimizations
Instacart is an online grocery delivery service that delivers in under one hour. Customers order items on the website or through the mobile app, and Instacart's shoppers go to local stores, purchase the items, and deliver them to the customer. InfoQ interviewed Mathieu Ripert, data scientist at Instacart, to find out how machine learning is used to guarantee a better customer experience.
-
AFK-MC² Algorithm Speeds up k-Means Clustering Algorithm Seeding
“Fast and Provably Good Seedings for k-Means” by Olivier Bachem et al. was presented at the 2016 Neural Information Processing Systems (NIPS) conference. It describes AFK-MC², an alternative method for generating the initial seeding of the k-Means clustering algorithm that is several orders of magnitude faster than the state-of-the-art method, k-Means++.
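As a rough illustration of the idea, the sketch below follows the algorithm as described in the paper. After picking the first center uniformly, it builds a fixed proposal distribution that mixes D² weights with respect to that first center with a uniform term. It then chooses each further center with a short Metropolis-Hastings chain of length `m` instead of a full pass over the data. This is a plain-Python sketch, not the authors' reference implementation. The function names, the default chain length, and the tuple representation of points are assumptions.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def afk_mc2_seeding(data: Sequence[Point], k: int, m: int = 200,
                    seed: int = 0) -> List[Point]:
    """Pick k initial centers using AFK-MC^2-style MCMC sampling."""
    rng = random.Random(seed)
    n = len(data)
    idx = range(n)
    # First center: uniform at random.
    c1 = data[rng.randrange(n)]
    centers = [c1]
    # Proposal q(x) = 1/2 * d(x, c1)^2 / sum d(., c1)^2 + 1/(2n), computed once.
    d1 = [sq_dist(x, c1) for x in data]
    total = sum(d1)
    q = [0.5 * d / total + 0.5 / n for d in d1] if total > 0 else [1.0 / n] * n
    cum_q = list(itertools.accumulate(q))

    def draw() -> int:
        # Sample an index from q using precomputed cumulative weights.
        return rng.choices(idx, cum_weights=cum_q)[0]

    def dist_to_centers(i: int) -> float:
        # Squared distance from point i to its nearest chosen center.
        return min(sq_dist(data[i], c) for c in centers)

    for _ in range(1, k):
        xi = draw()
        dx = dist_to_centers(xi)
        for _ in range(1, m):
            yi = draw()
            dy = dist_to_centers(yi)
            # Metropolis-Hastings step: accept y if
            # (dy * q[xi]) / (dx * q[yi]) > U(0, 1), written without division.
            if dy * q[xi] > rng.random() * dx * q[yi]:
                xi, dx = yi, dy
        centers.append(data[xi])
    return centers


if __name__ == "__main__":
    # Hypothetical data: three tight, well-separated groups of 2-D points.
    g = random.Random(1)
    pts = [(cx + g.gauss(0, 0.1), cy + g.gauss(0, 0.1))
           for cx, cy in [(0, 0), (10, 0), (0, 10)] for _ in range(50)]
    print(afk_mc2_seeding(pts, k=3, m=50))
```

Because the proposal `q` is computed only once, each chain step needs only the distances from one sampled point to the current centers, not a pass over all n points. That is where the speedup over k-Means++ comes from.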
-
Precision Medicine Modeling Demonstration with Spark on EMR, ADAM, and the 1000 Genomes Project
AWS engineers Christopher Crosbie and Ujjwal Ratan detail using Spark on EMR for precision medicine data analysis on the ADAM platform, with data from the 1000 Genomes Project.
-
Data Science in F# using FsLab: Interview with Tomas Petricek
FsLab, a collection of F# open source libraries for doing data science, was released earlier this year. InfoQ reached out to Tomas Petricek, creator of the project, to get more details.
-
Adatao Launches Full Stack Data Intelligence Platform
Adatao recently announced the general availability of its Data Intelligence platform. The platform aims to make data analysis and predictive analytics available to everyone in large organizations. Adatao secured an investment of $13 million last year from a group of investors including Bloomberg Beta, Lightspeed Venture Partners and Andreessen Horowitz.
-
Ayasdi Partners with Cloudera
Last month, Ayasdi announced a partnership with Cloudera, the biggest distributor of Apache Hadoop. The partnership will ensure the compatibility of Ayasdi's solution with Cloudera Enterprise 5, the latest version of Cloudera’s big data platform based on Apache Hadoop.
-
Forecasting at Twitter
Arun Kejariwal, from Twitter, talked at Velocity Conf London last month about forecasting algorithms used at Twitter to proactively predict system resource needs as well as business metrics such as number of users or tweets. Given the dynamic nature of their data stream, they found that a refined ARIMA model works well once data is cleansed, including removal of outliers.
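The summary mentions removing outliers before ARIMA modeling but not how Twitter did it. As a generic illustration only, the sketch below uses a Hampel-style filter. It replaces any point that lies more than a few scaled median absolute deviations (MAD) from its local median. The window size, the threshold, and the function name are assumptions. The cleaned series would then be passed to an ARIMA fitting library.

```python
import statistics
from typing import List, Sequence


def replace_outliers_mad(series: Sequence[float], window: int = 7,
                         threshold: float = 3.0) -> List[float]:
    """Hampel-style filter: replace local outliers with the local median."""
    half = window // 2
    values = list(series)
    cleaned = list(values)
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        neighborhood = values[lo:hi]
        med = statistics.median(neighborhood)
        mad = statistics.median(abs(v - med) for v in neighborhood)
        # 1.4826 scales the MAD to a standard-deviation estimate for Gaussian data.
        scale = 1.4826 * mad
        if scale > 0 and abs(values[i] - med) > threshold * scale:
            cleaned[i] = med
    return cleaned


if __name__ == "__main__":
    # Hypothetical metric series with one spike at index 4.
    print(replace_outliers_mad([10, 11, 10, 12, 100, 11, 10, 12, 11]))
```

Median-based statistics are used here because a single spike barely moves the local median or MAD, whereas it would inflate a mean and standard deviation and partly hide itself.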
-
Combining Data, Intuition and Fun in Lean Startup
The lean startup is a “scientific approach to creating and managing startups,” as Eric Ries describes in the lean startup principles. It uses “hard things” like validated learning with experiments and data. But what about the “soft things” like intuition, guts, feelings, passion, inspiration and fun: do they also matter when you are developing new products?