Data Preparation for Data Science: A Field Guide
Summary
Casey Stella presents a utility written with Apache Spark that automates data preparation by discovering missing values, values with skewed distributions, and likely errors within data.
Bio
Casey Stella is a committer and PMC member on the Apache Metron project and a member of the engineering team at Hortonworks. In the past, he has worked as an architect and senior engineer at a healthcare informatics startup spun out of the Cleveland Clinic, as a developer at Oracle, and as a research geophysicist in the oil & gas industry.
About the conference
CRUNCH is a use-case-heavy conference for people interested in building the finest data-driven businesses. No matter the size of your venture or your job description, you will find exactly what you need at the two-track CRUNCH conference. A data engineering track and a data analytics track serve diverse business needs and levels of expertise.