Data Preparation for Data Science: A Field Guide
Recorded at:

by Casey Stella on Apr 23, 2017
45:00

Summary
Casey Stella presents a utility written with Apache Spark that automates data preparation by discovering missing values, values with skewed distributions, and likely errors within the data.

Bio

Casey Stella is a committer and PMC member on the Apache Metron project in the engineering team at Hortonworks. In the past, he has worked as an architect and senior engineer at a healthcare informatics startup spun out of the Cleveland Clinic, as a developer at Oracle and as a research geophysicist in the oil & gas industry.

CRUNCH is a use-case-heavy conference for people interested in building the finest data-driven businesses. No matter the size of your venture or your job description, you will find exactly what you need at the two-track CRUNCH conference. A data engineering track and a data analytics track serve diverse business needs and levels of expertise.
