Data Cleansing and Understanding Best Practices
by Casey Stella on Mar 23, 2017

Casey Stella talks about discovering missing values, values with skewed distributions, and likely errors within data, and presents a novel approach to finding data interconnectedness based on usage, using unsupervised learning. He describes the impact of these lessons on team construction and how to avoid some of the most painful ones.

Casey Stella is a committer and PMC member on the Apache Metron project, on the engineering team at Hortonworks. Previously, he worked as an architect and senior engineer at a healthcare informatics startup spun out of the Cleveland Clinic, as a developer at Oracle, and as a research geophysicist in the oil and gas industry.

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.
