Microsoft Releases Data Science Tools for Interactive Data Exploration and Modeling

Microsoft recently released two new data science tools for interactive data exploration, modeling, and reporting. The utilities, Interactive Data Exploration, Analysis and Reporting (IDEAR) and Automated Modeling and Reporting (AMAR), can be reused by data science teams for specific tasks in their projects.

Data science teams spend a significant amount of time writing code to answer common questions about their data: what the schema is, which data elements are missing, how individual variables are distributed and should be transformed, whether the data shows specific clustering patterns, and how well Machine Learning (ML) models perform. The two tools automate these common tasks in the data science lifecycle. The goal is to ensure consistency and completeness of data science tasks across different projects in an organization.

Interactive Data Exploration:

The IDEAR tool can be used to explore, visualize, and analyze data and to provide insights into it. Built on the Shiny library from RStudio, IDEAR includes data export and report generation features. The export feature saves the R scripts that generate the visualizations to an R log file, which users can then run to generate the data report automatically.

Other features of IDEAR include automatic variable type detection, variable ranking and target leaker identification, and visualization of high-dimensional data.
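To make the idea of automatic variable type detection concrete, here is a minimal Python sketch. It is not IDEAR's implementation (IDEAR is written in R, and the article does not describe its heuristics); the function name `detect_variable_type`, the `max_categories` threshold, and the sample columns are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


def detect_variable_type(values: List[Optional[str]], max_categories: int = 3) -> str:
    """Guess whether a column is 'numeric' or 'categorical'.

    Hypothetical heuristic: a column is numeric only if every non-missing
    value parses as a number AND it has more distinct values than
    max_categories. Everything else is treated as categorical.
    """
    # Drop missing entries (None or empty string) before inspecting the column
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return "unknown"
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        # At least one value is not a number, so treat the column as categorical
        return "categorical"
    # Numeric codes with very few distinct values usually behave like categories
    if len(set(numbers)) <= max_categories:
        return "categorical"
    return "numeric"


# Illustrative data only; these columns are not from the article
columns: Dict[str, List[Optional[str]]] = {
    "age": ["23", "35", "41", "58", "62", "29"],
    "gender": ["F", "M", "M", "F", "", None],
    "rating": ["1", "2", "1", "3", "2", "1"],
}

detected = {name: detect_variable_type(vals) for name, vals in columns.items()}
print(detected)
```

A tool like this would typically let the user override the guess, since a threshold-based rule can misclassify small numeric columns.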

Automated Modeling and Reporting:

AMAR is used to train machine learning models with hyper-parameter sweeping, compare the accuracy of the models, and assess variable importance. A parameter input file specifies which ML models to run, which data to use for training and testing, the parameter ranges to sweep over, and the strategy for selecting the best parameters.
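The sweep-and-select workflow can be sketched in a few lines of Python. This is an illustration of the general technique under stated assumptions, not AMAR's code or file format: the `config` dictionary stands in for a parameter input file, the model is a toy one-dimensional k-nearest-neighbours classifier, the data is invented, and the selection strategy is assumed to be "highest test accuracy, first one wins on ties".

```python
from collections import Counter
from itertools import product

# Hypothetical stand-in for a parameter input file (keys are illustrative only)
config = {"param_ranges": {"k": [1, 3, 5]}}

# Toy data: (feature, label). The point at 4.0 is deliberately noisy.
TRAIN = [(1.0, "a"), (2.0, "a"), (3.0, "a"), (4.0, "b"),
         (7.0, "b"), (8.0, "b"), (9.0, "b")]
TEST = [(3.9, "a"), (1.5, "a"), (8.5, "b"), (6.0, "b")]


def knn_predict(x, train, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(params, train, test):
    """Fraction of test points classified correctly for one parameter setting."""
    hits = sum(knn_predict(x, train, params["k"]) == y for x, y in test)
    return hits / len(test)


def sweep(param_ranges, train, test):
    """Evaluate every combination in the grid and return (best, all_results)."""
    names = list(param_ranges)
    results = []
    for combo in product(*(param_ranges[name] for name in names)):
        params = dict(zip(names, combo))
        results.append((params, accuracy(params, train, test)))
    # Assumed selection strategy: highest accuracy; max() keeps the first on ties
    best = max(results, key=lambda item: item[1])
    return best, results


best, results = sweep(config["param_ranges"], TRAIN, TEST)
for params, score in results:
    print(params, score)
print("selected:", best)
```

On this toy data, k=1 is misled by the noisy training point and scores 0.75, while k=3 and k=5 both score 1.0, so the sweep selects k=3. A real sweep would normally score on a validation split or with cross-validation rather than on the test set.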

The model report generated by AMAR contains the model information, model evaluation and comparison, and the feature ranking.

IDEAR and AMAR run in CRAN-R and are available on GitHub. The repository is part of the Team Data Science Process (TDSP), which was launched at the Microsoft Machine Learning & Data Science Summit last month.

If you are interested in learning more about these data science tools, check out the Microsoft TechNet blog post or the Azure TDSP Utilities GitHub repository.

 
