Martin Hadley on R and the Modern R Ecosystem

Podcast with Martin Hadley, hosted by Werner Schuster, on Jul 21, 2017

In this podcast Werner Schuster talks to Martin Hadley, a data scientist at the University of Oxford. They discuss the state of the R language; the rich R ecosystem, which covers development (RStudio), notebooks for publication (R Notebooks, RPubs), and web apps (Shiny); and the pros and cons of the different data frame implementations.

Key Takeaways

  • R is the tool for working with rectangular data
  • Modern data frame implementations are Tibble and data.table (for large amounts of data)
  • R Markdown and R Notebooks let you explore data and then publish the results along with (interactive) visualizations
  • Use Shiny and shinyapps.io to publish server-side R applications
  • Tidyverse is the place to look for modern R packages

Show Notes

R - the language

  • 1:14 - Many new users perceive R not as a language but as a piece of software.
  • 1:44 - R is great for rectangular data (from databases, spreadsheets, or flat files), less so for numerical data. Data frames are the core technology for rectangular data, similar to the data frames in Python's pandas.
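As a minimal sketch of what "rectangular data" means in practice, the base R snippet below writes part of the built-in iris dataset to a temporary CSV (a flat file) and reads it back into a data frame: one row per observation, one column per variable. The file name is arbitrary; database or spreadsheet sources would need extra packages, which are not shown here.

```r
# Rectangular data: each row is an observation, each column a variable.
# Write the first 10 rows of the built-in iris dataset to a temporary CSV...
csv_path <- file.path(tempdir(), "iris_sample.csv")
write.csv(head(iris, 10), csv_path, row.names = FALSE)

# ...and read it back into a base R data frame
flowers <- read.csv(csv_path, stringsAsFactors = FALSE)

str(flowers)   # column names, types, and the first few values
dim(flowers)   # number of rows and columns
```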

Data frames in R

  • 2:44 - There are different implementations of data frames. Base R data frames are included with R but are a bit dated. Tibble comes from the Tidyverse and is new as of 2015. For huge amounts of data, use the data.table package.
  • 3:59 - The interfaces of base R data frames and tibbles are interchangeable; converting them to data.table objects takes some work.
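A minimal sketch of the three flavours side by side, assuming the tibble and data.table packages are installed from CRAN; the column names and values are illustrative only. Moving between base data frames and tibbles is one call in each direction. Converting to data.table is also a single call, so our reading of "takes some work" is that code must then adopt data.table's different `DT[i, j, by]` query syntax.

```r
library(tibble)
library(data.table)

# Base R data frame: ships with R, no packages needed (illustrative values)
df <- data.frame(
  species = c("setosa", "setosa", "virginica"),
  petal_length = c(1.4, 1.3, 5.1),
  stringsAsFactors = FALSE
)

# Base R <-> tibble: one call in each direction
tb <- as_tibble(df)
df_again <- as.data.frame(tb)

# data.table: conversion is one call, but queries use DT[i, j, by],
# where i filters rows, j computes columns, and by groups
dt <- as.data.table(df)
means <- dt[, .(mean_length = mean(petal_length)), by = species]
print(means)
```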

Which R implementation to choose

  • 4:39 - Popular R implementations:
    The standard R distribution from the R Foundation, downloaded from CRAN. It’s not that efficient and has no MKL support for matrix operations.
    Microsoft’s R implementation (formerly from Revolution Analytics).
    ValidR by Mango Solutions, which uses the MKL libraries, among other things, and has gone through popular CRAN packages to validate that they do what they promise.
  • 7:29 - CRAN is a repository for R packages; users can submit their own packages to it.
    The Tidyverse, from RStudio, is a more tightly controlled collection of packages focused on newer technologies.
  • 9:49 - Typical R users are mostly data scientists and academic researchers.
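The MKL point above concerns the BLAS/LAPACK libraries R uses for matrix algebra. A rough, illustrative way to compare R builds is to time the same dense matrix product under each; the matrix size here is an arbitrary assumption and timings will vary by machine.

```r
# Rough benchmark of a dense matrix product, the kind of operation an
# optimized BLAS such as MKL speeds up. The size is arbitrary.
set.seed(42)
n <- 1000
m <- matrix(rnorm(n * n), nrow = n)

timing <- system.time(p <- m %*% m)
cat("Elapsed seconds:", timing[["elapsed"]], "\n")

# Recent R versions also report the BLAS/LAPACK libraries in use here
sessionInfo()
```

Running the same script under different R distributions gives a like-for-like comparison on one machine.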

RStudio

  • 11:11 - RStudio is the standard development environment for R. It provides notebooks written in R Markdown, which can be exported to formats such as HTML, PDF, and MS Word. Notebooks contain text, code, and output, and can include HTML widgets.
    R Notebooks written in R Markdown can be published for free to RPubs.com.
    Published notebooks contain the data/code and visualizations; interactive visualizations are shipped together with their data so that readers can interact with them.
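To make the R Markdown workflow concrete, here is a minimal sketch that writes a tiny document (YAML header, narrative text, one R code chunk) to a temporary file and renders it to HTML, assuming the rmarkdown package is installed. Rendering also needs pandoc, which RStudio bundles, so the code checks for it first. The file name and content are hypothetical; an R Notebook proper would use `output: html_notebook` in the header.

```r
library(rmarkdown)

# The document source: a YAML header, narrative text, and an R code chunk
rmd_lines <- c(
  "---",
  "title: \"Petal lengths\"",
  "output: html_document",
  "---",
  "",
  "Narrative text sits next to the code and its output.",
  "",
  "```{r}",
  "summary(iris$Petal.Length)",
  "```"
)

rmd_path <- file.path(tempdir(), "petals.Rmd")
writeLines(rmd_lines, rmd_path)

html_path <- NULL
if (pandoc_available()) {
  # Use "pdf_document" or "word_document" for the other export formats
  # (PDF output additionally needs a LaTeX installation)
  html_path <- render(rmd_path, output_format = "html_document", quiet = TRUE)
}
```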

Deploying R code

  • 16:27 - There are different deployment options for R code. The server version of RStudio lets you run R code on a server.
    Use Shiny to add interactive elements to R; RStudio offers a hosted service, shinyapps.io, in free and paid (with support) versions.
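A minimal Shiny app sketch, assuming the shiny package is installed: a slider controls how many rows of the built-in faithful dataset are plotted. The layout and labels are illustrative. Deploying to shinyapps.io is typically done with the rsconnect package's `deployApp()` after configuring an account, so this sketch only builds the app object.

```r
library(shiny)

# UI: a title, a slider input, and a placeholder for the plot
ui <- fluidPage(
  titlePanel("Faithful eruptions"),
  sliderInput("n", "Number of observations:", min = 10, max = 272, value = 100),
  plotOutput("scatter")
)

# Server: renderPlot() re-runs whenever input$n changes
server <- function(input, output, session) {
  output$scatter <- renderPlot({
    d <- head(faithful, input$n)
    plot(d$eruptions, d$waiting,
         xlab = "Eruption length (min)", ylab = "Waiting time (min)")
  })
}

# Building the app object does not start a server; call runApp(app) to run it locally
app <- shinyApp(ui = ui, server = server)
```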

What not to do with R

  • 17:53 - R is not especially well suited to numerical simulation; Python or other languages are a better choice. R itself is licensed under the GPL, as are many of the packages on CRAN.
  • 19:20 - R can be tricky to learn at first for people who already program in other languages. Some R libraries provide functional programming and other concepts.
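The show notes don't name the functional-programming libraries; purrr, from the Tidyverse, is one example and is an assumption here. This sketch fits one linear model per group of the built-in mtcars dataset and extracts each slope without an explicit loop.

```r
library(purrr)

# Split mtcars by cylinder count, fit mpg ~ wt within each group,
# then pull out each model's slope for weight
by_cyl <- split(mtcars, mtcars$cyl)
models <- map(by_cyl, function(d) lm(mpg ~ wt, data = d))
slopes <- map_dbl(models, function(m) coef(m)[["wt"]])
print(slopes)

# reduce() folds a binary function over a vector: ((((1 + 2) + 3) + 4) + 5)
total <- reduce(1:5, `+`)
```

Base R offers the same ideas through `lapply()`, `vapply()`, `Map()`, and `Reduce()`; purrr mainly adds type-stable variants such as `map_dbl()` and a consistent interface.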

R language development and community

  • 21:41 - The R language has been quite stable over time.
    “R for Data Science” by Hadley Wickham and Garrett Grolemund.
    Upcoming changes include Tidy Eval, a new approach to non-standard evaluation that supports lazy evaluation and other features.
  • 23:45 - Resources for R: Twitter, to keep up with news via the #rstats and #tidyverse hashtags; following @hadleywickham for new developments; the RStudio blog for the IDE and the Tidyverse; and Martin’s courses on Lynda.
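To illustrate the Tidy Eval item above, here is a minimal sketch of a reusable dplyr helper that accepts bare column names, assuming the dplyr and rlang packages are installed. `grouped_mean` is a hypothetical name; it uses rlang's `enquo()` to capture arguments and `!!` to unquote them (later rlang versions also offer the `{{ }}` shorthand).

```r
library(dplyr)
library(rlang)

# enquo() captures each argument as a quosure (expression + environment);
# !! unquotes it where dplyr expects a column
grouped_mean <- function(df, group_var, summary_var) {
  group_var <- enquo(group_var)
  summary_var <- enquo(summary_var)
  df %>%
    group_by(!!group_var) %>%
    summarise(mean = mean(!!summary_var))
}

result <- grouped_mean(mtcars, cyl, mpg)
print(result)
```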
