Data Science in F# using FsLab: Interview with Tomas Petricek
FsLab, a collection of open source F# libraries for doing data science, was released earlier this year. InfoQ reached out to Tomas Petricek, creator of the project, to get more details.
InfoQ: What is FsLab? What added value does it bring, compared to using the libraries included in FsLab separately?
Tomas Petricek: FsLab is a package of open-source cross-platform .NET libraries for doing data science. The idea is that we want to make it really easy for people to get started. You can either get FsLab from NuGet using the Paket or NuGet package manager or you can just download one of our templates and start extracting interesting facts from raw data!
The main thing that FsLab brings is that it comes with high-quality libraries for all parts of the data analytics process, so you do not have to worry about finding a decent CSV parser or type provider, or a good charting library that works on Mac, Windows and Linux. They are all included in the package. FsLab also adds some nice integration between them, so you can pass a data frame or a time series to the R type provider to calculate some statistics and it will just work.
For those using FsLab from F#, the one great feature is that it provides a load script that you can use to reference all the libraries when working in F# Interactive. Rather than adding more than 10 separate references, you can write just: #load "packages/FsLab/FsLab.fsx"
InfoQ: Where did the idea for FsLab come from?
TP: It must have been when I was doing the Understanding the World with F# talk for Channel 9. The talk was about doing data science using the libraries now included in FsLab. First, I realized that we need a single link that we could give to people who want to learn more, and it turned out that the www.fslab.org domain was still available! Second, I realized that if I want to have all my demos live-coded, I need a quick way to reference all the libraries that I want to use. Call it Talk Driven Development, but if I cannot set everything up during a live-coded demo, it is just too complicated!
InfoQ: Can you describe how MBrace integrates with FsLab?
TP: The two already work really well together, and we even recently held a mini-conference at the Microsoft campus featuring both MBrace and FsLab.
MBrace lets people take their local small-data scripts and easily scale them across multiple machines in the cloud. You can already reference all the FsLab libraries from your MBrace scripts. So, if you’re parsing data with type providers or doing time-series calculations using FsLab, you can just take your existing code and run it in the cloud using MBrace.
There are a couple of things we want to do in the future. First, you can expect more of the data analysis libraries to come with a cloud-based version out-of-the-box. One prototype project that we’re working on is “BigDeedle”, which lets you work with large time series data – both locally (over a small slice) and in the cloud (over the whole data set). For more information, check out our first demo! Another really interesting project that we’d like to add to FsLab is a deep learning library called Hype.
InfoQ: Some libraries inside FsLab depend heavily on F# type providers. What do they bring to the project? What can be done with type providers that couldn’t be done otherwise?
TP: Type providers integrate external resources into the programming language. What does this mean?
With type providers, you get the best of both worlds. The JSON type provider infers the structure of the response and it automatically exposes it as an F# type that the compiler and editor both understand. So, you just point the JSON type provider at the service and then you can use “.” to explore the data it returns.
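As a rough illustration of that workflow, here is a minimal F# script sketch. It assumes FsLab has been installed under packages/ so that the load script mentioned earlier is available. The inline JSON sample and the Weather type name are hypothetical stand-ins for a real service response, not something taken from the interview.

```fsharp
// Assumes FsLab is installed under packages/ (for example via Paket or NuGet).
// With a recent `dotnet fsi`, `#r "nuget: FSharp.Data"` is an alternative.
#load "packages/FsLab/FsLab.fsx"
open FSharp.Data

// Hypothetical sample document: the provider infers the type's shape from it.
// In practice the sample could also be a file path or a URL of a service.
type Weather = JsonProvider<"""{"name":"Prague","main":{"temp":12.5}}""">

// Parse another document with the same structure.
let w = Weather.Parse("""{"name":"London","main":{"temp":9.0}}""")

// Name and Main.Temp are statically typed members (string and decimal),
// so the editor can offer them after typing "." and typos fail to compile.
printfn "%s: %M" w.Name w.Main.Temp
```

The point of the sketch is that no type is declared by hand: the structure comes from the sample, and the compiler checks every member access against it.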
A nice example showing this is the blog post analyzing James Bond movies from Evelina Gabasova. Thanks to type providers, 125 lines of R code become just 45 lines of F#!
InfoQ: Is it possible to directly embed the charts generated with FsLab in an application? For example, would it be possible to use FsLab as a backend to generate charts dynamically and send them as HTTP content to a user?
TP: Definitely! If you are using XPlot, then you can produce nice HTML5 charts using either Plot.ly or using the Google Charts libraries. The XPlot charting library makes it quite easy to grab the HTML for the chart and embed it in a web application. In fact, that’s exactly what I did when visualizing interesting facts about the world using XPlot!
If you are building desktop applications, you can use F# Charting, which is based on Windows Forms and can be easily embedded in Windows applications, but we’ve been doing more work on XPlot, which is cross-platform and produces really nice HTML5 charts.
InfoQ: You are also the author of Analyzing and Visualizing Data with F#, a report recently published and available freely online. How does it relate to FsLab?
TP: There are quite a few resources out there on F#, machine learning and data science and all of them use one or more of the FsLab libraries. For example, F# Data type providers appear pretty much in every new F# book. What we wanted to show in the Analyzing and Visualizing Data with F# report is how they all fit together in the context of data science. So you'll see how to start by accessing data with type providers, interactively explore it and build a neat visualisation - and it all fits in a brief report that's just 50 pages long!
FsLab is available in multiple flavors. Templates are provided for getting up and running immediately, and it can also be referenced using Paket or NuGet. FsLab is also an open source project and can be found on GitHub.