
Facilitating the Spread of Knowledge and Innovation in Professional Software Development


Jupyter Notebooks: Interactive Visualization Approaches


Summary

Chakri Cherukuri talks about how to understand and visualize machine learning models using interactive widgets. He introduces the widget libraries and walks through the code of a simple example to show how to assemble and link these widgets. He looks at models like regressions, clustering and finally a wizard for building and training deep learning models with diagnostic plots.

Bio

Chakri Cherukuri is a senior researcher in the Quantitative Financial Research group at Bloomberg. His research interests include quantitative portfolio management, algorithmic trading strategies and applied machine learning. Previously, he built analytical tools for the trading desks at Goldman Sachs and Lehman Brothers. He has extensive experience in numerical computing and software development.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Cherukuri: I am Chakri Cherukuri. I work in the Quantitative Research Group at Bloomberg in New York. I suck at making jokes. So our group does a lot of projects in quantitative finance, and we extensively use Jupyter Notebooks for all our research work. Before I start my talk, I just want a quick show of hands. How many of you guys use Jupyter Notebooks? Wow, that's quite a few of you. And how many of you have heard about ipywidgets or any other widget libraries in the Jupyter Notebook? Okay, that's fair enough, these libraries are not that popular.

In today's talk, we'll see how to build interactive visualizations in the Jupyter Notebook using these widget libraries. Since these libraries are not that popular, I'd like to take this opportunity to show the power of these libraries and highlight some of their best capabilities. We can build rich applications, dashboards, and tools using these widget libraries. One nice thing is, it's all pure Python. Even though there is a lot of JavaScript, it's totally hidden from the user. So all the interactivity comes from the JavaScript, but we totally hide that implementation and the only interface which the user sees and uses is the Python interface.

The outline of my talk today is as follows. I will first give a brief overview of interactive widgets and then I'll walk you through the code of a simple example of an interactive plot, so that you understand, at a high level, how to assemble and link these widgets. Then we'll go through some case studies of advanced applications. I'll start with the dashboard from server log files. I think it's a very relevant use case for all the software engineers. And then we'll look at an applied machine learning use case where we analyze the performance of a Twitter sentiment model. We'll see how interactive visualizations can help us better understand and interpret these models. And finally, we'll look at tools for doing the end-to-end pipeline of a deep learning model.

Interactive Widgets

Let's get started. Interactive widgets in the Jupyter Notebook consist of two components. The first component is the Python interface. For example, here I'm creating an integer slider. You instantiate a class called IntSlider, and it consists of a bunch of attributes by which we manipulate the widget. Then there is the visual representation, which is implemented in JavaScript, and that's what creates all the interactivity. But the visual representation, the JavaScript code, is totally hidden from the user, so the user deals only with the Python interface. All of these attributes are called traits in Jupyter speak, and they implement the observable design pattern. So whenever you update these attributes, events are generated and messages are sent from the back-end Python to the front-end JavaScript, and vice versa from the front-end to the back-end.

Let me show you the list of all these ipywidgets; they're all available on Read the Docs. Let me zoom in a bit. It's a very comprehensive list of all the standard UI controls. You have sliders, radio buttons, text boxes; all the standard UI controls are supported in ipywidgets, and the code is all here with all the attributes. The second library we'll look at is called bqplot. bqplot is on GitHub, and it's a library which was built at Bloomberg by our group. bqplot uses d3.js under the hood, so it provides a lot of interactive components. I want to show you the ipywidgets example, the integer slider I was talking about earlier. Let me show you how it works.

I'm creating the integer slider here, and I'm passing a description. Now I've rendered the slider, and you see this nice slider here. What happens when I move the slider? You can see that the value attribute of the object gets automatically updated. That means a signal or a message is being sent from the front-end to the back-end. When I update the back-end by changing the value attribute of the slider, you see that the front-end automatically gets updated. This is what I mean by the bidirectional communication between the front-end and the back-end. All the interactivity here is happening in the JavaScript, and it's totally hidden from the user.
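In code, a minimal sketch of this slider demo might look like the following (assuming ipywidgets is installed; the widget parameters are illustrative):

```python
# Minimal ipywidgets slider demo showing the bidirectional communication.
import ipywidgets as widgets
from IPython.display import display

slider = widgets.IntSlider(description='Slider', min=0, max=100, value=50)
display(slider)

# Front-end -> back-end: moving the rendered slider updates slider.value in Python.
print(slider.value)

# Back-end -> front-end: assigning to the trait moves the rendered slider.
slider.value = 75
```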

Now, let's look at a simple code example, where we try to combine ipywidgets and bqplot to build an interactive plot. The example I'm going to show you is the plot of a normal distribution. The normal distribution, as you all know, is the familiar bell curve. We want to see the impact of mu and sigma, the mean and standard deviation, on the plot of the normal distribution. Let me walk you through the code.

I'm doing some imports here, and I'm going to use the pyplot API of bqplot. This is a nice place to get started with bqplot; pyplot is very similar to matplotlib's pyplot interface. I'm creating a vector x, and vector y is the normal distribution function of x. I create a figure using plt.figure. Then I call plt.plot, which is the line plot, and I pass in the x and y attributes. This function returns a line object, which I store in pdf_line. I create two sliders to represent mu and sigma, a mu slider and a sigma slider, and I use the HBox, which is a layout object in ipywidgets, to horizontally stack these widgets. I now have two sliders representing mu and sigma, and I do a final layout by stacking the figure and the slider layout to create a plot with two sliders. Obviously nothing happens yet when I move the sliders, so let's go ahead and link the sliders to the plot. This is the crucial step.
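A minimal sketch of this assembly, assuming bqplot and ipywidgets are installed (names such as pdf_line and the slider ranges are illustrative, not the exact notebook code):

```python
import numpy as np
import bqplot.pyplot as plt
from ipywidgets import FloatSlider, HBox, VBox

x = np.linspace(-10, 10, 200)

def norm_pdf(x, mu=0.0, sigma=1.0):
    # Probability density function of the normal distribution.
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

fig = plt.figure(title='Normal Distribution (mu=0.00, sigma=1.00)')
pdf_line = plt.plot(x, norm_pdf(x))  # returns the line mark object

mu_slider = FloatSlider(description='mu', min=-5, max=5, value=0)
sigma_slider = FloatSlider(description='sigma', min=0.1, max=5, value=1)

sliders = HBox([mu_slider, sigma_slider])   # horizontally stack the sliders
VBox([fig, sliders])                        # final layout: figure on top of sliders
```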

For linking widgets, what we do is we create a callback. I'm going to define a function called update_distribution, where I read in the values of mu and sigma from the sliders and recreate the normal distribution. What I'm doing here is directly assigning that value to the y attribute of the line object. This is an important step: you directly update attributes in place, and that's what triggers the redraw of the plot. When I update the y attribute, which is implemented as a traitlet, it sends a message to the front-end to redraw the plot.

Similarly, I would like to update the title of the figure to reflect the new values of mu and sigma. Once I create this callback, I want to register the callback with the sliders. I do that using the observe method. What I'm doing here is: whenever the value attribute of this widget, the slider, changes, which happens when you move the slider, this callback will get called. So let's do that. Here we have a nice interactive plot. When I move the mu slider, you see the impact of mu on the normal distribution. When I move the sigma slider, you see the impact of sigma, right? This is a simple example, but I want to highlight these aspects of creating callbacks and linking the widgets using the observe method. Once you understand these concepts, it's very easy to scale up to build rich interactive plots.
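Continuing the sketch above, the linking step might look like this (again a hedged sketch, not the exact notebook code):

```python
def update_distribution(change):
    # Read the current slider values and recompute the distribution.
    mu, sigma = mu_slider.value, sigma_slider.value
    # Updating the trait in place sends a message to the front-end and redraws the plot.
    pdf_line.y = norm_pdf(x, mu, sigma)
    fig.title = 'Normal Distribution (mu={:.2f}, sigma={:.2f})'.format(mu, sigma)

# Register the callback: call it whenever the 'value' trait of either slider changes.
mu_slider.observe(update_distribution, names='value')
sigma_slider.observe(update_distribution, names='value')
```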

Before I start with my use cases, I would like to show you a few interactive aspects of bqplot. The example I'm going to show you is called kernel regression. It's all implemented in numpy here. I'm not going to walk you through the code, but I just want to highlight some interactive aspects of this visualization. It's a nice segue from the previous example where we looked at the plot of the normal distribution, and I'm going to reuse that plot here. Kernel regression is an extension of polynomial regression. In polynomial regression, we do ordinary least squares where we assign equal weights to all the samples. In kernel regression, we assign weights which come from a weighting function, and the Gaussian kernel is a very popular weighting function. The Gaussian kernel is nothing but the normal distribution which we have seen in the previous example.

Let me give you some intuition on how this works. Let's say we have the test point right here for which I want to make the prediction. I'm going to give importance to all the points which are in the immediate neighborhood of this test point. As I go far away from the test point, I want the weights to decay exponentially. I create this kind of localized behavior where I give importance only to the points which are in the immediate neighborhood of the test point, unlike the polynomial regression where I give equal weight to all the samples.

This bandwidth controls the width of this Gaussian kernel, and this bandwidth is nothing but the standard deviation of the normal distribution plot which we have seen earlier. Here we have the nonlinear regression line, and we have a cloud of points here. There is also one more hyperparameter called the polynomial order, where, for example, order 0 is a flat (constant) fit, order 1 is linear, order 2 is a locally weighted quadratic fit, and so on.

Let's see the impact of these hyperparameters on the regression fit. When I reduce the bandwidth, you see that the normal distribution kind of shrinks. Now we are giving importance only to a very few points in the immediate neighborhood, and we're ignoring all the points which are far away. It's a very localized behavior, and that's what happens: you get a very wiggly regression curve and we're overfitting. When I make this bandwidth very high, in theory when I make it infinity, this curve becomes a uniform distribution. That means I'm assigning equal weights to all my samples, and you just recover the polynomial regression. So you see, polynomial regression is a special case of kernel regression.
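For reference, a minimal numpy sketch of kernel regression with a Gaussian kernel (a local constant fit, i.e. polynomial order 0), not the speaker's exact implementation:

```python
import numpy as np

def gaussian_kernel(u):
    # Weights decay exponentially with distance from the test point.
    return np.exp(-0.5 * u ** 2)

def kernel_regression(x_train, y_train, x_test, bandwidth=1.0):
    y_pred = np.empty_like(x_test, dtype=float)
    for i, x0 in enumerate(x_test):
        w = gaussian_kernel((x_train - x0) / bandwidth)
        # Weighted average of the training targets (order-0 local fit).
        y_pred[i] = np.sum(w * y_train) / np.sum(w)
    return y_pred

# Small bandwidth -> very localized weights -> wiggly, overfit curve.
# Very large bandwidth -> nearly equal weights -> recovers the global (polynomial) fit.
```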

The scatter plot here in bqplot supports some interactions. I enabled the click-to-add and move interactions. I can click and move the points to see the impact of outliers on the regression fit. You see, it's totally overfitting. I can also click and add points to see the extrapolation effects. I can change the order of the polynomial, and you see it's totally flipping the curve, right? By increasing the bandwidth, we're kind of smoothing the function. I can also display the confidence intervals; these are just the standard deviation bands. As you can see, this interactive visualization is very helpful in trying to understand these models which have a lot of hyperparameters. Also, by using the interactive components of the scatter plot, I can move the points, I can click and add new points, and see the impact of outliers and extrapolation and all those things.

The next example I have is a time-series plot. This is the time series of the S&P 500 index. As you all know, this is the most popular equity index in the U.S. markets. When doing time-series analysis, ideally we'd like to select different time periods, either to perform some statistical analysis or to look for trends, right? bqplot has a nice selector framework which provides interval selectors to select intervals of line plots. We have brush selectors and lasso selectors, and so on. Let me activate the interval selector here. As you can see, this interval selector responds to mouse moves. When I move the mouse pointer up, I expand the interval, and when I move the mouse pointer down, I contract the interval. I can also select specific regions and click to freeze the interval. Here I selected a two-year period, and I can look at different two-year slices.
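A hedged sketch of attaching an interval selector to a time-series line with bqplot's object API; the prices series and what you do with the selection are assumptions:

```python
import pandas as pd
from bqplot import Figure, Axis, Lines, DateScale, LinearScale
from bqplot.interacts import FastIntervalSelector

# prices: a pandas Series of index levels indexed by date (assumed to exist).
xs, ys = DateScale(), LinearScale()
line = Lines(x=prices.index, y=prices.values, scales={'x': xs, 'y': ys})

# The selector responds to mouse moves and updates the selection on the line mark.
intsel = FastIntervalSelector(scale=xs, marks=[line])
fig = Figure(marks=[line],
             axes=[Axis(scale=xs), Axis(scale=ys, orientation='vertical')],
             interaction=intsel)

def on_select(change):
    # line.selected holds the indices of the points inside the current interval;
    # compute a trend line or total return on that slice here.
    print(line.selected)

line.observe(on_select, names='selected')
fig
```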

The statistical treatment I'm doing here is I'm drawing the trend line. You’ll notice that the color of the trend line changes depending on the sign. When the trend is negative, it is red. And when the trend is positive, it is green. Also the width of the line changes according to the strength of the trend indicator. So the stronger the trend, the thicker the line. I'm also showing the total return here over the period.

Let's now look at some interesting periods. This is the familiar 2008 financial crisis, where you would have lost almost 50%. This is the dot-com crash, somewhere here. This is the bull period after the dot-com crash. What's remarkable here is this rally over the last 10 years. If you had just invested in the index ETF and done nothing else, you would have almost tripled your money. But obviously I'm speaking with hindsight bias. As you can see, these kinds of interval selectors let you perform very fast statistical analysis by selecting different slices of time periods. I'm just scratching the surface of the interactive capabilities of bqplot here. We'll see some more examples in the case studies which I'm going to cover next.

Server Logs Dashboard

The first case study I have is building dashboards from server logs. I think it's a very relevant use case for all the software engineers here. There is a lot of information in server logs. We can get the timestamps of the requests coming in, we can get the HTTP status codes, what kind of agents are making the requests, and we can also get the URLs. And we can parse the URLs to obtain all the search parameters for the queries, so we can get the product IDs, category IDs, among other things.

What kind of plots can we do in the dashboard? We can do a time-series plot of all the events, like the last example we have seen. We can use the interval selector to look at different trends. Also, we can look for seasonality: what periods of the day or what times of the year we have a lot of requests coming in. All these things can be done using time-series plots. You can also look for outliers in the time-series plots. We can also do aggregations of the events on a daily basis or on an hourly basis. We can plot the events broken down by the search parameters, like products, categories, and status codes.
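A hedged sketch of the kind of pandas aggregations behind such a dashboard, assuming the parsed log lines are already in a DataFrame named logs with timestamp, status, product_id, and category_id columns (all illustrative names):

```python
import pandas as pd

# Index by timestamp so we can resample by time.
logs = logs.set_index('timestamp').sort_index()

daily_events = logs.resample('D').size()               # events aggregated per day
hourly_events = logs.groupby(logs.index.hour).size()   # events by hour of day (seasonality)

by_status = logs['status'].value_counts()              # breakdown by HTTP status code
by_product = logs['product_id'].value_counts()         # breakdown by product
by_category = logs['category_id'].value_counts()       # breakdown by category
```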

So let's look at an interactive dashboard which is built using bqplot. Obviously, I could not put any Bloomberg log files here, so I found some log files online, some realistic looking log files. They are not that big, but at least they look like log files. They have all the usual suspects, like timestamps and URLs and all. It's not a big log file, I just want to highlight the interactive aspects and then I'll discuss the kind of streaming analytics which we need to do for realistic log files.

The dashboard is built entirely in Python, and I use pandas to load the log file in memory and do some data munging here. So let me run all the cells. Yeah, so here we have a nice dashboard built entirely in bqplot. Let me go through the components. We have daily events, events which are aggregated on a daily basis. And then we have events aggregated on an hourly basis. Then we have events broken down by products, categories, and status codes. Now we can try to look for some patterns here; this bar obviously seems out of place. I can click on this bar of the bar chart and just select it. We see here that data was available only after 6 p.m., so it may not be a data issue. This bar also looks out of place, and again, we see that the data was collected only till 6 p.m.

It seems like it's just the way the log file was formulated. If you look at all the other days, it's pretty much uniformly distributed, with around 1,900 requests coming in each day. Here in the hourly events plot, we can see some patterns; for example, from 5 to 9 a.m. you don't see much activity going on. Now let's try to filter on the status codes. In bqplot you can select the slices of the pie, and I can update. You see that for two products, there were some errors, so we need to look into those products to fix the issues. There's a category called null which also resulted in failures. We can also filter by products. I can select all the products, and you'll notice that all the category events vanish, implying these event sets might be disjoint. Similarly, when I select categories, you see that all the product events vanish.

Again, there are so many things you can do, and bqplot provides all these interactive capabilities. I can click on a bar and select it. I can click on the slices of the pie and do some filtering. You can also do time-series plots and use interval selectors. So there are a lot of things you can do. The Jupyter Notebook is a nice tool for software engineers where you can do big data analysis, and using these widget libraries, you can build really nice dashboards.

Let me make a small comment on scaling and performance. Obviously in real life, we're not going to have small log files. Jupyter Notebooks can be connected to streaming sources like Kafka; we can have hooks to directly do the data ingestion into the Jupyter Notebook. We can also use streaming dataframes; there are some implementations, Dask and streamz, which provide streaming dataframes. Also, we can use pandas where you specify the chunk size, and you load the data only in chunks, as opposed to loading the whole log file in memory. So Jupyter Notebooks can definitely be used for doing this kind of analysis.
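A minimal sketch of the chunked-loading approach with pandas, so the whole log file never sits in memory at once; the file name and column names are illustrative:

```python
import pandas as pd

daily_counts = None
for chunk in pd.read_csv('access.log.csv', parse_dates=['timestamp'],
                         chunksize=100_000):
    # Aggregate each chunk, then combine the partial results.
    counts = chunk.set_index('timestamp').resample('D').size()
    daily_counts = counts if daily_counts is None else daily_counts.add(counts, fill_value=0)
```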

Twitter Sentiment Analysis

Let's move on to the machine learning use case now. It's called Twitter sentiment analysis. Let me give you some background on this problem. Extracting social sentiment from news stories and tweets is a very important problem in finance. These news stories and tweets are extremely unstructured datasets. Also, they are highly time sensitive; we need to act on them immediately, or almost in real time, when we get a tweet or a news story.

We can use machine learning models to train on these datasets and come up with a story-level sentiment score. We can also create company-level sentiment; for example, aggregate all the tweets about Apple on a daily or hourly basis and come up with the Apple sentiment score for the day. These sentiment scores are actually very important for investors. They can use them as a trading signal; for example, you can go long or buy the stocks which have positive sentiment, or sell the stocks which have negative sentiment.

The crux of this sentiment scoring is this machine learning model where we try to classify the tweets based on the sentiment into three classes: negative, neutral, or positive. Right here, we have an example where this is a positive tweet because the stock is rated as a strong buy. So this results in a three-class classification problem. The inputs are the raw tweets and the output is the sentiment label, which can be negative, neutral, or positive. The methodology we used is: we're given labeled training and test datasets. These tweets are manually tagged by human readers. They go through each and every tweet and then they assign a label based on the sentiment. We need this labeled dataset for training the model.

We train the model on the training dataset. Then we take the trained model and make predictions on the test dataset. This test dataset was not used for training, so it's a totally out-of-sample test. Since we already know the actual labels on the test dataset, by comparing the predicted labels and the actual labels we can analyze the performance of the model. The model I have used in this example is based on a bag-of-words approach. What I did was train a logistic regression, and the logistic regression here is one-versus-rest. What I mean by this is, since there are three labels, I trained three binary classifiers, one for each label. So for example, the first model will classify the tweets as negative versus not negative. The second model will classify the tweets as positive versus not positive, and so on.

So we have three different models, and each model returns probabilities, which are the confidences the model places in its predictions for each label. Finally, we output the label associated with the highest probability. Then we want to measure the performance of this model, so the first step is to look at misclassifications. We'll look at a nice matrix called the Confusion Matrix. Also, we want to understand the model-predicted probabilities. We will look at a nice representation called the triangle to visualize and understand model-predicted probabilities. Then we want to fix the data issues. Recall that these tweets are manually tagged by human readers, so there is a potential for making mistakes, and we'll see how we can fix these data issues.
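A hedged scikit-learn sketch of the bag-of-words, one-versus-rest setup described above (the variable names and vectorizer settings are illustrative, not the actual Bloomberg model):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# train_tweets / test_tweets: lists of strings; train_labels / test_labels:
# 'negative', 'neutral', or 'positive' (assumed to be loaded already).
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),                     # unigrams and bigrams as features
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),  # three binary classifiers, one per label
)
model.fit(train_tweets, train_labels)

proba = model.predict_proba(test_tweets)  # one probability per label, rows sum to 1
pred = model.predict(test_tweets)         # label with the highest probability
```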

A quick primer on the Confusion Matrix: the Confusion Matrix is a K x K matrix, where K is the number of labels or classes. Here we have three classes, so it's a 3 x 3 matrix. Cell (i, j) of this matrix contains the number of samples whose actual label is i and whose predicted label is j. So by definition, all the diagonal entries are correct predictions, because the predicted and the actual labels match. And the off-diagonal entries are misclassifications, because the predicted and the actual labels do not match. We want to look at these off-diagonal entries in the Confusion Matrix.
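Continuing the sketch above, the Confusion Matrix on the test set could be computed like this:

```python
from sklearn.metrics import confusion_matrix

labels = ['negative', 'neutral', 'positive']
cm = confusion_matrix(test_labels, pred, labels=labels)
# cm[i, j] = number of tweets whose actual label is labels[i]
# and whose predicted label is labels[j]; off-diagonal entries are misclassifications.
```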

Here, I'm going to talk about a representation called the triangle. Remember that the model returns three probabilities which sum to 1. So the question is, how can we visualize these three numbers? The idea is to look at them as points inside an equilateral triangle. The three vertices represent the three labels, and the location of the point and its proximity to the vertices gives some indication of the predicted probabilities. For example, if you have a point right in the center, it's equidistant from all three vertices; the model is assigning equal probabilities to all the labels, so we are not sure of the prediction. If you have a point very close to the positive vertex, for example, the model is assigning a very high positive probability because it's closest to the positive vertex.
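A hedged sketch of mapping the three predicted probabilities to points inside an equilateral triangle via barycentric coordinates; the vertex placement is an assumption:

```python
import numpy as np

# Vertices for the 'negative', 'neutral', 'positive' labels of an equilateral triangle.
vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3) / 2]])

def to_triangle(proba):
    """proba: (n_samples, 3) array of probabilities summing to 1 per row."""
    # Each point is the convex combination of the vertices weighted by its probabilities:
    # equal probabilities land at the centroid, a dominant label pulls the point to its vertex.
    return proba @ vertices

points = to_triangle(proba)  # each row is the (x, y) location of one tweet
```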

Similarly, all the positive predictions fall in this segment, because the model predicts positive when the positive probability dominates the negative and neutral, right? For any point in this segment, it's closest to the positive vertex. So that's some intuition behind this triangle. With this background, let's look at the dashboard. What I'm doing here is loading a pre-trained model; I don't want to train the model here. I'm loading the test dataset, which also has the test labels. Then I'm packaging the whole dashboard into a class called PerformanceAnalysisDashboard. That's one nice thing about these widget libraries: using the primitive widgets, I can build compound widgets. I can package them in a class to build a compound widget, and the compound widget can again be used with other primitive widgets or compound widgets. We can think of them as Lego blocks which can be stacked together.

Let me briefly show you the code for this dashboard. All you need to do is extend a class called Box, which is a layout object in ipywidgets, then put together all your widgets and just call the superclass constructor and pass in the children, all the widgets. And that's it, you will have a compound widget. Are you able to see the whole thing? Yes. Here we have the dashboard, and I think I explained all the components. Let's try to understand what's happening here. I click on this cell here, so here we have 103 tweets for which the predicted label is positive and the actual label is negative, right? These are obviously misclassifications. Here we have all the predictions in this segment, because these are positive predictions, as I was mentioning before. Each point represents a tweet, and the location of the point tells us about the predicted probabilities. For example, these points are very positive and these points are somewhere midway.
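A hedged sketch of the compound-widget pattern, extending an ipywidgets layout class; the child widgets here are placeholders rather than the actual dashboard components:

```python
from ipywidgets import VBox, HBox, HTML, Output

class PerformanceAnalysisDashboard(VBox):
    """Compound widget: once constructed, it behaves like any other ipywidget."""
    def __init__(self, model=None, test_data=None, **kwargs):
        self.header = HTML('<b>Performance Analysis</b>')
        self.matrix_area = Output()    # would hold the bqplot Confusion Matrix figure
        self.triangle_area = Output()  # would hold the probability triangle figure
        # Pass all child widgets to the layout superclass constructor.
        super().__init__(children=[self.header,
                                   HBox([self.matrix_area, self.triangle_area])],
                         **kwargs)

dashboard = PerformanceAnalysisDashboard()
dashboard  # renders as a single widget and can be nested inside other layouts
```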

Let me click on some tweets to understand what's happening. "Groupon results not looking good." So that's the tweet, and here we have the three predicted probabilities shown in the pie chart. This is a logistic regression, right? So it's a linear model; the model assigns weights to all the features. The features here are just the tokens of the tweet. By the way, there are three models; this model is the one which classifies the tweets as positive versus not positive. By looking at these tokens and the model coefficients, we clearly see why the model is saying this tweet is positive. It's looking at this bigram "look good", and it's giving importance to the token "good". It is penalizing "not", but it's not enough, and it's saying the tweet is positive.

I can also look at the negative model by clicking on the slice of the pie here. If you look at this negative model, you see that it does give importance to the token not look, because the negative model wants to classify the tweets as negative versus not negative, right? It is giving importance to the token not look and penalizing good, but it's not enough to surpass the positive probability. So that's how you understand how the model is making its predictions by looking at these model coefficients.

Let me look at one more example here. "Weight Watchers Sees Profit Slimmed"; there is a nice pun here, Weight Watchers and slimmed. But slimmed, unfortunately, is not a positive word here; it's a negative word. Let's see what the model is doing. The model just sees the word profit and says it's positive. Look at the coefficient of slim here, it's 0. What that means is the model might not have encountered the word slim at all in the training samples. There's no way for the model to know that slim is negative and assign it some negative weight. Even the negative model, for example, doesn't do anything with the word slim; the coefficient is 0. As you can see, by looking at the model coefficients, we get a very nice intuition of how the model is making its predictions.
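Continuing the earlier scikit-learn sketch, looking up the per-token coefficients for one of the binary models might look like this (the step names follow make_pipeline's defaults, and the tweet is just the example above):

```python
vectorizer = model.named_steps['countvectorizer']
ovr = model.named_steps['onevsrestclassifier']
# Pick the binary classifier that separates positive versus not positive.
positive_clf = ovr.estimators_[list(ovr.classes_).index('positive')]

tokens = vectorizer.build_analyzer()('weight watchers sees profit slimmed')
vocab = vectorizer.vocabulary_
for tok in tokens:
    coef = positive_clf.coef_[0, vocab[tok]] if tok in vocab else 0.0
    # A coefficient of 0.0 means the token never appeared in the training samples.
    print(f'{tok!r}: {coef:+.3f}')
```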

Now let's focus our attention on the Confusion Matrix. We see that the majority of the tweets are neutral, so all the misclassifications are in these four buckets here. Let me click on this bucket where the model is saying the tweets are positive but the tweets were labeled as neutral by the human readers. If you think about it from the perspective of the human readers, they can easily distinguish between a positive tweet and a negative tweet. But the distinction between what makes a tweet neutral versus positive, or neutral versus negative, is not so easy. There is some subjectivity involved, and that's where humans can make mistakes.

Let's use this probabilities triangle to understand and find these data issues. Since all the tweets are predicted as positive, they fall in this segment of the triangle, and the tweets which are close to the positive vertex are interesting, because the model is supremely confident that these tweets are positive; the model is assigning very high probabilities to these tweets. Let me use the lasso selector to select some of these tweets. Let me read through the tweets. "Microsoft sales beat street hopes, cloud profits up." "Allstate profit beats estimates." "Price target raised to 49." "Upgraded by Zacks to buy." All these tweets are actually positive, and the model is in fact right in making its predictions. So these are definitely data issues which need to be fixed.

The way we found these data issues was by zeroing in on the tweets for which the model is assigning a very high positive probability. So understanding the model-predicted probabilities is very important in making these kinds of decisions. To summarize, these kinds of interactive visualizations are very helpful in trying to understand and interpret these models by directly looking at misclassifications and then drilling down into individual tweets. From the tweets we're able to drill down into the individual tokens of the tweet. By looking at the model coefficients and the model-predicted probabilities, you get a lot of intuition about how the model is making its predictions.

I want to make a small comment on interpretable models; I think it's very important. Here we have used a simple logistic regression, which is a very simple model, but it's easily interpretable. As we have seen, the model coefficients have a meaning and we can easily understand how the model is making these predictions. We could have improved the model by looking at some non-linear classifiers, like random forests or boosted trees. We could have tried some deep learning techniques using word embeddings. Or we could further boost the performance by doing some stacking or creating an ensemble of deep learning models and shallow learning models. There is so much we can do to improve the performance. But then the model's interpretability is lost.

Especially in situations where there is a requirement to explain the model to senior management or to customers, it's very important to go with simpler models because they are interpretable. Simple models are also robust to overfitting. With deep learning models, we can easily overfit when we increase the capacity of the model. I just want to leave it at that.

Tools for Deep Learning

Let's move on to our final example, where we're building a neural network builder. It's a tool for doing the end-to-end pipeline of a deep learning model, from selecting the network parameters to choosing the network architecture, and looking at the training and diagnostic plots. I built this internally in Python using ipywidgets and bqplot, using a plug-and-play architecture, where I provided base implementations for training and diagnostic plots, but they can easily be extended or customized by extending the base classes. For example, I provided an implementation for regression diagnostic plots, and you could easily customize it or enhance it for classification diagnostic plots. There, we could provide an interactive Confusion Matrix, for example, as we have seen in the previous use case. So it's very easy to extend this code.

Let me walk you through the code here. I'm using Keras as the deep learning library. The dataset I'm using here is derived from the Black-Scholes formula. The Black-Scholes formula is a smooth, nonlinear function of four inputs, and it returns a numerical output which is the price of an option. We use it heavily in finance. So this is a regression problem, because we are trying to predict a numerical output. I do some scaling. Again, I created a compound widget called NeuralNetworkBuilder by packaging all the primitive widgets, and I pass in the training and test datasets.

Let me make it full screen. The wizard is right here, and the first tab is the Network Parameters. I can choose the number of epochs, I can choose the batch size for each epoch, and I can choose from a bunch of loss functions. I'm directly introspecting on the Keras library to create these UI components. We can also look at different optimizers; in fact, for each optimizer we can choose the hyperparameters specific to that optimizer. So let me go with Adam, which is a nice default optimizer, and give it some learning rate decay. In the second tab, I can choose the Network Architecture.

For now, I'm just providing support for fully connected layers, but it's very easy to extend the tool to add support for LSTM cells or convolutional layers. Notice that the tool is automatically filling in the number of inputs and outputs by looking at the training dataset. So let me add some hidden layers here, say a second layer with 16 neurons. I can also choose from different activation functions. I'm choosing tanh here. Tanh is a smooth function which goes from -1 to 1, so it's flat at -1, rises to +1, and becomes flat again.
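A hedged Keras sketch of a fully connected architecture like the one chosen in the wizard; importing from tensorflow.keras, the layer sizes, and the batch normalization and dropout layers (mentioned next) are assumptions:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, BatchNormalization, Dropout

model = Sequential([
    Dense(16, activation='tanh', input_shape=(4,)),  # 4 Black-Scholes inputs
    BatchNormalization(),
    Dropout(0.2),
    Dense(16, activation='tanh'),                    # second hidden layer, 16 neurons
    BatchNormalization(),
    Dropout(0.2),
    Dense(1),                                        # single numerical output: the option price
])
model.compile(optimizer='adam', loss='mean_squared_error')
```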

I can also enable batch normalization for each layer, and I can add dropout and assign the dropout probabilities. I'm going to do that here. I go to the Training tab, and now I train the model. We see that the loss and accuracy curves are getting updated in real time; they're all implemented in bqplot. Wow. You see that training was initially doing nothing, and suddenly after 25 epochs it starts learning and the accuracy starts going up. You also have a progress bar which shows you the progress.
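A hedged sketch of how such real-time curve updates can be wired up with a Keras callback that writes into a bqplot line mark; model, loss_line, and the training arrays are assumed to exist from earlier steps:

```python
from tensorflow.keras.callbacks import Callback

class LivePlot(Callback):
    def __init__(self, loss_line):
        super().__init__()
        self.loss_line = loss_line
        self.losses = []

    def on_epoch_end(self, epoch, logs=None):
        self.losses.append(logs['loss'])
        # Updating the mark's traits redraws the bqplot figure in the notebook.
        self.loss_line.x = list(range(len(self.losses)))
        self.loss_line.y = self.losses

model.fit(x_train, y_train, epochs=50, batch_size=32,
          validation_data=(x_test, y_test),
          callbacks=[LivePlot(loss_line)])
```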

Let's look at the distributions to understand what's happening. Here we have the distributions of weights, biases, and activations for different layers and for different epochs. For example, we see that the weights are initially uniform, and as we go through the training process, the distribution kind of narrows; it becomes kind of like a normal distribution, which is good. We can also look at the distributions of activations. Since tanh goes from -1 to 1, you see that all the values are either -1 or 1, with some values in between.

If you look at the last layer, this is where we see a problem. It's called saturation of activations. What's happening here is the activations are either -1 or 1. What happens when activations are -1 or 1? In tanh, they are in the flat regions of the curve; +1 is flat and -1 is flat. Therefore the gradients, which are the slopes, are all zero. In backpropagation, no gradients are flowing back, and therefore the weights are not getting updated in the gradient descent update.
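A tiny numpy check of why saturation stalls training: the derivative of tanh(x) is 1 - tanh(x)^2, which is essentially zero once activations sit near -1 or +1:

```python
import numpy as np

for x in [0.0, 2.0, 5.0]:
    a = np.tanh(x)
    print(f'activation={a:+.4f}, gradient={1 - a**2:.6f}')
# activation near +/-1  ->  gradient near 0  ->  almost no weight updates in backprop
```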

And that explains this behavior here: the training was stalling because of the saturation of activations. If you go past 25 epochs, you see that the activations are no longer saturated, because there are values in between for which the gradient is non-zero, and it's flowing back in the backprop, and now the training is happening. So you can clearly understand and explain why the loss curve looks like this. By looking at these distributions, we can see the activation distributions and understand if there is any saturation going on, instead of blindly training the models, stacking layers, and looking at the results. Peeking into the black box and understanding what's happening under the hood will really help you and give you some intuition about how the training process is going.

If you look at the diagnostic plots here, I provided an implementation for just a simple residuals-versus-predicted-values plot, and it looks horrible. So let's go and try a different activation. ReLU is a very popular choice which has much faster convergence than tanh. Let me choose ReLU here. Actually, let me just run it for 20 epochs. I keep the same architecture and just change the activation to ReLU, and now I redo the training process. You see that the accuracy shoots up immediately to 100%, so it's much better than tanh. You can also see that there is no saturation going on here for the last layer or for the first layers. The residuals versus predicted values definitely look much better than before, but obviously there is so much more we can do. First, we need to scale the dataset and then train it for a lot of epochs.

But as you can see, this tool is a really nice tool for quickly changing different settings, trying out different activations and changing the optimizers and quickly training the models and looking at the plots in real time. So that's all I have for today. I hope you found these examples instructive. I hope you gained some background and knowledge of these interactive widget libraries. I want to thank you all for attending my talk. Thanks.

Questions & Answers

Participant 1: You only had one joke. That was a joke - that's not actually my question. My actual question is I think that sentiment analysis is one of the most difficult things to do. Because number one, people who are looking at the words, if they're labelers and stuff, they need to be familiar with that language. They need to be familiar with slang, like for example if I say "Oh, that's so sick." When you hear me say it, because of my tone and stuff like that, you know that like I had like a positive sentiment about something, versus, “That's so sick.” You know what I mean? The tone really like changes stuff, and also if for example a machine learning thing you saw that, it will be like "Oh, the word sick is there, it must be negative." So how is something like that accounted for? Thank you, that's all.

Cherukuri: Yes. Actually, let me do one more joke since I'm under pressure here. So this is a realistic use case at Bloomberg. What we did was we outsourced to a company called CrowdFlower, who manually tagged all these tweets. Remember, these are financial use cases; these are all financial tweets. Let me give you an example. Most of the people who tagged these tweets are interns or people who have no clue about finance, and there was this tweet which contained the word overweight. In financial jargon, overweight means an analyst is rating a stock as a strong buy. But the taggers thought that overweight means someone who is fat, so they tagged the tweet as negative. You see, these tweets are totally domain-specific, and it's a very challenging job to actually tag them and assign correct labels. And that's why tools like this are very important to find data issues.

As you said, there are so many ways to go wrong. For example, we saw the word slimmed, which is a totally polarity-inverting word, but it was in the context of Weight Watchers and slimmed. These subtle polarity-inverting words are very tough for a program to understand. Think about it: these algorithms don't know anything about English, they're just looking at words and ...

Participant 2: It's like profit slim?

Cherukuri: Yes.

Participant 2: Versus losses slim.

Cherukuri: Yes, exactly. And the word slimmed had a negative connotation, which the model could not capture. All it's doing is looking for such words in the training samples. The more such words it finds and the higher their coefficients, the more negative the prediction; it doesn't know anything about English.

Participant 3: So the RNN model would be better, basically?

Cherukuri: No. Actually, for these sentiment kinds of problems, deep learning models are generally not so good, that's what I heard.

Participant 4: Even for sarcasm?

Cherukuri: Yes, absolutely.

Participant 5: So is bqplot open-source? So anybody can use it without any...

Cherukuri: Yes. It's open-source on GitHub. Sorry, the link was broken, I had no internet connection. But it's open-source on GitHub, yes.

Participant 5: And the second question is, does this tool have the capability of storing the results? When you play with the different type of parameters, you get the different results. So does it store the results as well, or it's just more like a visualization tool?

Cherukuri: No. The nice thing about this tool is you can provide support for anything. For example, I can have a Save Model button. When you click on that Save Model button, it will just save a snapshot of all the model parameters. I mean, the sky's the limit in what you can do, because it's all Python, right? I can have any UI components there, and you click on them, and it does whatever you want, so yes, totally.
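A hedged sketch of the kind of Save Model button described here, using an ipywidgets Button and Keras's model.save (the file name is illustrative):

```python
from ipywidgets import Button

save_button = Button(description='Save Model')

def on_save_clicked(button):
    # Snapshot the current architecture and weights to disk.
    model.save('model_snapshot.h5')

save_button.on_click(on_save_clicked)
save_button
```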

Participant 5: Does it have the community to learn if we have to play with this tool, and learn more about what else I can do with this?

Cherukuri: Yes. One thing I forgot to mention was I did a similar talk at JupyterCon, where I put all my code, sample notebooks of whatever I showed in the talk, on GitHub. I'll try to do similar things here, because I think it makes sense to actually play with the code, as opposed to just seeing it in the talk. I'll see if I can put it on my GitHub.

Participant 6: I was actually going to ask if there was a GitHub with this stuff in there?

Cherukuri: Yes. I'm going to put it on GitHub.

Participant 6: And these visualizations are really sick.

Cherukuri: Oh yes. But I just want to make a small comment. These are built using a nice black theme. Because the Bloomberg terminal has a black background, we did a lot of work to make it work really well on the black background. If we try them out on the classic white background, they're not going to look so good. But yes, we did a lot of work to make them work on a black background.

Participant 6: I guess as open-source you can always fix it, right?

Cherukuri: Yes. So in JupyterLab, I'm trying to push for this. Because we have one of the people on the steering committee working at Bloomberg, I'm trying to push him, because I get these questions all the time: how come the visualizations look so good in the talk, but when we do it on the white notebook, they don't look so good? In JupyterLab, we're definitely trying to have a black-themed notebook. Not only that, but black-themed bqplot and ipywidgets and everything. So many things need to be black-themed to make it work.

Participant 6: This is somewhat adjacent to what you were showing, but at Bloomberg, what's your solution for when you want to share this with somebody and still have it be interactive? So without it being somebody who's technical and knows how to install your dependencies and things like that.

Cherukuri: Yes. If you notice, there is this thing called BQuant here, so we are trying to create a product. I don't want to talk about Bloomberg stuff here, but these notebooks can easily be shared. That's the beauty of notebooks, right? You can easily share the notebook, and we have some capabilities here. If you look at these buttons, I can hide the code. Let me just quickly show you this. I can hide all the code. So once I save this notebook and give it to a user, the user is not going to see a single line of code.

Participant 7: We have no audience if you install Keras ...

Cherukuri: No, no.

Participant 7: You can't expose them, right?

Cherukuri: Yes. All I'm saying is all I need is to just provide the URL of my notebook. This guy, that's it. Once I shared this with him, because it's all running on the notebook server, all the software and packages are in ...

Participant 7: Microsoft.

Cherukuri: Yes, of course. Everything is on the notebook server, and all they need to do is hit this URL, and you have a lot of things here. I can hide all the code, I can make a white background, change the backgrounds. For example, I can just take the visualization and make it full screen. So we have all these extra options which make the interactivity and all these things better.

Participant 8: So you almost answered half of my question, but I'm actually looking. We work a lot with R, and R has something like R Shiny, where you can just take a notebook and turn it into an app. Are you aware of any project in the Jupyter space? It's almost like what you have right now, but do you give your users a URL?

Cherukuri: Yes. As far as Jupyter Notebooks are concerned, I think we are pushing the envelope on this whole app building in the notebook. I don't think anyone is doing it as much as Bloomberg, because we have a couple of people from the Jupyter steering committee working with us, and we actually want to build some products using Jupyter Notebooks. But, yes, I agree with you. It's not as sophisticated as R Shiny, but we're getting there. Hopefully some of that will be open-source. I don't know how much of it will be open-source, but yes.

 


 

Recorded at:

Mar 31, 2019
