How to Build Interactive Data Visualizations for Python with Bokeh

Key Takeaways

  • Bokeh is a powerful tool for exploring and understanding your data or creating beautiful custom charts for a project or report.
  • Bokeh provides a Python API for creating visual data applications in the style of D3.js, without necessarily writing any JavaScript code.
  • It allows the use of standard Pandas and NumPy objects for plotting, including NumPy arrays, plain lists and Pandas series.
  • In the Python visualization space, Bokeh is a strong candidate for building interactive and dynamic visualizations across different mediums.

Data understanding is a crucial stage of data analysis according to the CRISP-DM standard (Cross-Industry Standard Process for Data Mining), and data visualisation is one of the most useful approaches to it. The Bokeh library is designed for both interactivity and novel graphics, with or without a dedicated server or reliance on JavaScript. This article shows how Bokeh can be used to explore and understand your data or to create beautiful custom charts for a project or report.

The article will take you through:

  • Using Bokeh to transform your data into visualizations
  • Customizing your visualizations using Bokeh
  • Adding interactivity to your visualizations

Among its other advantages, Bokeh has very detailed documentation at docs.bokeh.org. The Quickstart section of the user guide is a good place to begin. In his project, Visualizing Anomalies in the Dataset, David Miller, a U.S.-based Python engineer at Education Ecosystem, notes that “Data visualization is key to understanding the information contained in the data. Interactive data visualizations provide valuable means for exploring data. Bokeh provides a Python API to create visual data applications in D3.js, without necessarily writing any JavaScript code.”
 

Installing Bokeh into a Python environment requires one of the following commands:

conda install bokeh

or

pip install bokeh

The bokeh.sampledata module provides prepared .csv and .db files with widely used datasets, for instance Apple NASDAQ stock data and airline on-time data for departing flights.

In a nutshell, we will walk through creating a Bokeh application, which is a recipe for generating Bokeh documents. Typically, this is Python code that a Bokeh server runs whenever a new session is created.

What are the steps involved in building a visualization using Bokeh?

Preparing the data

How do you prepare data with libraries such as NumPy and Pandas so that it is in the form best suited to your intended visualization?

Bokeh allows the use of standard Pandas and NumPy objects for plotting. Several Python data structures can be passed to Bokeh directly:

  • NumPy arrays
  • plain lists
  • Pandas series 

Let us consider Bitcoin historical data as an example of preparing time series data for visualization (Fig. 1). This dataset contains CSV files for select bitcoin exchanges for the period from January 2012 to December 2020, with minute-to-minute updates of OHLC (Open, High, Low, Close) values.

Fig. 1. Bitcoin history DataFrame

NumPy arrays are used to store the data (Fig. 2).

Fig. 2. Bitcoin history to NumPy array

The resulting Bokeh plot is shown in Fig. 3.

Fig. 3. Interactive Bokeh plot
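The preparation and plotting steps above could be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the column names Timestamp and Close are assumptions about the CSV layout, and the four rows are placeholder values, not real prices.

```python
import pandas as pd
from bokeh.plotting import figure

# Placeholder rows standing in for the CSV file; the column names
# "Timestamp" (Unix seconds) and "Close" are assumed, and the values
# are invented for illustration only.
df = pd.DataFrame({
    "Timestamp": [1577836800, 1577836860, 1577836920, 1577836980],
    "Close": [100.0, 101.5, 99.8, 102.3],
})

# Convert Unix seconds to datetimes so Bokeh can draw a time axis.
df["Date"] = pd.to_datetime(df["Timestamp"], unit="s")

# Extract plain NumPy arrays from the DataFrame columns.
dates = df["Date"].to_numpy()
close = df["Close"].to_numpy()

# Build the figure with a datetime x-axis and draw the closing price.
p = figure(x_axis_type="datetime", title="BTC closing price (placeholder data)")
p.line(dates, close, line_width=2)
```

Calling show(p) from bokeh.plotting would then open the plot in a browser tab or render it in a notebook, depending on the configured output.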

The full code for the Bitcoin data visualization is provided as a Jupyter notebook (bokeh-bitcoin-data.ipynb), together with the resulting output file (BTC.html).

When data is passed like this, Bokeh works behind the scenes to make a ColumnDataSource for further plotting.

At the most basic level, a ColumnDataSource is simply a mapping between column names and lists of data. The ColumnDataSource takes a data parameter, which can be a dict or a Pandas DataFrame. If one positional argument is passed to the ColumnDataSource initializer, it will be taken as data. Once the ColumnDataSource has been created, it can be passed into the source parameter of plotting methods, which allows you to pass a column’s name as a stand-in for the data values (Fig. 4).

Fig. 4. Using the ColumnDataSource
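As a minimal sketch of the ColumnDataSource usage described above, the following builds one source from a dict and one from a DataFrame, then refers to columns by name when plotting. The column names day and close and their values are illustrative assumptions.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# A source built from a dict: keys are column names, values are the data.
dict_source = ColumnDataSource(data={"day": [1, 2, 3], "close": [10.0, 12.5, 11.0]})

# A source built from a DataFrame: a single positional argument is taken as data.
df = pd.DataFrame({"day": [1, 2, 3], "close": [10.0, 12.5, 11.0]})
df_source = ColumnDataSource(df)

# Column names stand in for the data values once a source is supplied.
p = figure(title="Plotting from a ColumnDataSource")
p.line(x="day", y="close", source=df_source)
```

Several glyphs can share one source in this way, which is what later allows selections and server-side updates to stay in sync across them.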

The data preparation stage is described in detail in the official documentation (Providing Data — Bokeh 2.2.3 Documentation).

Determining where the visualization will be rendered

At this step, you’ll determine how you want to generate and ultimately view your visualization. The plot is the key concept in the Bokeh library. Plots are containers for glyphs, guides, annotations, and other tools. There are two approaches to generating and saving plots: standalone .html files, or a local or remote server application.

An application server is the most versatile and convenient way to distribute an application. In this case, various widgets can be used to change input values. Callback methods update the data for the plot on the server. These changes are automatically synced back to the browser, and the plot updates. An interactive application lets users manipulate the data and see up-to-date plots (Fig. 4); one example is the Bokeh Crossfilter Example application, which illustrates the autompg dataset.

Fig. 4. Bokeh server application

A Jupyter notebook is therefore one way to create visualizations during exploratory data analysis. An alternative approach is to develop a small app that can be run locally, or that can be sent to colleagues to run locally. The Bokeh server is very useful and easy to use in this scenario.

The command bokeh serve --show myapp.py starts the server and automatically opens a new browser tab at the address of the running application. More details on server creation can be found in the official documentation (Running a Bokeh Server — Bokeh 2.2.3 Documentation).
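A small server application of the kind described above might look like the sketch below. It is an illustrative example rather than the Crossfilter application: the sine-wave data, the slider, and the function name update are assumptions chosen to keep it short. Saved as myapp.py, it could be started with bokeh serve --show myapp.py.

```python
import numpy as np
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Slider
from bokeh.plotting import figure

# Data lives in a ColumnDataSource so the server can sync changes to the browser.
x = np.linspace(0, 10, 200)
source = ColumnDataSource(data={"x": x, "y": np.sin(x)})

plot = figure(title="Sine wave with adjustable frequency")
plot.line("x", "y", source=source)

# A widget for changing the input value.
slider = Slider(start=0.5, end=5.0, value=1.0, step=0.5, title="Frequency")

def update(attr, old, new):
    """Python callback run on the server whenever the slider value changes."""
    source.data = {"x": x, "y": np.sin(new * x)}

slider.on_change("value", update)

# Register the layout with the current document; the server creates
# one document per browser session.
curdoc().add_root(column(slider, plot))
```

Because the callback is ordinary Python, it can use Pandas, NumPy, or any other library to recompute the data before it is pushed back to the browser.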

Setting up the figure(s)

At this step, you’ll specify the plot tools that control how users interact with the figure: pan/drag, click/tap, and scroll/pinch.

There are various interactive tools for changing plot parameters such as the zoom level and range extents. These tools can be grouped into four categories:

  • Gestures (Pan/Drag Tools, Click/Tap Tools, Scroll/Pinch Tools)
  • Actions (Reset Tool)
  • Inspectors (HoverTool, CrosshairTool)
  • Edit Tools

All these tools are combined into a toolbar, which can itself be configured through parameters of the figure() function such as toolbar_location.
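A minimal sketch of configuring the toolbar is shown below. The particular selection of tools and the sample points are illustrative assumptions.

```python
from bokeh.models import HoverTool
from bokeh.plotting import figure

# Tools can be requested by name as a comma-separated string;
# toolbar_location controls where the toolbar is drawn.
p = figure(
    tools="pan,wheel_zoom,tap,reset,crosshair",
    toolbar_location="above",
    title="Figure with a custom toolbar",
)

# Tools can also be added as objects, here a hover inspector showing
# the data-space coordinates under the cursor.
p.add_tools(HoverTool(tooltips=[("x", "$x"), ("y", "$y")]))

p.scatter([1, 2, 3], [4, 6, 5], size=12)

# Collect the class names of the configured tools for inspection.
tool_names = [type(tool).__name__ for tool in p.tools]
```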

Code that uses the Tap Tool is shown in Fig. 5. The callback reports the coordinates of the point that was tapped.

Fig. 5. Code using the Tap Tool

The results of the Tap Tool implementation are shown in Fig. 6. The coordinates are displayed in the browser console, which can be opened with the F12 key.
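One way to obtain this behaviour is a JavaScript callback attached to the tap event, sketched below. This is an assumed reconstruction rather than the exact code from the figure; the sample points are placeholders.

```python
from bokeh.events import Tap
from bokeh.models import CustomJS
from bokeh.plotting import figure

p = figure(tools="tap", title="Tap anywhere to log coordinates")
p.scatter([1, 2, 3, 4], [2, 5, 3, 6], size=15)

# The callback body is JavaScript executed in the browser; cb_obj is the
# tap event, whose x and y attributes hold data-space coordinates.
callback = CustomJS(code="console.log('tapped at', cb_obj.x, cb_obj.y);")

# Attach the callback to tap events on the plot.
p.js_on_event(Tap, callback)
```

Since the callback runs entirely in the browser, this approach also works in a standalone .html file with no Bokeh server running.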

Fig. 6. Tap Tools usage 

More details on plot tools can be found in the official documentation (Configuring Plot Tools — Bokeh 2.2.3 Documentation).

Connecting to and drawing your data

At this step, you’ll use Bokeh’s multitude of renderers to give shape to your data, and explore the visual properties of lines, fills, text, and glyphs. Bokeh provides a wide range of renderers such as circle(), square(), triangle(), asterisk(), line(), and vbar() as basic visual building blocks, or glyphs. An example of using these graphic primitives is shown in Fig. 7.

An example of a scatter plot with circles for the Iris dataset, along with the related code snippet, is shown in the Bokeh gallery (iris.py — Bokeh 2.2.3 Documentation).

There are many visual attributes for styling, such as line, fill, and text properties. The full list of properties is given in the official documentation (Styling Visual Attributes — Bokeh 2.2.3 Documentation).
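The sketch below combines a few glyphs with line, fill, and text properties. The data values are placeholders, and scatter() with a marker argument is used here in place of the individual marker methods such as circle() and triangle(), as one way to draw the same shapes.

```python
from bokeh.plotting import figure

x = [1, 2, 3, 4, 5]
p = figure(title="Basic glyphs and visual properties")

# Line properties: colour, width, and dash pattern.
p.line(x, [2, 3, 2, 4, 3], line_color="navy", line_width=2, line_dash="dashed")

# Marker glyphs with fill and outline properties.
p.scatter(x, [4, 5, 4, 6, 5], marker="circle", size=12,
          fill_color="orange", fill_alpha=0.6, line_color="black")
p.scatter(x, [6, 7, 6, 8, 7], marker="triangle", size=12,
          fill_color="firebrick", fill_alpha=0.8)

# Vertical bars drawn from zero up to the given tops.
p.vbar(x=x, top=[1, 1.5, 1, 2, 1.5], width=0.4,
       fill_color="green", fill_alpha=0.3)

# Text properties are set on objects such as the title.
p.title.text_font_size = "14pt"
```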

Organizing the layout

At this step, you’ll organize your visualizations into a layout in just a few lines of code. There are various layout options for organizing plots and widgets. Layouts allow you to manage multiple components to create interactive dashboards or data applications.

A grid of plots and widgets can be built with layout functions such as column(), row(), and gridplot(). There are different sizing modes, e.g. ‘stretch_width’, ‘stretch_height’, ‘stretch_both’, and ‘scale_width’. These modes allow plots and widgets to resize based on the browser window.

For instance, layered plots for the Bitcoin history dataset are shown in Fig. 7.
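The layout functions named above could be used as in the following minimal sketch. The helper make_plot and its data are hypothetical, introduced only to produce some plots to arrange.

```python
from bokeh.layouts import column, gridplot, row
from bokeh.plotting import figure

def make_plot(title):
    """Hypothetical helper that returns a small line plot."""
    p = figure(title=title)
    p.line([1, 2, 3], [1, 4, 9])
    return p

# Two plots side by side, stretching to the available width.
side_by_side = row(make_plot("Left"), make_plot("Right"),
                   sizing_mode="stretch_width")

# Two plots stacked vertically.
stacked = column(make_plot("Top"), make_plot("Bottom"),
                 sizing_mode="stretch_width")

# A 2x2 grid; None leaves a cell empty.
grid = gridplot(
    [[make_plot("A"), make_plot("B")],
     [make_plot("C"), None]],
    sizing_mode="stretch_width",
)
```

Any of these layout objects can be passed to show() or save() in place of a single figure.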

Fig. 7. Code for different layout organisation

The results of the grid layout are shown in Fig. 8.

Fig. 8. Grid layout of the plots

Previewing and saving your beautiful data creation

Finally, explore your visualization, examine your customizations, and play with any interactions that you added.

There are two main functions for generating output:

  • output_file() for saving plots in a standalone .html file
  • output_notebook() for rendering directly in Jupyter notebooks

In addition, there are functions for exporting images, e.g. export_svgs() and export_png(). Examples of Jupyter notebook and .html usage are given in the previous sections. The library thus provides tools both for saving graphs to an external file and for embedding them in interactive notebooks.
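Saving a plot to a standalone file might be sketched as follows. The temporary directory and the file name plot.html are illustrative choices.

```python
import os
import tempfile

from bokeh.io import output_file, save
from bokeh.plotting import figure

p = figure(title="Plot saved to a standalone HTML file")
p.line([1, 2, 3, 4], [3, 1, 4, 2], line_width=2)

# Direct output to an .html file; a temporary directory is used here
# only so the sketch does not write into the working directory.
html_path = os.path.join(tempfile.mkdtemp(), "plot.html")
output_file(html_path, title="Bokeh plot")

# save() writes the file without opening a browser; show() would
# write it and open it. In a notebook, output_notebook() followed by
# show(p) renders the plot inline instead.
saved_path = save(p)
```

The image export functions are not called here because, as far as the Bokeh documentation describes, they need additional dependencies such as a browser driver.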

Summary

Bokeh, unlike some of its common counterparts in the Python data visualization space such as Matplotlib and Seaborn, renders its graphics using JavaScript and HTML, which makes it a strong candidate for building interactive and dynamic visualizations across different mediums.

In this article, we have examined the data preparation stage. The Bokeh library can work with standard Python objects such as flat lists, dictionaries, NumPy arrays, Pandas DataFrames, and Series. This makes it very easy to prepare data for visualization.

We reviewed the basic Bokeh visualization methods and gave an example of code for the Bitcoin history dataset. An interactive visualization tool was built as a result. It can be used in a Jupyter notebook for rapid prototyping. Saving the results to an external .html file allows you to embed it in web applications. The Bokeh server and client applications set the library apart from standard Python rendering tools such as Matplotlib or Seaborn.

Hence, Bokeh is a good tool for interactive data visualization. It offers more tools than other plotting libraries but is simpler than frameworks such as Dash.

About the Author

Michael Garbade is CEO & Founder of Education Ecosystem. He is a forward-thinking, global, serial entrepreneur with expertise in software development, backend architecture, data science, artificial intelligence, fintech, blockchain, and venture capital. Garbade combines experience with tech, data, finance and business development with an impressive educational background and a talent for identifying new business models. As co-founder and CEO of Education Ecosystem, Garbade's mission is to build the world’s largest decentralized learning ecosystem for professional developers and college students.
