Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ


Choose your language

InfoQ Homepage Articles Introduction to Graph Visualization with Alexander Smirnov

Introduction to Graph Visualization with Alexander Smirnov

The old saying, “A picture is worth a thousand words”, is even more true when working with complex business data. To help the user understand what they are seeing, developers often turn to bar and pie charts. But that only works for discrete data; when at the links between data other tools come into play. To dive deeper into the topic we asked Alexander Smirnov, creator of GraphX, to explain what graph visualization is and how it can be used.

Alexander Smirnov: First of all let’s define what a graph is. A graph is a representation of a set of objects (nodes) where some pairs of objects are connected by links. So the main task of graph visualization is to display such data in a user-friendly and understandable manner. For example, if you have a tree-like mathematical graph you will want it to be displayed in a tree nodes layout. Or if you have the task of displaying massive amount of unstructured data with many links (e.g. Twitter or Facebook user connections) you may want to use some special layout, anything that will help you to represent such data with maximum readability.

(Click on the image to enlarge it)

So at this point in order to perfectly visualize any graph we have to solve three problems: create nodes layout, eliminate node overlaps and provide efficient edge routing algorithm.

First of all, we need to create nodes layout for graph. This layout defines the main pattern and the logic by which all the nodes of the graph will be displayed. GraphX provides many predefined layout algorithms that can be used out-of-the-box, for example, Tree or Circular algorithms will display nodes in a tree or circle manner.

(Click on the image to enlarge it)

Second task is to eliminate node overlaps. In this task we calculate more precise positions for every node in the layout making sure that none of them overlaps any other and all of the nodes are clearly visible.

The third and the last task is to display links between nodes in our layout. The simplest way to achieve it is to connect all linked nodes with the straight lines but this might not be an option for many complex layouts. So we must undertake the task of so called “edge routing” by calculating optimal paths for every link. GraphX provides some predefined edge routing algorithms that can bypass nodes on the field or perform an edge bundling technique.

(Click on the image to enlarge it)

InfoQ: In computer science we tend to name and extensively study well known sorting algorithms. Are there also well-known graph layout and edge routing algorithms?

Alexander Smirnov: I think the most known graph layout algorithm is Fruchterman-Reingold (FR) force-directed (or energy-based) layout algorithm. The main purpose of force-directed algorithms is to position graph nodes in two- or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible, by assigning forces among the set of nodes, based on their relative positions. Usually, huge force-directed graph layouts look like a splash of paint :)

In FR algorithm, the sum of the force vectors determines which direction a node should move. The step width, which is a constant determines how far a node moves in a single step. When the energy of the system is minimized, the nodes stop moving and the system reaches its equilibrium state. The drawback of this is that if we define a constant step width, there is no guarantee that the system will reach equilibrium at all. T.M.J. Fruchterman and E.M. Reingold introduced a "global temperature" that controls the step width of node movements and the algorithm's termination. The step width is proportional to the temperature, so if the temperature is hot, the nodes move faster (i.e. a larger distance in each single step). This temperature is the same for all nodes, and cools down at each iteration. Once the nodes stop moving, the system terminates.

InfoQ: Can you explain what edge bundling is?

Alexander: Edge bundling is an edge routing improving technique that bundles the edges that are going in the same direction. It helps to improve graph visualization by reducing visual clutter, inevitable in large graphs, allowing to find high-level edge patterns and get overall graph overview by just looking at it.

Edge bundling algorithm implements force-directed technique to attract intermediate edge route point to each other. This results in so called edge pipelines that can be easily noticed by looking at the graph. GraphX also implements edge curving technique that provides better edge looking by smoothing the edges connections between intermediate edge routing points.

Without Edge Bundling

(Click on the image to enlarge it)

With Edge Bundling

(Click on the image to enlarge it)

InfoQ: GraphX is based on an older library called Graph#. Were you involved in that project as well?

Alexander: No, I wasn’t. But in the process of my own work I became familiar with Graph# code and now I do what I can to support users answering their questions in Graph# Discussions section on CodePlex.

InfoQ: What problems did you see in Graph# that prompted you to rewrite it as GraphX?

Alexander: First of all, it was outdated. And in as often as not you know what it means: no product support, no bug fixes, no improvements, etc. But the main problem that I came across when I was trying to tackle a similar task was the issue with the performance and library extensibility.

At that time I had the task to visualize about 2000 nodes with many links and original Graph# took too much time to calculate and display the data. So I rewrote the code to make it faster, for example moved edge endpoints calculation from the template calls into the class code and simplified the highlighting logic excluding too much event calls and interactions.

Regarding extensibility issues... May be this is my personal opinion but Graph# looked limited. There were so many questions in the Discussion threads on CodePlex which seemed obvious but in practice required Graph# modifications and code knowledge. In the process of creating GraphX I was looking at such questions and I built flexible architecture where most questions of such kind can be solved out-of-the-box with no need to dig into the library internal code.

No matter what, Graph# is an outstanding piece of work, especially in graph algorithms implementation.

InfoQ: Wow, 2000 nodes seems like quite a lot. On the upper end, how many nodes do you think can be rendered before the screen becomes too cluttered for the user to comprehend?

Alexander: The answer on that question is highly depends on the problem that end-user want to be solved using graphs. For example, when you want to display overall connection scheme between Twitter users you will generally get very huge amount of nodes and connections to display. Combining the LinLog layout algorithm and edge bundling technique you can pretty much avoid screen cluttering and get the graph overview where different user groups can be visually separated from the rest of the connections. On the other hand, if you display tree-like layout of the same Twitter connections you will get very huge structured graph with an unreadable overview look because graph nodes will be very small.

You must decide what do you want to achieve and what layout or edge routing algorithm will be better used for your particular task. GraphX provides many predefined algorithms for the end-user along with the providing ability to easily implement custom algorithms and use them within GraphX engine.

So as you can see GraphX have some tools to help deal with screen cluttering and I can say that precise nodes amount that can be rendered depends on the selected approach for solving the task.

InfoQ: Are you reusing any of the Graph# code in GraphX, or just the concepts?

Alexander: Well I think I can answer yes and yes. I took most of the algorithms code form Graph#, left them untouched and just modified some external interfaces. I decided not to waste time on reinventing the wheel and I concentrated my efforts on the concepts of new code architecture. That is where I borrowed some concepts from Graph# which seemed reasonable to me. I tried to separate the processes of calculation and rendering as much as I could to make it easy to improve each piece of logic separately, but some parts of the logic and visualization still remained very close to each other.

InfoQ: What’s your opinion about reusing code from other projects verses referencing the other project as a compiled library? To you see any particular for choosing one or the other?

Alexander: There is no correct answer to this question. That depends on what you need from the library. In GraphX I used some external libraries in both of the ways. For example, magnificent YAXLib library provided all the serialization features I needed and it didn’t contain any excess components so I just included it into my project as it was. On the other hand I decided to include ZoomControl and Zoombox code only into my project to be able to alter their behavior in different situations, for example different zoom modes or mouse actions.

So on the whole I suggest you to reference libraries when:

  • They completely satisfy you in what concerns features and performance
  • They have no junk or excess code/components inside
  • You’re not too much bothered by library updates
  • You’re not bothered by another DLL file in your app folder

Otherwise integrate their code into your project and customize it the way you want. Performance usually is not a problem in both cases, except that many references increase app startup time and take additional memory.

InfoQ: Have you run into any situations where you wanted to incorporate code from another project but were blocked because that code used a different license?

Alexander: Thankfully, I haven’t run into such situations. I believe that if you share the code of your project you should grant all the rights to use or modify it freely. In exchange those who use it should include its license notes and copyright inside their project.

InfoQ: Do you have any plans to port GraphX to Windows 8 Store or Windows Phone 8?

Alexander: That wasn’t on my To-do list but it is most probable. I’m looking for the possibilities of expanding GraphX to different platforms however this is the question of time and knowledge. For example, Silverlight port investigations led me to the conclusion that it will require too much time and efforts to do this because I need to rework almost every part of rendering and templates code. Despite this circumstances, the work on Silverlight port and GraphX v2 is on the way, slowly but true.

Regarding Windows 8 and WP8 I can say that I’m interested in mastering these platforms as I have not worked with them yet. They are innovative and use technologies similar to WPF, so it must be interesting and rather simple compared to other available possibilities.

InfoQ: If you wanted similar visualizations in a web site, how would you go about doing it?

Alexander: Well it depends on what should I take as the base. If I take existing GraphX code then it is reasonable to make visualization using Silverlight as it uses simplified WPF technology for rendering. Most of the logic will be left intact but I will have to make new controls from scratch and be aware of async processes logic in the whole project architecture.

Honestly I’m not familiar with any other technologies that can be used to implement graph visualization for web. I assume that Flash or any similar technology can be used for that task but I doubt that GraphX code can be used in that case so this is a long way to go.

InfoQ: Are you currently looking for volunteers to work on GraphX?

Alexander: Sure, GraphX is opened for collaboration. Especially, I’m interested in collaboration by porting GraphX into different platforms and implementation of new graph layout and edge routing algorithms. If you have the skills and willing to participate just let me know to where GraphX is moving soon.

InfoQ: Is this just a new homepage for GraphX or are you going to stop using CodePlex entirely?

Alexander: This is a new web site that will host GraphX homepage along with my other projects under the “Panthernet Dev Works” group logo. Codeplex will still be used as the main code repository storage and release notifications site, but Documentation, Downloads and Discussions sections will be moved to the new site and forums.

For now I have some plans to develop my own group that will be involved in development of open-source and commercial projects. There will be more information on the forums, so anyone who is interested may apply.

InfoQ: So how many different graph layout and edge routing algorithms do you currently have?

Alexander: Currently GraphX provides access to 11 layout, 2 overlap removal and 3 edge routing predefined algorithms. And as I’ve mentioned earlier you can create and use your own algorithms within GraphX using provided interfaces.

Algorithms take the most important and complex part of the library. User contributions in this area are always welcomed.

About the Interviewee

Alexander Smirnov, .NET Lead Developer, is currently working in "I.T. co.", Russia, in the areas of data visualization and text analysis. He is also involved with several projects related to mobile platform apps development using C#.

Rate this Article