High Performance Math with ILNumerics for .NET
A common belief among programmers is that .NET is slow. Not slow like an interpreted language, but still slow compared to languages like C when it comes to complex mathematical operations. And generally speaking, that is true because of factors such as .NET’s predisposition towards heap-allocating all but the simplest values.
But it doesn’t have to be that way. Frameworks such as ILNumerics use advanced techniques such as custom memory managers and automatic parallelization to close the gap without losing the advantages of high-level languages such as Visual Basic and C#. We spoke with Jonas Nordholt about what makes ILNumerics special.
InfoQ: Why ILNumerics? Why not just use straight .NET code?
The main reason is: performance.
We at ILNumerics think that .NET offers the best tools for creating enterprise applications nowadays. However, when numerical algorithms come into play and large amounts of data need to be processed (as in finance, statistics, or engineering), the price for the convenience of .NET becomes relevant: the automatic memory management increasingly gets in the way.
That's why C/C++ is still used for those applications, with all the well-known disadvantages of native code regarding development and maintenance. Thus, whenever numerical methods are involved, development periods lengthen and expert-level programmers are needed.
We have created ILNumerics to close this technological gap. ILNumerics provides its own memory management, optimized for numerical algorithms, and an intuitive syntax, similar to popular mathematical languages. It automatically parallelizes large parts of your algorithm on multicore systems. In addition to that, ILNumerics uses the extended optimization features of .NET (pointer arithmetic, cache optimization etc.).
In this way, ILNumerics allows enterprise developers to achieve high execution speeds for numerical algorithms without any technological gap – directly within .NET.
InfoQ: Is ILNumerics written in C# or a native language?
All of ILNumerics is written in C#: the array classes, all elementary functions, sorting, and the visualizations are purely managed. However, for state-of-the-art linear algebra and FFT we use Intel’s MKL, which ultimately relies on Netlib LAPACK: mature Fortran code, optimized for x86 processors. Other packages in ILNumerics also use native libraries (HDF5, to name one). But since we provide convenient interfaces, our users don’t have to deal with them.
InfoQ: How do you mitigate the marshalling costs for invoking native code?
A single call into a native library usually compiles down to fewer than 20 processor instructions. Since we only use data types that exist in the same form on both sides, managed and native (“blittable types”), there is no marshalling overhead.
From the user's perspective: the managed C# syntax is very convenient for everyday use. If the need to interface with native libraries arises, the user can do so easily and at high speed.
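To illustrate why blittable types avoid marshalling costs, here is a minimal sketch. It does not call MKL or any other native library; instead it pins a `double[]` and writes to it through its raw address, which is roughly what a native routine receiving the array would do. The class and method names (`BlittableSketch`, `WriteThroughPointer`) are hypothetical and not part of ILNumerics.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; not an ILNumerics API.
public static class BlittableSketch
{
    // Pins a double[] and stores a value through the raw address, the way
    // native code would. Returns the value as seen from the managed side.
    public static double WriteThroughPointer(double[] data, int index, double value)
    {
        // double[] is blittable, so it can be pinned in place: the GC will
        // not move the array while the handle is alive, and nothing is copied.
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // Simulate the native side: write the 8 raw bytes of the double
            // at the byte offset of the requested element.
            Marshal.WriteInt64(address, index * sizeof(double),
                               BitConverter.DoubleToInt64Bits(value));
        }
        finally
        {
            handle.Free();
        }
        // No copy-back step: the managed array already holds the new value.
        return data[index];
    }
}
```

Calling `BlittableSketch.WriteThroughPointer(new[] { 1.0, 2.0, 3.0 }, 1, 42.0)` changes the second element of the managed array directly. The same memory is visible from both sides, which is why no conversion step is needed at the call boundary.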
InfoQ: You say that you have your own memory manager. Why did you build it and how does it differ from what is available out of the box?
The .NET CLR garbage collector does a great job for common small and mid-sized business data objects. However, for large and very large data stored in contiguous arrays, all existing GC techniques bring many disadvantages. Using such large objects frequently can cause a huge overhead for the GC (among other drawbacks like heap fragmentation and an increased number of cache misses). That's why such memory blocks should be reused by means of memory pools.
We didn't want our users to have to deal with the issue, so we invented a new approach. The ILNumerics memory management consistently reuses memory blocks for all mathematical objects. Based on the expressive ILNumerics syntax, these objects carry information about their lifetime. That allows the memory management to collect used memory in a pool and to avoid unnecessary reallocations.
As a side effect, this opens the way for further optimizations regarding automatic parallelization – something we are working on for a future release.
InfoQ: What kinds of objects are put into the pools? Do developers need to be aware of these pools like they need to be aware of database connection pools? Or is all of the bookkeeping handled for them?
The whole memory management in ILNumerics is transparent to the user. That is something we are very proud of: we were able to preserve the "managed" convenience and, at the same time, get rid of all the disadvantages you usually experience when relying on the GC for such large objects.
In the end, the memory pool stores only plain .NET arrays. They serve as the underlying storage for our sophisticated multidimensional ILArray<T> objects and are heavily reused, automatically.
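The general idea of pooling plain arrays can be sketched in a few lines. This is only an illustration of the technique, not the ILNumerics implementation: the class `ArrayPoolSketch` and its members are hypothetical, it keys free arrays by exact length, and it is not thread-safe. In ILNumerics, according to the interview, renting and returning happen automatically behind `ILArray<T>`; here they are explicit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool for illustration only; not the ILNumerics memory manager.
public sealed class ArrayPoolSketch
{
    // Free arrays, grouped by their exact length.
    private readonly Dictionary<int, Stack<double[]>> _free =
        new Dictionary<int, Stack<double[]>>();

    // Number of arrays that actually had to be allocated.
    public int Allocations { get; private set; }

    // Hands out a pooled array of the requested length if one is available,
    // otherwise allocates a new one. Reused arrays contain stale data.
    public double[] Rent(int length)
    {
        Stack<double[]> stack;
        if (_free.TryGetValue(length, out stack) && stack.Count > 0)
            return stack.Pop();
        Allocations++;
        return new double[length];
    }

    // Puts an array back into the pool so a later Rent can reuse it.
    public void Return(double[] array)
    {
        Stack<double[]> stack;
        if (!_free.TryGetValue(array.Length, out stack))
        {
            stack = new Stack<double[]>();
            _free[array.Length] = stack;
        }
        stack.Push(array);
    }
}
```

A loop that rents a one-million-element array, fills it, and returns it on every iteration allocates only once with such a pool, no matter how many iterations run. Without pooling, each iteration would create a new array on the large object heap (used by the CLR for objects of 85,000 bytes or more) and leave the old one for the garbage collector.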
InfoQ: Do you have any automatic parallelization in place now?
Yes. Currently all functions provided by ILNumerics use multiple threads for execution on multicore machines. Right now, we are working on a new approach for automatically using even more parallel resources: GPUs, vector extensions, and many-core processors.
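How an element-wise array function can be spread over several cores is sketched below using the standard Task Parallel Library. This is an assumption about the general technique, not a description of how ILNumerics schedules its work internally; the class name `ParallelSketch` is hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical example for illustration; not ILNumerics code.
public static class ParallelSketch
{
    // Computes result[i] = a[i] + b[i]. The index range is split into
    // contiguous chunks, and each chunk is processed by one worker thread.
    public static double[] Add(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var result = new double[a.Length];
        if (a.Length == 0)
            return result; // Partitioner.Create rejects an empty range

        Parallel.ForEach(Partitioner.Create(0, a.Length), range =>
        {
            // range.Item1 is inclusive, range.Item2 is exclusive.
            for (int i = range.Item1; i < range.Item2; i++)
                result[i] = a[i] + b[i];
        });
        return result;
    }
}
```

Because every chunk writes to a disjoint part of `result`, no locking is needed. Processing contiguous ranges rather than single elements keeps the per-item scheduling overhead low and is friendlier to the cache.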
InfoQ: Recently it was announced that Microsoft's CLR would start supporting SIMD. Do you see that as being beneficial for ILNumerics?
Of course, we couldn't resist giving the BCL.SIMD namespace an immediate try. The use of SIMD vector extensions is the last missing piece that would let us completely drop processor-optimized native dependencies in ILNumerics. Right now, we still use such native libraries internally (Intel's MKL). We consider BCL.SIMD a very promising approach and have been advocating this kind of support since Mono.SIMD.
However, there is still a lot of potential for future improvements on the CLR side. The current implementation has not been able to compete with our (manually tuned) inner function kernels.
We will certainly follow this development, but we will also continue working on our own SIMD extensions. By focusing on the narrower application domain of ILNumerics, we can profit from more efficient optimization techniques for numerical algorithms than those applicable to the general-purpose CLR JIT compiler. The latter has to deal not only with operations on multidimensional arrays (which are fairly easy to parallelize), but also with arbitrary data structures and operations.
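For readers who have not seen SIMD code in .NET, the following sketch shows what a vectorized inner loop can look like. It assumes the `System.Numerics.Vector<T>` type as available in current .NET; it is not one of the manually tuned ILNumerics kernels mentioned above, and the class name `SimdSketch` is hypothetical.

```csharp
using System;
using System.Numerics;

// Hypothetical example for illustration; not an ILNumerics kernel.
public static class SimdSketch
{
    // Computes result[i] = a[i] + b[i], processing Vector<double>.Count
    // elements per step and finishing the remainder with a scalar loop.
    public static double[] Add(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var result = new double[a.Length];
        int width = Vector<double>.Count; // elements per vector on this machine
        int i = 0;

        // Vectorized part: load, add, and store one full vector at a time.
        for (; i <= a.Length - width; i += width)
        {
            var va = new Vector<double>(a, i);
            var vb = new Vector<double>(b, i);
            (va + vb).CopyTo(result, i);
        }

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < a.Length; i++)
            result[i] = a[i] + b[i];

        return result;
    }
}
```

Whether the vector operations actually map to hardware instructions depends on the runtime and the processor; `Vector.IsHardwareAccelerated` reports this at run time.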
ILNumerics is available as a commercial product and as open source under the GPL.