QuantCell Research Announces First Public Beta of their Java-Aware Big-Data Spreadsheet
Big Data analytics startup QuantCell Research has announced the release of the first public beta of what they are positioning as their "Big Data" spreadsheet.
At first blush one might presume that QuantCell is a Java Swing version of yet another spreadsheet program. In fact, it is the latest branch in the evolutionary tree of the computer spreadsheet, which started with VisiCalc in the late 1970s and is now dominated by Microsoft Excel, one of the most popular computer programs of all time.
Where prior incarnations of the spreadsheet were restricted to the rows, columns and functions built in by their programmers, QuantCell is extensible thanks to its knowledge of Java and other JVM languages. Most recently QuantCell has found a niche in big data, providing templates for quickly entering Map and Reduce formulae into its grid.
At its most basic level, QuantCell cells accept not only the traditional functions associated with a spreadsheet but also instantiations of Java (or Scala, Jython or R) objects.
As a simple example, in pseudo-Java-code, you can say:
a1 <- new String("MM-dd-yyyy")   // Store format mask
a2 <- new SimpleDateFormat(a1)   // Store formatter
a3 <- new Date()                 // Store the date
a4 <- a2.format(a3)              // Store the formatted string representation of the date
Cell a4 thus references a3 and a2 directly, and a1 indirectly through a2.
Figure 1 is an actual screenshot of the example above. (Note the syntax for Java construction without the "new" operator. Also note in the formula bar, the formula for cell a4; the (*) operator refers to the variable representing "this" cell.)
Figure 1. QuantCell screenshot of date example above.
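For readers who want to trace the cell dependencies outside the spreadsheet, the a1 to a4 date example can be written as ordinary Java. This is only a sketch: the class name DateCellExample and the method evaluateA4 are made up for illustration and are not part of QuantCell, and the local variables merely stand in for cells.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateCellExample {
    // Hypothetical helper: computes what cell a4 would hold, given the
    // contents of a1 (format mask) and a3 (date). Not a QuantCell API.
    static String evaluateA4(String a1, Date a3) {
        SimpleDateFormat a2 = new SimpleDateFormat(a1); // a2 depends on a1
        return a2.format(a3);                           // a4 depends on a2 and a3
    }

    public static void main(String[] args) {
        String a1 = "MM-dd-yyyy";        // format mask
        Date a3 = new Date();            // current date
        String a4 = evaluateA4(a1, a3);  // formatted date string
        System.out.println(a4);
    }
}
```

Changing the mask in a1 changes the result in a4 without touching a3, which is the same indirect dependency the spreadsheet tracks between cells.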
InfoQ interviewed QuantCell co-founders Kris Thorleifsson and Agust Egilsson.
InfoQ: You are positioning QuantCell as "the Big Data spreadsheet". Can you explain the role of QuantCell in Big Data analytics?
QuantCell: QuantCell supports Big Data frameworks, Apache Hadoop installations and real-time Big Data analytics. It allows the user to build MapReduce algorithms, use Hive and other JDBC-compliant systems and databases for analysis, and create real-time queries using the relevant SQL or NoSQL syntax, all right from the spreadsheet interface. QuantCell then enables the user to submit the analysis to a Hadoop server or other Big Data systems and watch it hack away. QuantCell provides deployment paths which automate many of the Big Data deployment tasks. It automates the delivery of the algorithms and frameworks required to create Big Data tasks or analysis, and it employs code assistance methods and wizards to help create the analytics.
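To give a feel for what a Map formula and a Reduce formula compute, here is a minimal word-count sketch in plain Java streams. It does not use Hadoop's API or QuantCell's templates, neither of which is shown in this article; the class WordCountSketch and its method wordCount are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Map step: split each line into lowercase words.
    // Reduce step: group identical words and count the occurrences of each.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        word -> word,           // key emitted by the map step
                        TreeMap::new,           // sorted map for stable output
                        Collectors.counting())); // reduce: count per key
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("big data spreadsheet", "big data analytics");
        System.out.println(wordCount(lines));
    }
}
```

On a real cluster the two steps would run on separate nodes over partitioned data; this single-process version only shows the shape of the computation.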
InfoQ: Who is your target user-base?
QuantCell: QuantCell is for users that are domain experts in their field, data scientists, quants, analysts and researchers, as well as consumers of the analysis such as decision makers and managers. QuantCell is also very useful for developers since it enables the developer to rapidly prototype solutions by eliminating user interface design, being expression-based, and producing solutions that are immediately deployable into production.
InfoQ: What does QuantCell do to assist the end-users, who may not be fully conversant programmers?
QuantCell: We strive to improve turnaround times and enable end-user programming by addressing its challenges. Firstly, we extend the QuantCell spreadsheet to powerful programming languages such as R, SQL, Scala and Jython, and to Java code snippets. In addition, there are at least five other barriers that we lower considerably for domain experts, data scientists and non-developers. Namely, QuantCell:
- simplifies access to data,
- simplifies access to compute cycles,
- simplifies and sometimes even eliminates the need to code,
- automates deployment of solutions into production and
- automates access and incorporation of algorithms and methods from external resources.
Of course, lowering these barriers is not easy and we will continue to improve our approach as QuantCell evolves.
InfoQ: You mentioned the data scientist. Can you talk about the support you provide to that user?
QuantCell: In addition to simplifying Big Data analytics and reducing turnaround times for Big Data projects, we listen and strive to make sure that the most commonly used Big Data environments are easily accessible and supported by QuantCell. Deployment is ingrained into QuantCell's DNA and we provide deployment paths that are specific to various Big Data environments. This is a major part of our operation since deployment is complicated and specific to each environment. At the same time it is incredibly valuable for the data scientist to be able to easily deploy big data solutions into different production environments, since it, for example, eliminates the need for rewrites when moving from prototyping to production.
InfoQ: What analytics libraries and tools are available for QuantCell today?
QuantCell: All the Java libraries that have been assigned metadata following the Apache Maven/Ivy standard can be incorporated directly into the user's analysis in QuantCell - usually with just one or two clicks. This is a big deal as it means QuantCell gives users intuitive, on-demand access to thousands of libraries and tools, literally a terabyte of algorithms, right in the spreadsheet UI. For example, the algorithms available in the Maven main search directory, your in-house libraries, open-source development such as Cloudera's Hadoop Distribution, the OpenGamma Platform for financial and risk analysis, Weka for machine learning, BioJava for life science, Bloomberg's Open Market Data, Amazon's Web Services and so on. Among these are various tools for data visualization that deliver their own data processing features and therefore not only enhance the capabilities of QuantCell but also provide visual representation of data and results obtained.
InfoQ: How does one work with large datasets in QuantCell?
QuantCell: QuantCell users can connect to most data sources, both public data providers and private databases using JDBC for SQL or NoSQL access and connect to Hadoop nodes like we talked about earlier. Large data sets reside centrally and are typically too large to move around, so representing these as objects in individual cells in the spreadsheet is ideal and is the approach taken by QuantCell. The user then simply references the cell containing the data object to include it in some analysis or model.
InfoQ: How do you fare in high performance environments?
QuantCell: QuantCell models derive their performance from the Java platform. They are therefore as fast as any other Java code when running locally in the JVM. In particular, the models benefit enormously from the state-of-the-art just-in-time (JIT) compilation and other JVM optimization methods available in the platform. QuantCell therefore brings the Java computational platform, just-in-time compilation, garbage collection, concurrency and more to the spreadsheet user. For compute-intensive tasks, QuantCell models utilize private and public clouds, Hadoop installations, Amazon Web Services and other HPC systems.
InfoQ: Can you give us a preview of what's on the horizon?
QuantCell: Improved support for R, Jython and Scala is in the pipeline. We are also working on improving code recommendations and formula completion methods based on machine learning algorithms. There are also numerous deployment paths that we are working on supporting, just to name a few features.
A little history about the founders: Agust Egilsson, who is the architect, lead developer and technology evangelist, holds a Ph.D. in mathematics from UC Berkeley and has a background as an investment banker/quant and academic. Kris Thorleifsson came from Sun Microsystems and works in product management and marketing.
Egilsson built early prototypes of the system in 2006 and 2007 and used these in his work to build Java-based trading strategies and risk analysis.
After a complete rewrite of the QuantCell client in JavaFX in 2010 and 2011, they released an early version to test users for initial feedback and have continued to enhance the product up to this just-released first public beta.
For more information see last year's feature article in Java Magazine and the JavaOne 2012 presentation by QuantCell. You can download the Windows version from the download site. A Mac version is due to be released later this week.