
JasperSoft 4 Released with Big Data Support

by Ron Bodkin on Jan 28, 2011 |

JasperSoft this week announced support for reporting against big data systems, including a variety of modes for reporting on Hadoop, several popular NoSQL databases, and three MPP analytic relational databases. They noted that they are supporting:

  • Hadoop - Jaspersoft supports Hadoop via the Hive SQL interface, reading files via HDFS including the Avro file format, as well as via HBase.
  • NoSQL - Jaspersoft offers NoSQL support for the following, broadly recognized major categories of data stores: Key Value Stores, Document Databases, BigTable Clones, Graph Databases, and Data Grid Caching among others. Jaspersoft's open source projects for reporting against NoSQL technologies include: Cassandra, MongoDB, Riak, HBase, CouchDB, Neo4J, Infinispan, VoltDB and Redis. A non-open source connector is in beta for VMware GemFire.
  • MPP Analytic Database - Jaspersoft supports IBM's Netezza MPP analytic data warehouse and will soon add commercial analysis support. Vertica and EMC Greenplum are also supported.

InfoQ interviewed Andrew Lampitt, Senior Director of Technology Alliances at JasperSoft about the release.

Q: How is this announcement different from what other BI vendors are providing?

A: JasperSoft has always allowed reporting against obscure data formats.

In general, the industry has taken a ho-hum approach to reporting on Hadoop, using Hive to execute SQL queries against it. JasperSoft has added support for reporting against files in HDFS or directly against HBase, as well as against various NoSQL flavors.
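The Hive route described above can be sketched using standard Hive JDBC conventions. The host, database, table, and query below are hypothetical, and an actual connection requires a running HiveServer2 instance with the Hive JDBC driver on the classpath; this sketch only builds the URL and query a report engine would submit.

```java
// Sketch: how a BI tool can query Hadoop through Hive's SQL interface.
// The jdbc:hive2:// URL format is the standard HiveServer2 JDBC convention;
// the host name and the `sales` table are made up for illustration.
public class HiveReportSketch {
    // Build a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/db
    public static String hiveUrl(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) {
        String url = hiveUrl("hadoop-master", 10000, "default");
        String sql = "SELECT region, SUM(amount) FROM sales GROUP BY region";
        // With the Hive JDBC driver on the classpath, a report engine would run:
        //   Connection c = DriverManager.getConnection(url);
        //   ResultSet rs = c.createStatement().executeQuery(sql);
        System.out.println(url + " -> " + sql);
    }
}
```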

 

Q: Have you benchmarked the performance?

A: These are first- or second-generation connectors, not meant to be production quality. JasperSoft has collaborated with the vendors, or with the project owners where there is no vendor, to produce a first cut of reporting, and has talked to some prospects and existing customers to get a second or third level of feedback.

 

Q: What's the level of adoption or evaluation of these connectors?

A: We either have existing customers using it, or it's new stuff for which we're seeking feedback. Partly this announcement was to generate awareness. We're working with customers and vendors to learn what the most demanding corporate reporting requirements are.

 

Q: What new capabilities does the release include?

A: The JasperSoft connectors provide:

  1. Data Connectivity - allowing you to connect to a system such as MongoDB or Riak as a custom data source
  2. Custom Query Executor - letting you use the query language/syntax expected by the various flavors of systems, including non-SQL, hierarchical, etc.

JasperSoft allows bringing the file into memory and manipulating it there. However, analysis against nodes in a graph database [ed: such as Neo4J] is quite different from analysis against a key-value store.
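The two connector capabilities listed above can be sketched as a small interface: a query executor that accepts a store-specific query syntax and returns rows a report engine can consume. The `QueryExecutor` interface and the toy key-value executor below are illustrative assumptions, not Jaspersoft's actual connector API.

```java
import java.util.*;

// Sketch of a pluggable "custom query executor": each data store gets an
// implementation that understands its own query syntax and returns rows.
// The "get <key>" syntax is a made-up stand-in loosely in the spirit of a
// key-value store such as Riak or Redis.
public class QueryExecutorSketch {
    interface QueryExecutor {
        List<Map<String, Object>> execute(String query);
    }

    // Toy executor over an in-memory map.
    static class KeyValueExecutor implements QueryExecutor {
        private final Map<String, Object> store = new HashMap<>();

        KeyValueExecutor put(String k, Object v) { store.put(k, v); return this; }

        public List<Map<String, Object>> execute(String query) {
            if (!query.startsWith("get ")) throw new IllegalArgumentException(query);
            String key = query.substring(4);
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("key", key);
            row.put("value", store.get(key));
            return Collections.singletonList(row);
        }
    }

    public static void main(String[] args) {
        QueryExecutor ex = new KeyValueExecutor().put("user:1", "alice");
        System.out.println(ex.execute("get user:1"));
    }
}
```

A real connector would map the returned rows onto report fields; the point here is only that each store plugs in behind one uniform interface.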

 

Q: Do you support reporting against summaries or star schemas in non-traditional formats? 

A: I'm not sure it does. Reporting against an operational system is very different from reporting against a warehouse.

For MongoDB or Riak, you can manipulate data at the GUI level (e.g., summarization), but it's not a traditional analysis situation.

We look at NoSQL as new options for OLTP.

If I'm a developer using Hadoop and want to look at a bit of data, it will let me run some reports against the file system.

 
Q: When you query a file in Hadoop / HDFS, do you bring the whole file into memory?

 

A: Yes... the limitation is memory. It's not necessarily loading all of this data into the client browser, but it always loads the whole file on the server (JasperReports Server).

 

Q: Is there any way to apply a filter or to minimize the size of the data set brought back in files?

A: Anything is possible... but it's probably not something we'll solve soon. It's analogous to the issues with a local CSV file: in general you always bring it all into memory. It's not obvious what a good technique for filtering a file would be.
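One generic mitigation for the memory issue discussed above is a streaming line filter, so only matching rows are retained rather than the whole file. This is a sketch of the idea, not a Jaspersoft feature; with `Files.lines(path)` as the source, lines would stream lazily from disk.

```java
import java.util.*;
import java.util.stream.*;

// Sketch: filter CSV-style lines while streaming, keeping only matching
// rows in memory instead of the entire file.
public class StreamingFilterSketch {
    // Keep only lines whose first comma-separated column equals `wanted`.
    public static List<String> filterFirstColumn(Stream<String> lines, String wanted) {
        return lines.filter(l -> l.split(",", 2)[0].equals(wanted))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // In real use the source could be Files.lines(Paths.get("data.csv")).
        System.out.println(filterFirstColumn(Stream.of("a,1", "b,2", "a,3"), "a")); // [a,1, a,3]
    }
}
```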

Q: How does JasperSoft report against HBase?

A: HBase just stores a bunch of bytes for a given field. There's no built-in way to know what type of objects these bytes represent. In our POC version of the connector we converted a relational table into HBase. We used the table's primary key field as the ROW_ID. We used the other column names as the FAMILY in HBase. We converted the field value into bytes and used that for the VALUE. And we used the QUALIFIER to put data type information. This allows our connector to know what data type each field is. Others could load data into HBase as we do (the details are available in our HBase loader source code).
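The encoding described above can be sketched as follows. The type tags and byte layouts are illustrative guesses, not the connector's exact format: the cell VALUE holds the field's raw bytes and the QUALIFIER records the data type so a reader knows how to decode them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of storing typed values as raw bytes plus a type-bearing qualifier,
// in the spirit of the connector's HBase scheme described above.
public class TypedCellSketch {
    // Encode a field value into the bytes that would go into the cell VALUE.
    static byte[] encode(Object v) {
        if (v instanceof Integer) return ByteBuffer.allocate(4).putInt((Integer) v).array();
        return v.toString().getBytes(StandardCharsets.UTF_8);
    }

    // The QUALIFIER carries the type name so the reader can decode the bytes.
    static String qualifier(Object v) {
        return (v instanceof Integer) ? "java.lang.Integer" : "java.lang.String";
    }

    static Object decode(byte[] bytes, String qualifier) {
        if (qualifier.equals("java.lang.Integer")) return ByteBuffer.wrap(bytes).getInt();
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] cell = encode(42);
        System.out.println(decode(cell, qualifier(42))); // 42
    }
}
```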
 
Our next step is to implement a pluggable deserialization engine in the connector. This way the connector would know that the bytes it got out of a given field were really serialized using Java serialization, Google's Protocol Buffers, or some other serialization method. It would allow us to 'discover' the data type of each field. That would make it much simpler for someone using JasperSoft iReport (the desktop report designer) to build reports.
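A pluggable deserialization engine of the kind described could look roughly like the registry below. The scheme names and the `Function`-based API are assumptions for illustration; the point is that the connector looks up a decoder by serialization scheme rather than hard-coding one.

```java
import java.util.*;
import java.util.function.Function;

// Sketch: deserializers registered per serialization scheme, so the
// connector can pick the right one for the bytes it reads from a field.
public class DeserializerRegistrySketch {
    private final Map<String, Function<byte[], Object>> plugins = new HashMap<>();

    public void register(String scheme, Function<byte[], Object> d) {
        plugins.put(scheme, d);
    }

    public Object deserialize(String scheme, byte[] bytes) {
        Function<byte[], Object> d = plugins.get(scheme);
        if (d == null) throw new IllegalArgumentException("no deserializer for " + scheme);
        return d.apply(bytes);
    }

    public static void main(String[] args) {
        DeserializerRegistrySketch reg = new DeserializerRegistrySketch();
        reg.register("utf8", b -> new String(b, java.nio.charset.StandardCharsets.UTF_8));
        // A real setup might also register "java-serialization" or "protobuf" handlers.
        System.out.println(reg.deserialize("utf8",
                "hello".getBytes(java.nio.charset.StandardCharsets.UTF_8)));
    }
}
```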
 
It's also useful to note that we can connect either directly to HBase or via Thrift. Thrift is an optional piece that is quite commonly used with HBase.
 
 
See also the project's download page for connectors.

 

 
