Bio Robert Greene is Versant Corporation’s VP of Technology and has over 20 years of experience working on high-performance, mission-critical software systems. He provides the technical direction for Versant's database technology, used by 100’s of Fortune 1000 companies such as China Telecom, Dow Jones, Ericsson, G.E. and Verizon.
Software is changing the world; QCon aims to empower software development by facilitating the spread of knowledge and innovation in the enterprise software development community; to achieve this, QCon is organized as a practitioner-driven conference designed for people influencing innovation in their teams: team leads, architects, project managers, engineering directors.
Sure, so I’m Robert Greene, I’m the person who is in charge of the technology, technology direction at Versant Corporation and these days we are working on creating standardized interfaces to our NoSQL database solution.
Sure, sure, that’s a good question. It shares the core, I would say, implementation characteristics of a NoSQL solution, which is in essence a key value store. Whether you’re dealing with things like Cassandra or Riak, or document stores like Couch or Mongo, at the end of the day they are all pretty much key value stores. And Versant is similarly architected: it has a key which references a value, which in our case is an object, which can be on any of a number of physical nodes spread around a network of database servers. And I would say the core difference is that we abstract, underneath the standardized API layers, all of the issues of how the key references complex, richly linked information models. So even though it is at its heart a basic key value store, if you take a key and grab a value which is a rich object and bring it into an application space, and you send it a method that causes another object in another place to be referenced, we automatically resolve the reference from the internal key that we manage on your behalf. So there is a transparency there that normally isn’t seen in most of the NoSQL solutions.
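To illustrate the kind of transparent reference resolution described in this answer, here is a minimal Python sketch. It is not Versant's API: the names `Ref`, `Cluster`, and `Proxy` are hypothetical, the "nodes" are plain in-memory dictionaries, and a counter stands in for network reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical internal key: which node holds the object, and its id there."""
    node: str
    oid: int


class Cluster:
    """Toy stand-in for a set of database nodes, each a plain dict."""

    def __init__(self):
        self.nodes = {}    # node name -> {oid: stored fields}
        self.fetches = 0   # counts simulated network reads

    def put(self, node, oid, **fields):
        self.nodes.setdefault(node, {})[oid] = fields
        return Ref(node, oid)

    def get(self, ref):
        self.fetches += 1
        return Proxy(self, self.nodes[ref.node][ref.oid])


class Proxy:
    """Wraps a stored value and resolves Ref fields on attribute access."""

    def __init__(self, cluster, fields):
        self._cluster = cluster
        self._fields = fields

    def __getattr__(self, name):
        # Only called for names not found normally, i.e. the stored fields.
        if name.startswith("_") or name not in self._fields:
            raise AttributeError(name)
        value = self._fields[name]
        if isinstance(value, Ref):
            # The application never sees the key; the object is fetched for it.
            return self._cluster.get(value)
        return value


cluster = Cluster()
city = cluster.put("node-b", 1, name="Berlin")
person = cluster.put("node-a", 1, name="Ada", home=city)
ada = cluster.get(person)
print(ada.home.name)  # Berlin
```

The application code only follows `ada.home`; the lookup on the second node happens behind the attribute access.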
That’s correct. And in the Java case it’s JPA, so it’s very much like a Hibernate JPA implementation except there are absolutely no mapping files whatsoever; you just worry about the soft schema, developing the application models, and for the most part the rest of it is transparent.
Yes, it’s JPA 2.0 compliant. One of the things that we don’t have in there yet is application identity, it’s something that we are targeting because we think it would be useful from a RESTful interface perspective that any string type of URL could reference an object inside the database server. That same principle can be applied for application identity, however, normally we’re used to much much higher performance type of implementations, where something like a RESTful interface tends to be pretty slow, high latency over Internet connections and things like that, not a place that we normally would play.
You can build that basically on top of the database but it’s not native to the database.
Srini’s full question: You mentioned application development in the previous discussion. There are a lot of application development frameworks for NoSQL data management, whether it’s Spring Data or other frameworks from other vendors, and they especially talk about polyglot persistence, where you can save the data to multiple databases. So what do you think about where we are with these application development frameworks, whether in the Java space or .NET or Ruby, and what do you think is coming up in this particular space?
So, first of all I think the more the NoSQL technologies can embrace and integrate with those frameworks the better; I think we are early in the process. One of the things that’s nice about Versant creating a JPA API over this NoSQL database technology is that so many of those frameworks are already designed to work with it. So, Spring Data for example, even though they are trying to come up with a sort of generic interface for NoSQL databases, it’s difficult to do because every vendor has a different interface, so there is another layer of abstraction which ends up being another layer of performance degradation. Being able to just plug in natively to their JPA bindings in Spring Data is a nice way to integrate. But just in general, I think that there is a lot of work still to be done in standardization, to the extent that it can happen in the NoSQL space. We feel that a lot of good work was done in the whole object relational mapping decade, when people were working out how they could more effectively build application models over relational databases, and that work manifested itself in POJO persistence in standards like JPA. So we’ve just taken the approach of embracing that enterprise-class standard.
Srini’s full question: Right, we should also mention that storing in different databases in the NoSQL space comes with the caveat that each of those databases is meant for a particular type of data, so even if saving from the application is transparent, you still want to know what type of data you are storing, whether in Mongo vs Neo4J vs Versant. You are also going to be talking about this new product called Electrotank Universe Platform, EUP; can you discuss a little bit more about that and how it helps with NoSQL data management?
So the platform itself doesn’t actually help with the NoSQL data management; it’s a simulation platform that Electrotank creates, which helps people who are building game platforms implement those gaming systems, map out what a quest should look like, and eventually build out the infrastructure for that gaming system in a more rapid application development methodology. So folks like Ubisoft and other big game companies would use Electrotank’s EUP platform to build those solutions out. Versant, as a NoSQL database, is just the underpinning database for that platform; they were previously using Hibernate with MySQL. It’s a very performance-oriented, low-latency system; gaming systems tend to have a large number of concurrent users but need very low latency. They were having performance issues and horizontal scaling issues, so they looked at NoSQL technologies and eventually found Versant’s technology to perform better.
It is definitely production ready, they’ve deployed it to a number of studios and it’s been in production for quite some time now, with the old technology stacks and so now with some of the bigger customers with more demanding requirements they’ve made the move to NoSQL technologies and continue to deploy.
9. Let’s go back to the JPA solutions you were talking about earlier. Were there any challenges in implementing this framework, especially since NoSQL and Big Data applications are completely different from relational databases? What were some challenges that you guys worked on?
Sure. I think you can clearly see there would be potential challenges for many of the NoSQL classes of solutions in implementing that API, because there are certain requirements. For example, in the area of queries, with JPQL you’d be able to specify a statement and expect the database server to execute that statement and bring you indexed query results in a high-performance way. It wasn’t an issue for Versant’s technology because we have that server-side query capability, much like you find in some of the document stores, but for some of the other NoSQL technologies that would be problematic.
I think the other parts of it were fairly straightforward but, if anything, the challenge was the transparency of the links, being able to take a kind of hybrid approach to the shared nothing versus shared everything type of architecture. Most of the NoSQL solutions are a shared nothing type of architecture, and that includes Versant, but there is a unique aspect to Versant: even though you may query in parallel across lots of physical nodes and get an aggregated result set, when you talk to the objects inside that result set they can be referencing any physical node in the system. So in some respects it’s a bit of a hybrid rather than a pure shared nothing implementation. And managing that was probably one of the more challenging aspects of the implementation.
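The parallel query with an aggregated result set mentioned here can be sketched as a simple scatter-gather in Python. This is an illustration under stated assumptions, not Versant's implementation: each node is modeled as a list of dictionaries, and `query_node` and `query_all` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def query_node(node, predicate):
    """Filter one node's objects; stands in for a server-side query."""
    return [obj for obj in node if predicate(obj)]


def query_all(nodes, predicate):
    """Send the same query to every node in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        partials = pool.map(lambda node: query_node(node, predicate), nodes)
        return [obj for part in partials for obj in part]


# Three toy nodes, each holding part of the data set.
nodes = [
    [{"name": "ann", "level": 3}, {"name": "bob", "level": 9}],
    [{"name": "cat", "level": 7}],
    [{"name": "dan", "level": 1}, {"name": "eve", "level": 8}],
]
high_level = query_all(nodes, lambda player: player["level"] >= 7)
print([player["name"] for player in high_level])
```

The sketch covers only the shared nothing half of the picture; following references from a result object to other nodes is a separate step.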
Sure. So the multiple database support is basically a vendor extension to the JPA standard. It’s really pretty straightforward: there is a persistence.xml file which specifies your connection configuration, your connection URL, and you go into that persistence.xml configuration and simply create a column separated list of the physical nodes you want to connect to. When you use an entity manager which is based on that persistence unit, it will connect to all those physical nodes. We’re also coming out with some extensions which will be able to automatically detect the addition of new nodes in the network, so that you have more elasticity in a runtime system.
Sure, well our implementation will only work with Versant database nodes; it’s unique to our NoSQL database. But I do see that there are other NoSQL technologies which are being supported through other drivers. For example, DataNucleus has JPA abstraction layers into MongoDB as well as some other NoSQL technologies. Even Versant has another small, sort of embedded database technology called Database for Objects, db4o, which is also supported under the DataNucleus framework. And I think that Google’s Bigtable is also accessible via JPA, so there are a number of technologies in the NoSQL space which are accessible via the JPA API. We think that our approach is unique in that it’s very tightly integrated; it’s not going through very generic frameworks which are trying to deal with mapping to lots of different storage types. We do profile our performance against things like DataNucleus and the various data stores underneath it, and we find that we have pretty exceptional performance by taking those optimizations.
Sure. So, if you go to Versant’s website you’ll find a link there to our community and this is a shift in Versant’s presence and approach to the developer community in that we want to get more active in getting community contributions and getting more integrations with other ecosystem components, for example Spring, Spring Data, we have started to do implementations with caching solutions, other languages like Scala, Ruby, ETL solutions like Talend, analytic solutions like R, and so basically we are getting people from these various other companies, other technology segments to become aware and contribute in our communities and our guys are in their communities also contributing and building out an ecosystem of support for our NoSQL database technology.
13. […] So what are some emerging trends in NoSQL databases and Big Data management; Big Data is kind of a popular trend now, so what do you see as coming out in these two spaces, NoSQL database management as well as Big Data?
Srini’s full question: Sounds like Versant is getting into other domains of the data management space, which is helpful for both sides. So what are some emerging trends in NoSQL databases and Big Data management; Big Data is kind of a popular trend now, so what do you see as coming out in these two spaces, NoSQL database management as well as Big Data?
So I think that one of the major things that’s happening is that a lot of the NoSQL technologies are moving in the direction that I’ve been talking about, which is more transparency of data access across the physical nodes. This link management tends to be a real problem for basic NoSQL key value stores like Cassandra, in that you end up having to write a lot of code in the application layer, which frankly is all the code that was written in the early ORM days. You end up writing serialization code to marshal things from whatever your storage format is, maybe it’s JSON or XML or a super column binary structure like you find in something like Cassandra, out into your runtime space and back in. As soon as you start doing that, you want to know if you’ve already marshalled something once, because you don’t want to repeat a heavyweight process, so you start to create reference systems to track that. Then you want to know if you have circular references between runtime types, so you have to start implementing a tracking system for dealing with circular references, and all these different layers start to come out. I think you can see each of the technology providers in this space starting to add features to their technology to deal with that, the real true complexity in information models.
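The tracking layers described here (marshalling each stored record once, and coping with circular references) can be sketched in Python with an identity map. This is a generic illustration, not any vendor's code; the `Person` type, the record layout, and the `unmarshal` function are assumptions made for the example.

```python
class Person:
    def __init__(self, name):
        self.name = name
        self.friend = None


def unmarshal(records, root_key):
    """Build Person objects from {key: {"name": ..., "friend": key_or_None}}.

    `loaded` is an identity map: each key is materialized only once, which
    also stops circular references from recursing forever.
    """
    loaded = {}

    def load(key):
        if key in loaded:
            return loaded[key]            # already marshalled, reuse it
        record = records[key]
        person = Person(record["name"])
        loaded[key] = person              # register before following references
        if record["friend"] is not None:
            person.friend = load(record["friend"])
        return person

    return load(root_key)


# Two records that reference each other.
records = {
    "a": {"name": "Ada", "friend": "b"},
    "b": {"name": "Bob", "friend": "a"},
}
ada = unmarshal(records, "a")
print(ada.friend.name, ada.friend.friend is ada)  # Bob True
```

Registering the object in the map before following its references is what keeps the cycle from looping.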
And I think that the other thing is that more and more people are dealing with the transparency, the elasticity and the storage of the data. Right now there are still some fairly rudimentary distribution policies in most of the solutions: basic stream hashing and MD5 and other capabilities to take and hash fairly rudimentary values out across a number of physical stores. But as you start seeing people dealing with more richly linked information models, you need to have more intelligent partitioning capabilities and abstract that away from the end user. So I think that you will see a lot of additions coming out in the next 12 months or so, where people will try to add more transparency to that whole process and allow you to do more interesting things, like clustering entire sub-graphs simply by partitioning on a root key, and things of that nature.
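The contrast between hashing every key independently and partitioning on a root key can be shown with a short Python sketch. The key format `root/child` and the function names are assumptions for the example; real products define their own placement rules.

```python
import hashlib


def node_for(key, node_count):
    """Map a string key to a node index using an MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def place_by_own_key(keys, node_count):
    """Rudimentary policy: every key is hashed on its own."""
    return {key: node_for(key, node_count) for key in keys}


def place_by_root_key(keys, node_count):
    """Assumes keys look like 'root/child'; hashes only the part before '/'."""
    return {key: node_for(key.split("/", 1)[0], node_count) for key in keys}


keys = ["order42", "order42/line1", "order42/line2", "order42/line3"]
print(place_by_own_key(keys, 4))   # members may land on different nodes
print(place_by_root_key(keys, 4))  # the whole sub-graph shares one node
```

With the second policy, everything under one root lands on the same node, so the sub-graph can be read without crossing the network between nodes.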
Yes, it becomes really important because the truth is there is a network. When you are dealing with lots of objects, and if it’s Big Data then it’s lots of values, lots of pieces of data, you have a sparse data issue: every single call from a key to a value is a network interaction, an RPC (Remote Procedure Call) of some sort, and if you’re dealing with thousands of those in a sparse manner, that can create a bandwidth issue for you very, very quickly. So people need to come up with techniques where you can cluster these things together. Document stores and key value stores which have embeddable storage types, like super columns for example, allow you to get away from some of those problems by creating these nested types. But the reality is, if you think about it, it’s like a document: you could nest sentences in paragraphs, paragraphs in articles and so on into a document, but in today’s world documents also have this thing called a URL, and those URLs reference five or 25 other documents from within one document, and those 25 documents reference another 25 documents.
So you can only get the optimization to one level if you’re dealing with the nested type underneath the key, so a key to a nested value type. When you start dealing with all these URLs, you start looking at ways to cluster multiple things that are natural aggregations together, getting those onto the same physical servers so that your network traffic and all that stuff can be optimized. It’s the eager/lazy loading issue that was found in the ORM space; it’s just bringing that back to the NoSQL technology and creating optimizations in the implementation.
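A rough way to see why clustering matters is to count simulated round trips. In the Python sketch below, `placement` maps each key to a node index, and the cost model (one RPC per key, or one RPC per node when requests are batched) is an assumption made for illustration only.

```python
from collections import defaultdict


def round_trips_one_by_one(keys):
    """One simulated RPC per key, as with naive lazy loading."""
    return len(keys)


def round_trips_batched(placement, keys):
    """One simulated RPC per node that holds at least one requested key."""
    by_node = defaultdict(list)
    for key in keys:
        by_node[placement[key]].append(key)
    return len(by_node)


keys = ["doc", "link1", "link2", "link3"]
scattered = {"doc": 0, "link1": 1, "link2": 2, "link3": 3}
clustered = {"doc": 0, "link1": 0, "link2": 0, "link3": 0}

print(round_trips_one_by_one(keys))            # 4
print(round_trips_batched(scattered, keys))    # 4
print(round_trips_batched(clustered, keys))    # 1
```

Batching alone does not help when the linked objects are scattered; it only pays off once the natural aggregation sits on the same server.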
15. The same problem is back, I guess. You mentioned data hashing and encryption in the earlier discussion. Can you talk in general about what is happening when it comes to security in the NoSQL space? What are the emerging trends, and are the products there yet?
Sure. I would say that by and large the NoSQL space is very immature when it comes to the security side of things. One of the biggest things I’ve noticed is that when a lot of the folks talk about their road maps and what they are planning to bring out next, it’s mostly logging. I think people are finding that if every interaction with the database can at least be logged in some way, so you know something about the origin of the request, which authenticated user was associated with it, and what data they touched, then you can reconstruct what was changed, what happened inside the database. That’s a fairly rudimentary approach, but it’s at least a first-level approach to safeguarding some of the change management and things that are going on with the data in your database. And other folks are talking about looking at role-based security and things like that, which you would find in relational type systems.
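The logging approach described here can be sketched as a thin wrapper that records who did what to which key. This is a toy in-memory example in Python; `AuditedStore` and its methods are hypothetical names, and the user is simply passed in rather than authenticated.

```python
import time


class AuditedStore:
    """In-memory key value store that logs every interaction."""

    def __init__(self):
        self._data = {}
        self.log = []   # append-only list of audit entries

    def _record(self, user, op, key):
        self.log.append({"ts": time.time(), "user": user, "op": op, "key": key})

    def put(self, user, key, value):
        self._record(user, "put", key)
        self._data[key] = value

    def get(self, user, key):
        self._record(user, "get", key)
        return self._data.get(key)

    def touched_by(self, user):
        """Reconstruct which operations a user performed, in order."""
        return [(e["op"], e["key"]) for e in self.log if e["user"] == user]


store = AuditedStore()
store.put("alice", "account:1", {"balance": 10})
store.get("bob", "account:1")
store.put("alice", "account:1", {"balance": 25})
print(store.touched_by("alice"))
```

A log like this records that a change happened and who made it; reconstructing the old values would also require storing them in each entry.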
But as you know, each of these layers adds overhead, and I think by and large it’s going to come on an “as needed” basis, driven by the demand of the customer bases of the various vendors. Versant tries to take the approach of creating a pluggable sort of interface to the various layers, so you can get into the network layers and the storage layers and plug in your own code that deals with things like encrypting what goes over the wire and gets stored on disk. Part of that has to do with the fact that we ship internationally, so a lot of countries use our products, and when you ship internationally, if you start putting code inside your software product that deals with things like encryption, it creates regulatory issues for you. So a lot of vendors are trying to take that tack, and that’s the approach that we take. Outside of that it comes down to basic third-party authentication, so we have hooks where, for example, you can go out to Kerberos or an LDAP server to authenticate the users and determine what their roles are, whether they can just read things or can also write things inside the database, things of that nature. But by and large I would say that the NoSQL segment is fairly immature, especially in comparison to the relational technology vendors.
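The pluggable-layer idea can be illustrated with a store that accepts user-supplied encode and decode hooks for what it writes. This Python sketch does not reflect Versant's actual plug-in interface, and the Base64 codec used in the demonstration is only a placeholder: it is an encoding, not encryption, and a real deployment would plug in a proper cipher.

```python
import base64


class Store:
    """Toy storage layer with pluggable hooks applied to stored bytes."""

    def __init__(self, encode=lambda raw: raw, decode=lambda raw: raw):
        self._encode = encode   # applied before bytes reach "disk"
        self._decode = decode   # applied when bytes are read back
        self._disk = {}

    def put(self, key, value: bytes):
        self._disk[key] = self._encode(value)

    def get(self, key) -> bytes:
        return self._decode(self._disk[key])

    def raw(self, key) -> bytes:
        """What is actually stored, for inspection."""
        return self._disk[key]


# Placeholder codec only: Base64 is NOT encryption.
store = Store(encode=base64.b64encode, decode=base64.b64decode)
store.put("k", b"secret")
print(store.raw("k"), store.get("k"))
```

Because the hooks are supplied by the caller, the store itself ships without any cryptographic code, which matches the motivation given in the answer.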
Srini: Relational databases have been around for, I don’t know, 15+ years.
We’ll definitely get there as the demand comes into place.
Sure. So, it is a commercial product; we’re running some very large mission-critical systems. If you bought a plane ticket through someone like Travelocity, for example, to come to this conference here today, you went through a Versant system that’s running the NoSQL technology, so it’s definitely highly scalable and commercial. It really comes in two flavors. Part of us having a community and opening up the community is also embracing a different business model which is more in line, I think, with what a modern software company and acquirers of software expect these days, and that is sort of an open core: you can get the base product, and the base product will have the standard APIs, it will scale across lots of nodes, and it will do all the things we’ve talked about. Then there is a paid version, which is an enterprise-class version; if you want that, you get some extra tools to help you with monitoring, some extra tools to help you with looking at data in the database, high availability options, maybe eventually cloud replication capabilities and things like that that aren’t in the core. But the basic, sort of open core product has the core value.