Posted by Srini Penchikala and Roberto Zicari on Mar 21, 2011
OO7J is a Java version of the original OO7 benchmark (written in C++) from Mike Carey, David DeWitt and Jeff Naughton at the University of Wisconsin-Madison. The original benchmark tested the performance of object databases (ODBMS). This project also benchmarks Object Relational Mapping (ORM) tools. Currently there are implementations for Hibernate (on PostgreSQL and MySQL), db4o and Versant.
The source code is available on SourceForge under the GNU GPL license. InfoQ and Roberto V. Zicari from ODBMS.ORG recently interviewed Pieter van Zyl, creator of the OO7J benchmark.
InfoQ & R. Zicari: Please give us a summary of the OO7J research project.
Pieter van Zyl: The study investigated the performance of object persistence and compared ORM tools to object databases. ORM tools provide an extra layer between the business logic layer and the data layer. The study began with the hypothesis that this extra layer, and the mapping that happens in it, slows down object persistence. The aim was to compare the influence of this extra layer against the use of object databases, which remove the need for the mapping layer. The study also investigated the impact of certain optimisation techniques on performance.
A benchmark was used to compare ORM tools to object databases, because it provides common criteria for the comparison. The benchmark chosen for this study was OO7, which is widely used to comprehensively test object persistence performance. Part of the study was to investigate OO7 in greater detail to get a clearer understanding of the benchmark code and its inner workings.
Because of Java's general popularity, reflected by the fact that most of the large persistence providers provide persistence for Java objects, it was decided to use Java objects and focus on Java persistence. A consequence of this decision is that the OO7 benchmark, previously available in C++, had to be re-implemented in Java as part of this study.
Included in this study was a comparison of the performance of an open source object database, db4o, against a proprietary object database, Versant. These representatives of object databases were compared against one another as well as against Hibernate, a popular open source representative of the ORM stable. It is important to note that these applications were initially used in their default modes (out of the box). Later some optimisation techniques were incorporated into the study, based on feedback obtained from the application developers. My dissertation can be found here.
InfoQ & R. Zicari: Please give us a summary of the recommendations of the research project.
Pieter: The study found that:
When creating and using benchmarks it is important to clearly state what settings and environment are being used. In this study it was found that:
This work formed part of my MSc. While the findings are not always surprising or new, the work showed that you can still use the OO7 benchmark to test today's persistence frameworks. It really brought out performance differences between ORM tools and object databases. This work is also the first OO7 implementation that tested ORM tools and compared open source against commercial object databases.
InfoQ & R. Zicari: What is the current state of the project?
Pieter: The project has implementations for db4o, Hibernate with PostgreSQL and MySQL, and the Versant database. The project currently works with settings files and an Ant script to run different configurations. The project is a complete implementation of the original OO7 C++ implementation. More implementations will be added in the future. I also believe that all results must be audited. I will keep submitting benchmark results to vendors.
InfoQ & R. Zicari: What are the best practices and lessons learned in the research project?
Pieter: What is interesting today is that benchmarkers are still not allowed to publish benchmark results of commercial products. Their licenses prohibit it. We felt that academics must be allowed to investigate and publish their results freely. In the end we did comply with the licenses and submitted the work to the vendors.
InfoQ & R. Zicari: Do you see a chance that your benchmark will be used by the industry? Why?
Pieter: Yes, but I suspect they are using benchmarks already. These benchmarks are probably home grown. Also, there are no de facto benchmarks for object database and ORM tool vendors, whereas there is a TPC benchmark for relational database vendors. While some vendors did use the OO7 benchmark in the late 90s, they seem to not use it any more, or maybe they have adjusted it for in-house use.
OO7J could be used to test improvements from one version to the next. I have used it to benchmark differences between db4o releases. We also tested the embedded version of db4o against the client-server version of db4o, which gave us valuable information and let us discern the differences in performance.
Currently OO7J has its own interface to the persistence store being benchmarked. This means that it can be extended to test most persistence tools. We wanted to use the JPA or JDO interfaces but not all vendors support these standards.
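To illustrate the idea of a benchmark-owned interface, here is a minimal sketch in Java. It is not the actual OO7J API: the names `PersistenceStore`, `InMemoryStore` and `CreateOperation` are hypothetical, and the in-memory map only stands in for a real product such as db4o, Versant or Hibernate so that the example runs without a database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical benchmark-facing abstraction; NOT the actual OO7J interface.
// Each product under test would get its own implementation.
interface PersistenceStore {
    void begin();
    void store(long id, Object obj);
    Object load(long id);
    void commit();
    void rollback();
}

// In-memory stand-in (an assumption for this sketch) so the code runs
// without any database installed.
class InMemoryStore implements PersistenceStore {
    private final Map<Long, Object> committed = new HashMap<>();
    private Map<Long, Object> pending = null; // non-null while a transaction is open

    public void begin() { pending = new HashMap<>(committed); }

    public void store(long id, Object obj) {
        requireTransaction();
        pending.put(id, obj);
    }

    public Object load(long id) {
        return (pending != null ? pending : committed).get(id);
    }

    public void commit() {
        requireTransaction();
        committed.clear();
        committed.putAll(pending);
        pending = null;
    }

    public void rollback() { pending = null; }

    private void requireTransaction() {
        if (pending == null) {
            throw new IllegalStateException("no active transaction");
        }
    }
}

// A benchmark operation written only against the interface, so the same
// code can time any store implementation.
class CreateOperation {
    static long run(PersistenceStore store, int count) {
        long start = System.nanoTime();
        store.begin();
        for (int i = 0; i < count; i++) {
            store.store(i, "AtomicPart-" + i);
        }
        store.commit();
        return System.nanoTime() - start; // elapsed nanoseconds
    }
}

class PersistenceStoreDemo {
    public static void main(String[] args) {
        PersistenceStore store = new InMemoryStore();
        long nanos = CreateOperation.run(store, 1000);
        System.out.println("Stored 1000 objects in " + nanos + " ns");
    }
}
```

Because the operation depends only on the interface, adding a new product to such a benchmark would mean writing one more implementation rather than touching the operations themselves.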
InfoQ & R. Zicari: What feedback have you received so far?
Pieter: The dissertation was well received. I got a distinction for the work. I submitted the benchmark to the vendors to get their input on the benchmark and how to optimize their products. The feedback was good and no bugs were found. It is important that a benchmark is accurate and used consistently for all vendor implementations. I don't think there are any funnies or inconsistencies in the benchmark code.
Jeffrey C. Mogul states that it is important that benchmarks should be repeatable, relevant, use realistic metrics, be comparable and widely used. I think OO7 complies with those requirements and I stayed as close as possible to OO7 with OO7J.
OO7J has also been used by students at the Department of Computer Science at ETH Zurich. An object database vendor in America also contacted me about my work and wanted to use it for their benchmarking. I am not sure how far they progressed.
InfoQ & R. Zicari: What are the main related works? How does the OO7J research project compare with other persistence benchmarking approaches, and what are the limitations of the OO7J project?
Pieter: There were related attempts by a few researchers to create a Java implementation of OO7 in the late 90s. Sun also created a Java version. These versions are not available any more and weren't open sourced. See my dissertation for more details.
More recent work includes:
Other benchmarking work in the Java object space:
These benchmarks are not entirely vendor independent. But they are open source and one can look at the code and challenge their coding.
I think OO7 has one thing going for it that the others don't have: it is still more widely used, especially in the academic world. It has a lot of vendor independence behind it historically. It has had more reviews and documentation on how it works internally.
But I have seen some implementations of OO7 that are not complete: for example, they build half the model and then don't disclose these changes when publishing the results, or they only have some of the queries or traversals working.
That is why I like to stay close to the original, well-known OO7. I document any changes clearly.
If you run Query 8 of OO7, you should be able to expect that it functions 100% like the original. If anyone modifies it, they should see this as an extension and rename the operation.
I have also included asserts/checkpoints to make sure the correct number of objects are returned for every operation.
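A checkpoint of this kind can be sketched as follows. This is an illustration only, not code from the OO7J source: the helper `ResultCheckpoint.verify`, the demo query and the expected count of 4 are all hypothetical, and the count is not a real OO7 figure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper; not taken from the OO7J source.
class ResultCheckpoint {
    // Runs the query and fails loudly if the number of returned objects
    // differs from what the benchmark specification expects.
    static <T> List<T> verify(String operation, int expectedCount, Supplier<List<T>> query) {
        List<T> result = query.get();
        if (result.size() != expectedCount) {
            throw new IllegalStateException(operation + ": expected " + expectedCount
                    + " objects but got " + result.size());
        }
        return result;
    }
}

class ResultCheckpointDemo {
    // Stand-in for a benchmark query: returns the ids in [0, total) below cutoff.
    static List<Integer> idsBelow(int cutoff, int total) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            if (i < cutoff) {
                ids.add(i);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // The expected count (4) is an assumption made up for this demo.
        List<Integer> ids = ResultCheckpoint.verify("demo query", 4, () -> idsBelow(4, 10));
        System.out.println("Checkpoint passed with " + ids.size() + " objects");
    }
}
```

Throwing an exception rather than relying on the `assert` keyword keeps the check active even when the JVM runs without `-ea`, which matters if a timing from a silently incorrect run must never be published.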
Limitations of OO7J:
InfoQ & R. Zicari: What still needs to be done?
Pieter:
InfoQ & R. Zicari: NoSQL/NRDBMS solutions are getting a lot of attention these days. Are there any plans to do a persistence performance comparison of NoSQL persistence frameworks in the future?
Pieter: Yes, they will be incorporated. I still believe object databases are well suited to this environment. Still not sure that people are using them in the correct situations. I sometimes suspect people jump on to a hot technology without really benchmarking or understanding their application needs.
InfoQ & R. Zicari: What is the future road map of your research?
Pieter: Investigate clustering, caches, MapReduce, column-oriented databases and investigate how to incorporate these into my benchmarking effort.
I would also love to get more implementation experience either with a vendor or building my own database.
Final note to the interview:
"Too often I've seen designs used or rejected because of performance considerations, which turn out to be bogus once somebody actually does some measurements on the real setup used for the application"
- Martin Fowler, Patterns of Enterprise Application Architecture
I believe that one should benchmark before making any technology decisions. People have a lot of opinions of what performs better but there is usually not enough proof. There is a lot of noise in the market. Cut through it and benchmark and investigate for yourself.
Pieter van Zyl is a researcher at the Meraka Institute of South Africa's Council for Scientific and Industrial Research (CSIR). His research focuses on object persistence mechanisms (ORM tools and object databases), with a specific focus on performance benchmarks. He is part of the Espresso research group at the University of Pretoria and a maintainer of the open source performance benchmark project PolePosition on Sourceforge.