Jackson Founder, Tatu Saloranta, responds to JSON Benchmarks
Last week, InfoQ reported that Groovy 2.3 has a much faster JSON parser than previous versions. While creating the article, we sent an email to Tatu Saloranta, founder of the Jackson JSON processor. We wanted to see what he thought about Rick Hightower reporting that Groovy and Boon provide the fastest JSON parser for the JVM.
InfoQ: Do you feel these benchmarks are accurate?
Tatu Saloranta: At a very low level, I think the test methodology is solid. JMH is a good framework, and with proper iteration counts etc., results are repeatable.
I think it is possible that Boon and Groovy are even faster than Jackson for some or many tests, but I do indeed have doubts about most extreme claims, and specifically about cherry-picking particular tests and/or test usage.
My concerns are due to three main things; they all sort of fall under next question.
Also, just to make sure -- the tests I have looked at are available on GitHub. I think there are many derivatives; some of my comments may be less applicable.
InfoQ: Do you think these benchmarks are testing real-world behavior?
TS: Real-world behavior, and real-world usage. I think they may represent a small section of possible usage. I think their emphasis has tended to underline "good cases", to put it bluntly. Three specific concerns I have are:
- Input source. Most commonly cited tests start with Java Strings. Strings are rarely used as input source, because they are JVM constructs -- all external input comes as byte streams. Strings are used in unit tests -- or if a framework (or platform; maybe Groovy does this?) only exposes Strings. Same for writing. This matters mostly because of two things: (a) Jackson heavily optimizes byte-stream case, since it is the bread-and-butter of REST services, or file storage; and (b) Boon has very aggressive optimizations for dealing with Strings; especially use of sun.misc.Unsafe to access and modify the underlying char[] that the String class does not offer access to. So, use of source that is a minority use case, but where Boon does have a clear edge (it is faster with Strings, there's no denying that), seems suspicious.
- Processing/access style: "untyped" -- process Lists of Maps (instead of POJOs). The second part is less suspicious; but it seems odd to me not to mention that the test reads and writes only Lists-of-Maps objects and not real POJOs. All modern JVM REST frameworks focus on POJOs, although they also allow use of "untyped". Different users have different preferences; so I think it is legitimate to test either, or both, but this should be documented.
- Lazy construction with tests that do not access or verify data. Boon has quite a bit of optimizations geared at lazy processing of input. This can be useful for use cases where only small subset of data is accessed. But the problem here is that performance tests do not do any access of data -- in fact, parser could return any Object, and test would not really notice it. So I feel that tests just happen to work in a way that gives optimal boost for lazy processing; and due to this, they do not represent performance one would get.
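The input-source concern in the first bullet can be made concrete with a small, JDK-only sketch. It is not taken from the benchmark suite: the class `InputSourceSketch` and its brace-counting "parser" are hypothetical stand-ins, used only to show that a String-based entry point starts from already-decoded characters while a stream-based entry point has to decode UTF-8 as part of the measured work.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a parser: it only counts '{' characters,
// so the example needs nothing beyond the JDK.
final class InputSourceSketch {

    // String input: the characters are already decoded inside the JVM.
    static int countFromString(String json) {
        int count = 0;
        for (int i = 0; i < json.length(); i++) {
            if (json.charAt(i) == '{') {
                count++;
            }
        }
        return count;
    }

    // Byte-stream input: UTF-8 decoding is part of the work being timed.
    static int countFromStream(InputStream in) {
        int count = 0;
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '{') {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // One non-ASCII character so that decoding is not a no-op.
        String json = "{\"servlet\":{\"name\":\"caf\u00e9\"}}";
        byte[] wire = json.getBytes(StandardCharsets.UTF_8);

        System.out.println(countFromString(json));
        System.out.println(countFromStream(new ByteArrayInputStream(wire)));
    }
}
```

Both calls see the same document, but only the second includes the byte-to-char step that a REST service reading from a socket or file would pay.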
Perhaps I should rephrase all of above to say that the tests do not seem to start with actual valid usage patterns -- at best it feels artificial. They only read/write JSON, but make no use of it. I understand that this makes sense from one point of view -- trying not to add overhead of manipulation -- but, unfortunately, due to different trade-offs, it skews results. So when user uses, say, JAX-RS style REST handling, where all JSON data gets bound to a POJO, from an InputStream; and reverse direction goes from another POJO into OutputStream, performance experienced is very different from what a benchmark would suggest.
On the other hand, if the idea is to use "untyped" Objects, at least code should do some form of traversal; and, if same object is to be used for round-tripping, also modifications.
In case of Boon, what happens is that the use of overlays (indexing of raw input, to be able to extract data), along with lazy construction of Maps, hides the actual overhead that would be experienced. And if Strings are used as the source/target, encoding/decoding overhead (which varies between Jackson and Boon -- Jackson targets this step heavily), it further reduces Jackson's end-to-end relative efficiency.
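A toy illustration of the overlay idea described above, assuming nothing about Boon's actual implementation: `LazyOverlaySketch` and `LazyValue` are hypothetical names, and the "format" is plain comma-separated text rather than JSON. The sketch records only offsets while indexing and builds a String the first time a value is read, which is why a benchmark that never reads the values never pays that cost.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy "overlay" over comma-separated text, not Boon's parser.
final class LazyOverlaySketch {

    // Remembers where a value sits in the raw input; builds the String on first access.
    static final class LazyValue {
        private final char[] buffer;
        private final int start;
        private final int end;
        private String decoded; // stays null until get() is called

        LazyValue(char[] buffer, int start, int end) {
            this.buffer = buffer;
            this.start = start;
            this.end = end;
        }

        boolean isDecoded() {
            return decoded != null;
        }

        String get() {
            if (decoded == null) {
                decoded = new String(buffer, start, end - start);
            }
            return decoded;
        }
    }

    // "Parsing" records offsets only; no per-value String is allocated here.
    static List<LazyValue> index(char[] input) {
        List<LazyValue> values = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= input.length; i++) {
            if (i == input.length || input[i] == ',') {
                values.add(new LazyValue(input, start, i));
                start = i + 1;
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<LazyValue> values = index("alpha,beta,gamma".toCharArray());

        // A benchmark that stops here never pays for building the Strings.
        System.out.println(values.get(1).isDecoded()); // false

        // Touching the data triggers the deferred work.
        System.out.println(values.get(1).get());       // beta
        System.out.println(values.get(1).isDecoded()); // true
    }
}
```

A test that wants to include the deferred cost would need to traverse the result, for example by calling `get()` on every value.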
InfoQ: Do you plan on making Jackson faster in the future or is it "fast enough"?
TS: At this point I can address small things, but I do not have major plans to focus on performance. I hope to address some findings (benchmarks have been useful!) to lower overhead when reading from String sources; and Jackson Afterburner module has some of these aggressive optimizations. But these will be incremental improvements most likely.
Performance has not been the number one goal since earliest 1.x releases; and while I do want to keep overhead moderate and low, there are more important things to focus on: ease of use, support for other formats (XML, CSV, CBOR, Smile), conventions, modular data-type handling libs (Joda, Guava) and so forth.
I guess it is fair to say I feel it is close enough to "fast enough", in the right ballpark.
InfoQ: Thanks for your candid responses!
TS: No problem -- Thank you for digging into this. I think Boon for JSON is a useful thing over all; and specifically it is great that Groovy gets modern high-performance support. But I do hope that comparisons are apples to apples, and claims are in line with supporting evidence. :)
Jackson is much more mature and I have used it. Jackson is very solid, and I have a lot of respect for Jackson. Tatu Saloranta is very prolific. It is hard to keep up with everything he is doing. He is awesome.
Jackson is a safe bet. And unless you are never hitting a DB or doing IO, JSON parsing probably won't be an issue for you. I actually said this many times on the blog post. Read the Caveat section.
I do not agree with many things Tatu is saying. The blog shows a small portion of the tests that I have run, mostly so as not to bore people (earlier blog posts droned on for pages and pages and pages). There are hundreds of tests that test many of the use cases that Tatu mentioned in the response. I find that Boon is typically faster, and there are a few other tests that have come to the same conclusion. It is not always 5x faster. That part is true. Strings are where Boon really shines; that is true too. On the rest we will have to agree to disagree.
RE: (a) Jackson heavily optimizes byte-stream case, since it is the bread-and-butter of REST services, or file storage; and (b) Boon has very aggressive optimizations for dealing with Strings; especially use of sun.misc.Unsafe to access and modify the underlying char[] that the String class does not offer access to. So, use of source that is a minority use case, but where Boon does have a clear edge (it is faster with Strings, there's no denying that), seems suspicious.
Boon has a series of parsers not just one. It has a direct ASCII, a direct UTF-8, Reader source parsers, a streaming mode, etc. It does not just have an index overlay. I find Boon direct binary parsing is faster than the Jackson binary parsing and it does not use index overlay (there are two other benchmarks that have come to the same conclusion). The reader parsing is faster than Jackson it seems. (At one point, I had 8 parsers, but I deleted quite a few. It was too hard to maintain them all so I picked the best and fired the rest). It seems that most of the parsers are all faster than Jackson for almost all test cases. I focused on just the index overlay because that is the fastest of the bunch and 5x sounds better than 4x. You can see a sampling of the rest in earlier blog posts. It is not like they are hidden.
I would love to write a whole article on why index overlay is actually better for REST and Websocket, especially if you are using the same objects in multiple use cases, or you need to run path expressions against intermediate forms. There are frameworks that use Boon for these use cases. Lazy is better, especially for serialization and path expressions. So again... I have to agree to disagree.
RE: Lazy construction with tests that do not access or verify data. Boon has quite a bit of optimizations geared at lazy processing of input. This can be useful for use cases where only small subset of data is accessed. But the problem here is that performance tests do not do any access of data -- in fact, parser could return any Object, and test would not really notice it. So I feel that tests just happen to work in a way that gives optimal boost for lazy processing; and due to this, they do not represent performance one would get.
The difference between the index overlay (lazy processing) and the non-index overlay is 20%, so if Boon is 5x faster with index overlay, it is still four times faster without it. I have turned it off and rerun the benchmarks. The results are on the wiki for the benchmark site.
Boon is faster than Jackson at parsing input streams (see the link, run the tests), reading files, byte[], etc., which you can see if you download the tests and run them.
RE: So when user uses, say, JAX-RS style REST handling, where all JSON data gets bound to a POJO, from an InputStream; and reverse direction goes from another POJO into OutputStream, performance experienced is very different from what a benchmark would suggest.
I have run full object stream serialization with Boon vs. many others, and Boon is faster than Jackson for about 80% of the cases, but the results are much closer than for parsing. (Boon actually got slower than older Boon when I added sideways serialization. I might take that out or refactor the mapping so it is optional.) Also, it seems Jackson is faster than older Jackson.
The unsafe string copy, which can be turned off, is much more of a boon to speed than index overlay. Index overlay is nice. Whenever you can avoid a buffer copy in a tight loop, things get faster.
RE: TS: At this point I can address small things, but I do not have major plans to focus on performance.
Jackson has gotten faster since I started publishing benchmarks in December / January of this year. I think he protests too much. :)
In fact, the last few releases of Jackson really seemed to narrow the difference.
RE: In case of Boon, what happens is that the use of overlays (indexing of raw input, to be able to extract data), along with lazy construction of Maps, hides the actual overhead that would be experienced.
A 20% difference for Strings (less for others). Wrong tree. If you want to pick on a feature, pick on the use of unsafe buffer copies; that is 200%. But 5x less 20% is 4x, and 4x divided by 2 is still 2x. :) So worst case, 2x faster at parsing has been my experience. :)
I am in crunch mode, but some time next week I will rerun the tests with index overlay off again, make some graphs and show.
Read the link. Look at the FAQ. Make up your own mind. There is a lot of information there. Jackson is more mature. They have different ideas and different features too. :)
Index overlay rocks!
Boon non-index overlay mode using inputstream
So here is a test that uses Boon without index overlay and uses inputstream not String.
Benchmark                           Mode   Thr  Count  Sec        Mean  Mean error  Units
MainBoonBenchmark.webxml            thrpt   16      6    1  455347.925   46637.751  ops/s
BoonClassicEagerNoLazyParse.webxml  thrpt   16      6    1  401126.575   28331.138  ops/s
JacksonASTBenchmark.webxml          thrpt   16      6    1  233730.506   17868.136  ops/s
MainJacksonObjectBenchmark.webxml   thrpt   16      6    1  227287.992   21363.353  ops/s
BoonReaderSource.webxml             thrpt   16      6    1  216429.247   22538.238  ops/s
BoonAsciiBenchMark.webxml           thrpt   16      6    1  210416.450   10610.062  ops/s
BoonUTF8BenchMark.webxml            thrpt   16      6    1  199869.811    8742.968  ops/s
GSONBenchmark.webxml                thrpt   16      6    1  168144.639    5311.387  ops/s
MainBoonBenchmark is the index overlay parser; it comes in first.
BoonClassicEagerNoLazyParse (the original boon parser) comes in second.
BoonClassicEagerNoLazyParse does not use index overlay so all those comments are just off. You can see for this benchmark index overlay is only about an 11% improvement.
If I include full chop and chop, Jackson will come in fourth.
But then I would have to explain what chop and full chop mean, and then I would have to refer you to the article on InfoQ on index overlay and explain when it makes sense and what use cases you can't use it, and then introduce full chop and chop. :)
Jackson used to come after BoonReaderSource, but Tatu has been busy, so now Jackson is only twice as slow as BoonClassicEagerNoLazyParse. So there goes that myth about index overlay and Boon not doing anything, and Boon only being able to handle Strings and not inputstreams.
LET ME REPEAT THAT: Boon using InputStream, and not using index overlay is faster than Jackson.
GSON comes in last place. I remember when I started GSON would often beat Jackson.
Jackson has improved but even the improved Jackson is twice as slow as Boon for parsing with no index overlay.
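For readers who want to check the ratios themselves, here is a minimal sketch that recomputes them from the Mean column of the webxml table above. The class and method names are illustrative and not part of the benchmark project; only the throughput figures come from the table.

```java
// Throughput figures are copied from the webxml table; names are illustrative.
final class ThroughputRatios {

    // How many times the throughput of B is achieved by A (both in ops/s).
    static double speedup(double opsA, double opsB) {
        return opsA / opsB;
    }

    // Fraction of throughput lost when moving from the faster to the slower result.
    static double relativeDrop(double faster, double slower) {
        return (faster - slower) / faster;
    }

    public static void main(String[] args) {
        double boonOverlay = 455347.925; // MainBoonBenchmark.webxml
        double boonEager = 401126.575;   // BoonClassicEagerNoLazyParse.webxml
        double jacksonAst = 233730.506;  // JacksonASTBenchmark.webxml
        double jacksonObj = 227287.992;  // MainJacksonObjectBenchmark.webxml

        System.out.printf("Boon overlay vs Boon eager:   %.2fx%n", speedup(boonOverlay, boonEager));
        System.out.printf("Drop when overlay is off:     %.1f%%%n", 100 * relativeDrop(boonOverlay, boonEager));
        System.out.printf("Boon eager vs Jackson AST:    %.2fx%n", speedup(boonEager, jacksonAst));
        System.out.printf("Boon eager vs Jackson object: %.2fx%n", speedup(boonEager, jacksonObj));
    }
}
```

On these figures, turning the overlay off costs roughly 12% of the overlay parser's throughput, and the eager Boon parser runs at roughly 1.7x the throughput of either Jackson benchmark for this file.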
RE: Input source. Most commonly cited tests start with Java Strings. Strings are rarely used as input source, because they are JVM constructs -- all external input comes as byte streams.
The benchmark linked, which has not changed since this was published (or for quite some time before), covers inputstream, string, byte[], etc. It is in the first paragraph. Next time I will use a blink tag. It did not cover turning index overlay off because only in rare cases does index overlay not make sense as a viable alternative to full parse, and index overlay is actually better for POJO serialization and REST. SO OF COURSE I included it in the benchmark. To think otherwise is just wrong. But there are cases where index overlay does not make sense (thus chop, full chop, and the full parse version), but that is a very nuanced discussion, and an argument hard to make when there is some clear FUD. So instead of muddying the water and defending index overlay, let's just keep it at this: Boon does not need index overlay to beat Jackson at JSON parsing. PERIOD!
RE: So when user uses, say, JAX-RS style REST handling, where all JSON data gets bound to a POJO, from an InputStream; and reverse direction goes from another POJO into OutputStream, performance experienced is very different from what a benchmark would suggest.
Actually Boon was designed for exactly the REST / Websocket use case. And it was designed for POJO serialization. You can see that Boon does better than Jackson at POJO serialization in most cases in the benchmarks cited, in the article by Andrey (Groovy in that case, which has some Boon DNA), and in the Gatling benchmark; its POJO performance was also confirmed by Julien Ponge, the author of Golo (julien.ponge.org/blog/revisiting-a-json-benchmark/). Boon used to be consistently twice as fast as Jackson at POJO object serialization, but Boon added some features which made it slower and Jackson got faster, so now Boon usually wins by 30%, but not always. And once you are doing POJO mapping, the index overlay vs. non-overlay point is completely moot.
Let's go bigger. Webxml, which is from the JSON.org examples, is small, so let's use a big file: the catalog, which is 170K.
Benchmark                                                  Mode   Thr  Count  Sec      Mean  Mean error  Units
i.g.j.inputStream.MainBoonBenchmark.citmCatalog            thrpt   16      6    1  1061.592     106.462  ops/s
i.g.j.inputStream.BoonClassicEagerNoLazyParse.citmCatalog  thrpt   16      6    1   979.794      51.968  ops/s
i.g.j.inputStream.BoonReaderSource.citmCatalog             thrpt   16      6    1   681.072      37.804  ops/s
i.g.j.inputStream.JacksonASTBenchmark.citmCatalog          thrpt   16      6    1   554.181      26.974  ops/s
i.g.j.inputStream.GSONBenchmark.citmCatalog                thrpt   16      6    1   538.389      43.384  ops/s
i.g.j.inputStream.MainJacksonObjectBenchmark.citmCatalog   thrpt   16      6    1   486.275      70.753  ops/s
So again, the claim that Boon can't handle things fast unless it uses Strings and unless it uses index overlay is just plain wrong.
This is from an inputstream. Jackson is 4th. Boon index overlay beats it. The Boon full parser, non-lazy version, beats it, and by a wide margin. When Jackson does beat Boon, which happens in some use cases, the margin is usually very small. When Boon wins, which happens in most of the test cases, it wins by a wide margin with and without index overlay.
Since no one asked me why Boon is faster or what it does, I won't say, other than that I have been talking about it on my blog for the last six months, and I'd like to work on an article for InfoQ about it with Jakob Jenkov and/or Stephane Landelle at some point.
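One rough way to read the citmCatalog table is to ask whether two results are separated by more than their reported errors. The sketch below assumes the "Mean error" column is a half-width around the mean and treats non-overlapping intervals as a coarse screen, not a formal significance test; the class `IntervalOverlap` and its helpers are hypothetical names.

```java
// Means and errors are copied from the citmCatalog table.
// Assumption: "Mean error" is a half-width around the mean.
final class IntervalOverlap {

    static final class Result {
        final String name;
        final double mean;
        final double error;

        Result(String name, double mean, double error) {
            this.name = name;
            this.mean = mean;
            this.error = error;
        }

        double low() {
            return mean - error;
        }

        double high() {
            return mean + error;
        }
    }

    // True when the two [low, high] ranges share at least one point.
    static boolean overlaps(Result a, Result b) {
        return a.low() <= b.high() && b.low() <= a.high();
    }

    public static void main(String[] args) {
        Result boonOverlay = new Result("MainBoonBenchmark", 1061.592, 106.462);
        Result boonEager = new Result("BoonClassicEagerNoLazyParse", 979.794, 51.968);
        Result jacksonAst = new Result("JacksonASTBenchmark", 554.181, 26.974);
        Result gson = new Result("GSONBenchmark", 538.389, 43.384);

        System.out.println("Boon overlay vs Boon eager overlap: " + overlaps(boonOverlay, boonEager));
        System.out.println("Boon eager vs Jackson AST overlap:  " + overlaps(boonEager, jacksonAst));
        System.out.println("Jackson AST vs GSON overlap:        " + overlaps(jacksonAst, gson));
    }
}
```

Under that assumption, the eager Boon range sits entirely above the Jackson AST range, while the two Boon parsers overlap each other and Jackson AST overlaps GSON, so the ordering within those pairs is less certain than the gap between them.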
Jackson can be faster than Boon. It probably will be. But today. Today. It is not.
Jackson and Boon have different philosophies.
Jackson is more mature. Jackson is more stable. Jackson integrates with more frameworks. Jackson has different and possibly more features. There are use cases where Jackson will be faster. There are many use cases where JSON parser speed will not matter. Jackson probably has fewer bugs since it has had more eyeballs for more years.
That said... Boon is faster. Today.
Found cases where Jackson is clearly faster