Many of the presumptions about how slow and resource-demanding "fat" XML is compared to JSON's lightweight payload do not hold up to testing, David Lee, lead engineer at MarkLogic, states after running an experiment with 33 different documents and almost 1,200 tests on the most commonly used browsers and operating systems. David found the performance of the total user experience (transfer, parsing and querying of a document) to be nearly identical for the XML and JSON formats.
For his experiment, David created a publicly available test environment mimicking a use case with XML and JSON documents delivered by a web server and parsed and queried in a web browser. The server provides the client with source data and collects the results submitted by the client. The client is a browser-based JavaScript application, with the measuring part of the tests handwritten in JavaScript except for a part that measures jQuery performance.
Besides using seven different documents, ranging in size from 100 kB to 1 MB and each available in two JSON and three XML variants, David also tries to cover a range of devices, browsers, operating systems and networks in his test. He has done that by "crowd sourcing", i.e. making the test environment URL public and distributing it to a range of mailing lists and social media sites. So far almost 1,200 distinct and successful test results have been collected, covering the most commonly used browsers and operating systems. In his article David has documented all test data as well as the results of the different tests.
Some of the conclusions David has drawn from the experiment are:
- Parsing speed varies with the technique used. Pure JavaScript parsing generally performs better with XML than with JSON, while query speed is generally faster for JSON, though there are exceptions where the contrary is true (a minimal timing sketch follows this list).
- Using the JavaScript library jQuery imposes a steep penalty on JSON and an even worse one on XML.
- Documents in all formats, even very large representations of JSON or XML, compress to nearly identical sizes, which indicates that they contain approximately the same information content.
- Transferring documents to a wide variety of devices takes effectively the same time per device irrespective of mark-up format.
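To get a feel for what such a measurement looks like, here is a minimal browser-side timing sketch. It is not David's actual harness, and the payloads are made-up stand-ins for the downloaded documents:

```javascript
// Made-up stand-ins for the downloaded payloads.
const jsonText = '{"items":[{"id":1},{"id":2}]}';
const xmlText = '<items><item id="1"/><item id="2"/></items>';

// Time a parse function over several runs and report the average.
function timeParse(label, parseFn, runs = 100) {
  const start = performance.now();
  for (let i = 0; i < runs; i++) parseFn();
  console.log(label, ((performance.now() - start) / runs).toFixed(3), 'ms/parse');
}

timeParse('JSON.parse', () => JSON.parse(jsonText));
timeParse('DOMParser ', () =>
  new DOMParser().parseFromString(xmlText, 'application/xml'));
```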
Based on his experiment David has some suggestions for architects and developers, including:
- Use HTTP compression, which is most often the single most important factor in total performance (see the sketch after this list).
- Optimise the mark-up for transmission and query.
- Do not try to optimise unless data transmission, parsing and querying are a significant problem compared to other issues.
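To illustrate why compression matters so much, here is a minimal Node.js sketch, not part of David's setup, that gzips a repetitive JSON payload with the built-in zlib module:

```javascript
// Show how much gzip shrinks a repetitive text payload before transfer.
const zlib = require('zlib');

const payload = JSON.stringify({
  items: Array.from({ length: 1000 }, (_, i) => ({ id: i, name: 'item-' + i })),
});
const compressed = zlib.gzipSync(payload);

console.log('raw bytes:    ', Buffer.byteLength(payload));
console.log('gzipped bytes:', compressed.length);
// A real server would send the gzipped body with the
// response header "Content-Encoding: gzip".
```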
Finally David has one piece of advice:
Don’t Trust Anyone.
Don't blindly believe what you are told. Perform experiments; test your own data and code with your own users and devices. What "seems obvious" is not always true.
Community comments
Missing the point
by Russell Leggett
There were a few flaws (IMO) in this benchmark, which I will get to in a second, but I just wanted to start by saying that while it demonstrates that under certain conditions XML parsing can be as fast as JSON, that is not the reason JSON is widely used in the browser. The reason JSON is so great for the browser is that there is no mismatch. Parsing JSON means you have an actual JavaScript object with properties and values easily accessible in the way you would want them. XML offers no such convenience, forcing you to go through the almost universally hated DOM API or the (as you demonstrate) slow jQuery version.
And let's stop for a second and even discuss the concept of querying the result at all. Most use cases for JSON data in the browser involve no querying whatsoever, but simply using the object as it is received. If you wanted to do a real performance comparison for real-world use cases, instead of simply walking the resulting data structure, make the target of the workflow a resulting JavaScript object. With JSON, it's done as soon as the parse is complete. In the case of XML, it would require walking the XML after the parse is complete to marshal it into the JavaScript object that a developer would actually want to use. Not to mention that a naive translation from XML to a JavaScript object is not going to be as good as JSON to JavaScript. You would either have to deal with imperfections in the naive conversion or hand-roll conversions - ugh.
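To make the mismatch concrete, here is a minimal browser sketch; the document shape and field names are made up:

```javascript
// JSON: the parse result is already the object you want.
const person = JSON.parse('{"name": "Ada", "age": 36}');
console.log(person.name); // "Ada"

// XML: after parsing you still have to walk the DOM to marshal
// the data into the equivalent JavaScript object.
const doc = new DOMParser().parseFromString(
  '<person><name>Ada</name><age>36</age></person>',
  'application/xml'
);
const fromXml = {
  name: doc.querySelector('name').textContent,
  age: Number(doc.querySelector('age').textContent),
};
console.log(fromXml.name); // "Ada"
```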
A couple of other flaws I see in the document:
1. The whole JSON "full" thing is a load of BS. Anyone serious about using a JSON API is not going to generate the XML first and then generate a naive transformation from that. It just doesn't belong in the data and it is extremely misleading. "Wow, look at that, sometimes the JSON is a lot bigger than the XML!" Not in the real world. Even if the JSON isn't carefully hand-crafted, there are tools in every language which can convert to JSON in a way that is much, much closer to the custom JSON version than to the "full" version.
2. This paper also seems to have completely missed the fact that there is a native JSON parser built into all modern browsers (including IE back to version 8). This method is both safe and fast. A quick Google search found this test: jsperf.com/json-parse-vs-eval/6, in which the performance was 35% faster than the eval method you used.
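A minimal illustration of the two approaches; the payload is made up:

```javascript
const text = '{"ok": true, "count": 3}';

// Native parser: does not execute code, so it is safe, and it is fast.
const viaParse = JSON.parse(text);

// eval-based parsing: executes whatever the payload contains,
// a security risk on top of (typically) being slower.
const viaEval = eval('(' + text + ')');

console.log(viaParse.count === viaEval.count); // true
```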
Missing the data point
by Louis P. Santillan
To build on Russell's post: I agree. Also, the data sets chosen (all but the Twitter data set) are not representative of the types of data JSON is used for. They are representative of the types of data XML has traditionally been used for. I'm sure their producers have had years to optimize the structure ("shape").
Re: Missing the point
by Josef Jelinek
Russell's response has, in my opinion, more value than the article and document combined. I had hoped to find a response like that, for otherwise I would have needed to write one myself.
It should be noted that the core product that MarkLogic produces is based heavily in XML
by Eric Cotter
It should be read with the knowledge that the company this study came from is heavily steeped in XML. The MarkLogic Server product they offer is XML-based at its core, so much so that they vaunt it as the industry standard. So it is in their best interests to keep XML in the limelight, yes?
Just a thought :)
Re: It should be noted that the core product that MarkLogic produces is based heavily in XML
by Russell Leggett
Yeah, that's not surprising :)
Wrong
by Oscar Goldman
Why use JavaScript to parse XML instead of the browser natively? I use JSON when I'm getting pure data for an application, but XML when I'm going to display the data with some styling. The scenario here doesn't make sense.
Re: Missing the point
by Oscar Goldman
" In the case of XML, it would require walking the XML after parse is complete to marshall it into the javascript object that a developer would actually want to use"
You don't have to walk the XML. You can display it directly, and style it with CSS. That's the problem with this whole comparison.
Re: Missing the point
by Russell Leggett
Yes, I almost addressed this, but my comment was getting very long already. It is true that using XML as you say would not require conversion into a JavaScript object. However, even in the case you mention, the best and most common solution would be to simply send HTML and perform an innerHTML assignment (assuming we're still talking about the browser, which is the focus of this article). I would certainly accept that in the case where no JavaScript processing is desired, JSON is not likely the best format.
That said, the industry seems to be moving towards client-side templating for the sake of flexibility, in which case the desired result of the parse would be a JavaScript object to be processed by a JavaScript template (unless we're talking XSLT, in which case, I'm sorry). I guess what I'm saying is: use the right tool for the right job. I remember way back at the beginning, when Ajax was AJAX and the X still meant XML. It was painful. The failed E4X language addition was a result of trying to deal with that pain, but it was never widely supported, and it became abundantly clear that JSON achieved the same result (a simple, expressive way of directly interacting with data coming over the wire) but, surprise!, was already supported in every browser.
Re: Missing the point
by Matthew Rawlings
In EAI, a classic anti-pattern is to bind an entire message, making a program brittle to changes that otherwise wouldn't affect it. I always encourage people to select/query only the parts of the message they want, to avoid this tight coupling. This is a much smaller issue for JavaScript, but more significant for other languages.
I agree with you that for a Javascript program, in a Web context, it makes sense to use JSON, but there are many other programming languages and contexts in real-world usage.
For me the most interesting point of the paper was the importance of deflating/compressing/gzipping the content over HTTP, relative to the other factors. My experience is that a badly configured httpd.conf is more often a problem than the wire encoding choice.
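For instance, a minimal httpd.conf sketch, assuming mod_deflate is loaded, that enables compression for the relevant content types:

```apache
# Gzip text-based payloads on the fly (requires mod_deflate).
AddOutputFilterByType DEFLATE text/html text/xml application/xml application/json
```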
Re: Missing the point
by Russell Leggett
In that case you agree with me, because that is exactly what the paper tests and contends.
When I read the paper, I don't remember seeing any mention of any other clients or even conclusions relating to other environments.
The importance of compression is well known and uncontroversial, I'm glad it struck a chord with you, but it is hardly newsworthy. Perhaps you are trying to start a separate thread about XML vs JSON in all possible use cases? I think that is a much larger topic and will result in a lot more "it depends" conclusions.
I wonder
by Oscar Goldman
If this comment will get repeated thrice or more...
I don't want to compare apples and oranges
by James Malvi
Both are good, and both are used in different use cases. I would go for JSON when I am developing an app with JavaScript, and I would go with XML when developing an app with a server-side language.
These tools may help JSON and XML lovers:
jsonformatter.org
codebeautify.org/xmlviewer
codebeautify.org/jsonviewer