
Serialization Optimization Pitfalls

by Rob Thornton on Nov 14, 2006

In response to a recent JavaLobby thread, Tom Hawtin looks at optimizing serialization and decides that you shouldn't do it.

The JavaLobby post describes a few different options for optimization and finds that using the Externalizable interface instead of Serializable yields significant performance gains (up to 55% in one version of the JDK). Hawtin notes some flaws in the microbenchmark, most significantly:

Serializable objects that implement Externalizable do not have fields included in their class descriptors. Not a lot of people know that. The descriptions are used if there is no writeObject/readObject method and for defaultReadObject and readFields. So, for fairness, the fields for the writeObject/readObject version should be marked as transient.
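
For example, a hand-written writeObject/readObject version that keeps the comparison fair would look something like this minimal sketch (the Point class and its fields are placeholders for illustration, not code from the benchmark):

    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    // Minimal sketch. Marking the fields transient keeps them out of the
    // stream's class descriptor, so the hand-written writeObject/readObject
    // pair is compared fairly against an Externalizable implementation.
    public class Point implements Serializable {
        private static final long serialVersionUID = 1L;

        private transient int x;
        private transient int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.writeInt(x);
            out.writeInt(y);
        }

        private void readObject(ObjectInputStream in) throws IOException {
            x = in.readInt();
            y = in.readInt();
        }
    }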

After running his updated microbenchmark, he reaches four conclusions:

  • Don't use Externalizable
  • Do reuse streams (see the sketch after this list)
  • Implementing writeObject and readObject by hand can improve performance
  • JVMs get better at serialization with each release
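
A minimal sketch of the stream-reuse advice, reusing a single ObjectOutputStream instead of constructing a new one per object (Point is the placeholder class from the sketch above):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;

    // Minimal sketch of stream reuse: one ObjectOutputStream serves many writes.
    // reset() clears the stream's handle table so earlier objects and class
    // descriptors are written out in full again rather than as back-references,
    // while avoiding the cost of building a new stream (and header) per object.
    public class StreamReuse {
        public static void main(String[] args) throws IOException {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buffer);
            for (int i = 0; i < 1000; i++) {
                out.writeObject(new Point(i, i)); // Point from the earlier sketch
                out.reset();
            }
            out.flush();
            System.out.println("bytes written: " + buffer.size());
        }
    }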

In a follow-up to his benchmark, Hawtin finds one small optimization that is worthwhile: calling ObjectStreamClass.lookup on every class you are going to serialize and assigning the results to static fields. The boost comes from the fact that the soft-reference cache used by serialization will never miss if you hold on to a reference. He has not yet updated his benchmarks to show the impact of this change.
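
A minimal sketch of that optimization, again using the placeholder Point class from above:

    import java.io.ObjectStreamClass;

    // Minimal sketch: pinning descriptors in static fields keeps strong
    // references alive, so serialization's soft-reference cache never misses
    // for these classes. Point is the placeholder class from the earlier sketch.
    public class DescriptorHolder {
        static final ObjectStreamClass POINT_DESCRIPTOR =
                ObjectStreamClass.lookup(Point.class);
    }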


JBoss Serialization by Sacha Labourey

You might want to check "JBoss Serialization" as well (led by Clebert Suconic):
labs.jboss.com/portal/serialization/

From the project page:

Over the years we (Java developers) have accepted that java.io.ObjectInputStream and java.io.ObjectOutputStream are slow when dealing with writeObject operations.
We then started using Externalizable objects as a faster approach to serialization, but even that was slow when using writeObject operations inside Externalizable classes.
Recently we discovered that most of the problems in Java serialization are related to static synchronized caching, which causes CPU spikes and also limits scalability.
Our internal benchmarks show JBossSerialization to be at least two times faster. These benchmarks are committed to our CVS repository (as test cases) and are publicly available.

The main feature of JBossSerialization, besides performance, is Smart Cloning.
Smart Cloning is the ability to reuse final fields across different class loaders, doing exactly what serialization does without converting every field into a byte array.
This approach is at least 10 times faster than serializing over a byte array.

Using JBossSerialization is very simple:
Instead of instantiating ObjectOutputStream and ObjectInputStream, you instantiate JBossObjectOutputStream and JBossObjectInputStream (org.jboss.serial.io). You will also have to add jboss-serialization.jar to your classpath.
Everything from the specification is then respected. The only exceptions are:
I - We use a different protocol because we are focusing on performance (e.g. Smart Cloning), so objects must be both serialized and deserialized with the JBossSerialization classes.
II - You can serialize classes that do not implement Serializable, although you can enable that check if you want it enforced.

It works on Java 1.4.2 or 1.5 JVMs.
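
Based on the description above, a round trip would look roughly like the following sketch (the two stream classes come from the project description; everything else is illustrative and reuses the placeholder Point class from earlier):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;

    import org.jboss.serial.io.JBossObjectInputStream;
    import org.jboss.serial.io.JBossObjectOutputStream;

    // Assumes jboss-serialization.jar is on the classpath and that Point is the
    // Serializable placeholder class from the earlier sketch.
    public class JBossSerializationRoundTrip {
        public static void main(String[] args) throws Exception {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            JBossObjectOutputStream out = new JBossObjectOutputStream(buffer);
            out.writeObject(new Point(1, 2));
            out.close();

            JBossObjectInputStream in = new JBossObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()));
            Point copy = (Point) in.readObject();
            in.close();
            System.out.println("round-tripped: " + copy);
        }
    }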

you need multi-threaded microbench by Bill Burke

If you run a multi-threaded microbenchmark you'll find that the SoftReferenceCache mentioned above becomes a point of contention, because it is accessed in a synchronized block. Performance starts to degrade past roughly 50-100 threads because of that contention. Replacing this cache with a ConcurrentHashMap (I used the Oswego one when I did my benchmarks) removes the contention and gives you better scalability.

Bill
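
A rough illustration of the change Bill describes, using java.util.concurrent.ConcurrentHashMap (Java 5) rather than the Oswego backport; this is not the JDK's actual internal code, just the shape of a lock-free descriptor cache:

    import java.io.ObjectStreamClass;
    import java.lang.ref.SoftReference;
    import java.util.concurrent.ConcurrentHashMap;

    // Rough illustration only, not the JDK's internals. A ConcurrentHashMap of
    // soft references avoids the synchronized block that becomes a bottleneck
    // when many threads serialize concurrently; racing threads may recompute a
    // descriptor, but no thread ever blocks on a shared lock.
    public class ConcurrentDescriptorCache {
        private final ConcurrentHashMap<Class<?>, SoftReference<ObjectStreamClass>> cache =
                new ConcurrentHashMap<Class<?>, SoftReference<ObjectStreamClass>>();

        public ObjectStreamClass descriptorFor(Class<?> type) {
            SoftReference<ObjectStreamClass> ref = cache.get(type);
            ObjectStreamClass desc = (ref != null) ? ref.get() : null;
            if (desc == null) {
                desc = ObjectStreamClass.lookup(type);
                cache.put(type, new SoftReference<ObjectStreamClass>(desc));
            }
            return desc;
        }
    }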

Re: you need multi-threaded microbench by Thomas Hawtin

Recent 1.5.0 updates (and 1.6.0) have ditched ye olde SoftCache for a new concurrent-friendly cache. I'm not going to suggest that java.io has respectable performance, but it's an improvement.
