Oracle Tunes Java's Internal String Representation

by Kaushik Pal on Dec 23, 2013

In an ongoing effort to improve Java performance, Oracle has announced a change in the internal representation of strings in the String class as of Java 1.7.0_06.

The change, removing two non-static fields from the underlying String implementation, was done to help prevent memory leaks.

The original String implementation is based on four non-static fields. The first is char[] value, which contains the characters comprising the String. The second is int offset, which holds the index in the value array of the String's first character. The third is int count, which stores the number of characters that belong to the String. The fourth is int hash, which holds a cached value of the String's hash code.
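As an illustration of that layout, the four fields can be modeled in a small class. This is a simplified sketch, not the JDK source: the class name OldStyleString and the helpers sharesStorageWith and cachedHashCode are invented for this example.

```java
// Simplified model of the pre-1.7.0_06 String layout (illustrative only, not JDK code).
final class OldStyleString {
    private final char[] value; // character storage, possibly shared with other instances
    private final int offset;   // index in value of this string's first character
    private final int count;    // number of characters that belong to this string
    private int hash;           // cached hash code, 0 until first computed

    OldStyleString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    int length() {
        return count;
    }

    char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[offset + index];
    }

    // Constant time: the result is a new view over the same array, no characters are copied.
    OldStyleString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new OldStyleString(value, offset + begin, end - begin);
    }

    // Same formula as String.hashCode(), computed lazily and cached in the hash field.
    int cachedHashCode() {
        int h = hash;
        if (h == 0 && count > 0) {
            for (int i = 0; i < count; i++) {
                h = 31 * h + value[offset + i];
            }
            hash = h;
        }
        return h;
    }

    // Hypothetical helper: true when both instances are backed by the same char[].
    boolean sharesStorageWith(OldStyleString other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

In this model, a substring of "hello world" covering "world" reuses the parent's array with offset 6 and count 5.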

Oracle reported that a performance issue could arise in the original implementation when a String is created by calling String.substring(). Many other API calls, such as Pattern.split(), call substring() internally. When String.substring() is called, the new String refers to the same internal char[] value as the original String instead of copying the characters.

The previous implementation was designed that way to save memory, since the substring would still refer to the original character data. In addition, String.substring() would run in constant time, O(1), unlike the new implementation, which runs in linear time, O(n), where n is the length of the substring.

However, the old implementation could produce a memory leak when an application extracted a small String from a large String and then discarded the original. In that scenario the small String still holds a live reference to the original large char[] value, so the whole array stays in memory even though most of its characters are no longer used.
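The pattern that triggers the leak can be sketched as follows. The class and method names (SubstringRetention, extractKeys) are hypothetical and chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;

class SubstringRetention {
    // Hypothetical helper: keeps only the first keyLength characters of each large record.
    static List<String> extractKeys(List<String> largeRecords, int keyLength) {
        List<String> keys = new ArrayList<>();
        for (String record : largeRecords) {
            // Before 1.7.0_06, each key shares the record's char[], so the full record
            // stays reachable for as long as the key is held, even after the caller
            // drops its own reference to the record.
            keys.add(record.substring(0, keyLength));
        }
        return keys;
    }
}
```

Under the old layout, a 3-character key taken from a 1,000,000-character record would keep roughly 2 MB of character data alive (two bytes per char). From 1.7.0_06 on, the same code retains only the copied characters.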

To avoid this situation in earlier versions, Oracle suggests passing the small String to the String(String) constructor. That constructor copies only the required section of the underlying char[], thereby unlinking the new, smaller String from the original large parent String.
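A minimal sketch of that workaround, assuming a hypothetical helper named compactSubstring:

```java
class CompactCopy {
    // Hypothetical helper: returns a substring that does not pin the source's char[]
    // on JVMs older than 1.7.0_06.
    static String compactSubstring(String source, int begin, int end) {
        // On those older releases, new String(String) copies only the characters in use,
        // so the result no longer references the source's backing array.
        return new String(source.substring(begin, end));
    }
}
```

On 1.7.0_06 and later the extra constructor call is unnecessary, because substring() already copies the characters it needs.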

In the new implementation, the String offset and count fields have been removed, so substrings no longer share the underlying char[] value.
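For contrast, the new behavior can be modeled with a class that keeps only the value and hash fields. Again, this is an illustrative sketch rather than the JDK source; CopyingString and sharesStorageWith are names invented for the example.

```java
import java.util.Arrays;

// Simplified model of the String layout from 1.7.0_06 on (illustrative only, not JDK code).
final class CopyingString {
    private final char[] value; // exactly the characters of this string, never shared here
    private int hash;           // cached hash code field, kept but unused in this sketch

    CopyingString(char[] value) {
        this.value = value;
    }

    int length() {
        return value.length;
    }

    // Linear time in the substring length: the characters are copied into a fresh array.
    CopyingString substring(int begin, int end) {
        if (begin < 0 || end > value.length || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new CopyingString(Arrays.copyOfRange(value, begin, end));
    }

    // Hypothetical helper: true when both instances are backed by the same char[].
    boolean sharesStorageWith(CopyingString other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value);
    }
}
```

Because each substring owns its own array, discarding the parent lets the garbage collector reclaim the large array, at the cost of a copy on every substring() call.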

is that Java 1.7.0_06 or 1.7.0_60? by Adib Saikali

It seems that 1.7.0_06 might be a typo in the post.

Re: is that Java 1.7.0_06 or 1.7.0_60? by Robin Rosenberg

It's an announcement in the past. 1.7.0_06 includes the change. One would have expected it to be more prominent, e.g. in the release notes, than just a note in a mailing list.

sacrifice performance to prevent memory leak??? by sungkwon eom

I'm not sure how severe the memory leak in the previous version is.
But, at least for me, if I have to accept lower performance to prevent a memory leak, I would hesitate.

Oracle incompetence by Dennis Sosnoski

We've seen for some time that the idiots at Oracle are incompetent to be managing Java. To make a major change to the implementation of the most widely used class in the language (aside from Object) in a point release, with no discussion beforehand, demonstrates that the only thing Oracle can do well is fleece gullible companies out of big bucks for overpriced databases and unneeded middleware.
