Oracle Tunes Java's Internal String Representation

In an ongoing effort to improve Java performance, Oracle has announced a change in the internal representation of strings in the String class as of Java 1.7.0_06.

The change, removing two non-static fields from the underlying String implementation, was done to help prevent memory leaks.

The original String implementation is based on four non-static fields. The first is char[] value, which contains the characters comprising the String. The second is int offset, which holds the index of the first character in the value array. The third is int count, which stores the number of characters used. The fourth is int hash, which holds a cached value of the String hash code.

Oracle reported that a performance issue could arise in the original implementation when a String is created by calling String.substring(). substring() is also called internally by many other API calls, such as Pattern.split(). In the original implementation, the String returned by String.substring() refers to the internal char[] value of the original String rather than to a copy of its characters.

The previous implementation was designed that way to save memory, since the substring would still refer to the original character data. In addition, String.substring() ran in constant time (O(1)), unlike the new implementation, which runs in linear time (O(n)).

However, the old implementation could produce a memory leak when an application extracted a small String from a large String and then discarded the original. In that scenario, the small String still retains a live reference to the large char[] value of the original String, holding on to possibly many unused bytes of data.
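The sharing behaviour described above can be sketched with a small model class. This is an illustration only, not the JDK source: SharedString and retainedChars() are hypothetical names, and the four fields simply mirror the layout the article describes.

```java
import java.util.Arrays;

// Hypothetical model of the pre-1.7.0_06 String layout; not the JDK source.
final class SharedString {
    private final char[] value; // characters, possibly shared with a parent string
    private final int offset;   // index of the first character in value
    private final int count;    // number of characters used
    private int hash;           // cached hash code (unused in this sketch)

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // O(1): no characters are copied, only the window into value changes.
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + begin, end - begin);
    }

    int length() {
        return count;
    }

    // Size of the array this instance keeps reachable (hypothetical helper).
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        char[] big = new char[1_000_000];
        Arrays.fill(big, 'x');
        SharedString large = new SharedString(new String(big));
        SharedString small = large.substring(0, 3);
        large = null; // the original is discarded, but its array stays reachable
        // prints "3 chars used, 1000000 chars retained"
        System.out.println(small.length() + " chars used, "
                + small.retainedChars() + " chars retained");
    }
}
```

In this model the three-character substring keeps the full one-million-character array alive, which is the retention pattern the article calls a leak.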

To avoid this situation in earlier versions, Oracle suggests passing the small String to the String(String) constructor. That constructor copies only the required section of the underlying char[], thereby unlinking the new smaller String from the original large parent String.
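A minimal sketch of that workaround follows. The class SubstringCopy and the method firstToken are hypothetical names chosen for illustration; the only JDK calls used are String.indexOf, String.substring and the String(String) constructor. On 1.7.0_06 and later the extra copy should be redundant, since substrings no longer share storage there.

```java
// Hypothetical helper illustrating the workaround for JDKs before 1.7.0_06.
final class SubstringCopy {

    // Returns the text before the first space, wrapped in new String(...)
    // so that on older JDKs it does not pin the backing array of line.
    static String firstToken(String line) {
        int space = line.indexOf(' ');
        String token = space < 0 ? line : line.substring(0, space);
        return new String(token);
    }

    public static void main(String[] args) {
        String line = "GET /a/very/long/request/line/that/we/do/not/want/to/retain";
        System.out.println(firstToken(line)); // prints "GET"
    }
}
```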

In the new implementation, the String offset and count fields have been removed, so substrings no longer share the underlying char[] value.
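For contrast, the two-field layout can be modelled the same way. Again this is only a sketch under assumptions: CopyingString and retainedChars() are hypothetical names, and the copy is done with Arrays.copyOfRange to show why substring becomes linear in the length of the result.

```java
import java.util.Arrays;

// Hypothetical model of the layout after 1.7.0_06; not the JDK source.
final class CopyingString {
    private final char[] value; // exactly the characters of this string
    private int hash;           // cached hash code (unused in this sketch)

    CopyingString(String s) {
        this(s.toCharArray());
    }

    private CopyingString(char[] value) {
        this.value = value;
    }

    // O(n) in the length of the result: the requested range is copied.
    CopyingString substring(int begin, int end) {
        if (begin < 0 || end > value.length || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new CopyingString(Arrays.copyOfRange(value, begin, end));
    }

    // Size of the array this instance keeps reachable (hypothetical helper).
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value);
    }

    public static void main(String[] args) {
        CopyingString large = new CopyingString("hello world");
        CopyingString small = large.substring(6, 11);
        // prints "world retains 5 chars"
        System.out.println(small + " retains " + small.retainedChars() + " chars");
    }
}
```

The trade-off is the one the article names: each substring pays for a copy, but it no longer keeps its parent's character data reachable.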

Community comments

  • is that Java 1.7.0_06 or 1.7.0_60?

    by Adib Saikali,

    It seems that 1.7.0_06 might be a typo in the post.

  • Re: is that Java 1.7.0_06 or 1.7.0_60?

    by Robin Rosenberg,

    It's an announcement in the past. 1.7.0_06 includes the change. One would have expected it to be more prominent, e.g. in the release notes, than just a note in a mailing list.

  • sacrifice performance to prevent memory leak???

    by sungkwon eom,

    I'm not sure how severe the memory leak in the previous version is.
    But, speaking for myself, I would hesitate to accept lower performance just to prevent a memory leak.

  • Oracle incompetence

    by Dennis Sosnoski,

    We've seen for some time that the idiots at Oracle are incompetent to be managing Java. To make a major change to the implementation of the most widely used class in the language (aside from Object) in a point release, with no discussion beforehand, demonstrates that the only thing Oracle can do well is fleece gullible companies out of big bucks for overpriced databases and unneeded middleware.
