memory usage for substring

Understanding String memory usage

To understand the above calculation, we need to start by looking at the fields on a String object. A String contains the following:

  • a char array (and thus a separate object) containing the actual characters;
  • an integer offset into the array at which the string starts;
  • the length of the string;
  • another int for the cached calculation of the hash code.

This means that even if the string contains no characters, it will require 4 bytes for the char array reference, plus 3*4=12 bytes for the three int fields, plus 8 bytes of object header. This gives 24 bytes (which is a multiple of 8, so no “padding” bytes are needed so far). Then, the (empty) char array will require a further 12 bytes (arrays have an extra 4 bytes to store their length), plus in this case 4 bytes of padding to bring the memory used by the char array object up to a multiple of 8, i.e. 16 bytes. So in total, an empty string uses 24+16 = 40 bytes.

If the string contains, say, 17 characters, then the String object itself still requires 24 bytes. But now the char array requires 12 bytes of header plus 17*2=34 bytes for the seventeen chars. Since 12+34=46 isn’t a multiple of 8, we need to round up to the next multiple of 8 (48). So overall, our 17-character String will use up 48+24 = 72 bytes. As you can see, that’s quite a long way off the 18 bytes that you might have expected if you were used to C programming in the “good old days”.
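The arithmetic above can be sketched as a small estimator. This is only an illustration under the assumptions stated in the text (8-byte object header, 4-byte references, 12-byte array header, 2 bytes per char, sizes rounded up to a multiple of 8); the class and method names are made up for this example, and the figures it prints are estimates rather than measurements of any particular JVM.

```java
public class StringMemoryEstimate {
    // Round a byte count up to the next multiple of 8 (the alignment assumed in the text).
    static int roundUpTo8(int bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Estimated bytes used by a String of the given length, under the layout described above.
    static int estimateStringBytes(int length) {
        // String object: 8-byte header + 4-byte array reference + three 4-byte ints = 24
        int stringObject = roundUpTo8(8 + 4 + 3 * 4);
        // char array: 12-byte header (including the length field) + 2 bytes per char, padded
        int charArray = roundUpTo8(12 + 2 * length);
        return stringObject + charArray;
    }

    public static void main(String[] args) {
        System.out.println(estimateStringBytes(0));  // 40
        System.out.println(estimateStringBytes(17)); // 72
    }
}
```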

Memory usage of substrings

At first glance, you may be wondering why a String object holds an offset into the array and a length: why isn’t the string’s content just the whole of the char array? The answer is that when you create a substring of an existing String, the newly created substring is a new String object that points back to the same char array as the parent (but with a different offset and length). Depending on your usage, this is either a good or a bad thing:

  • if you hold on to the parent string after creating the substring, then you will save memory overall;
  • if you throw away the parent string after creating the substring, then you will waste memory (if the substring is shorter than the parent).

For example, in the following code:

String str = "Some longish string...";
str = str.substring(5, 9);

you might have expected the underlying char array of str to end up containing four characters. In fact, it will continue to contain the full sequence Some longish string..., but with the internal offset and length set accordingly. If this memory wastage is a problem (because we are hanging on to lots of strings created in the above manner), then we can create a new string:

String str = "Some longish string...";
str = new String(str.substring(5, 9));

Creating a “brand new” string like this will force the String to take up the “minimum” amount of memory as outlined above by making the underlying char array “just big enough” for the characters of the substring.
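To make the sharing behaviour concrete, here is a minimal sketch that models the layout described above. Because the fields of the real String class are private (and their layout varies between JDK versions), it uses a hypothetical stand-in class, SharedString, whose names are invented for this example. It shows a four-character substring still holding on to the parent's full array, and a trimmed copy that keeps only the characters in use.

```java
import java.util.Arrays;

public class SharedStringSketch {
    // Hypothetical model of the String layout described in the text; not java.lang.String itself.
    static final class SharedString {
        final char[] value; // backing array, possibly shared with a parent
        final int offset;   // index in value where this string starts
        final int count;    // number of characters in this string

        SharedString(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }

        static SharedString of(String s) {
            return new SharedString(s.toCharArray(), 0, s.length());
        }

        // Shares the parent's array; only offset and count differ.
        SharedString substring(int begin, int end) {
            if (begin < 0 || end > count || begin > end) {
                throw new StringIndexOutOfBoundsException();
            }
            return new SharedString(value, offset + begin, end - begin);
        }

        // Copies just the characters in use, so the array is "just big enough".
        SharedString trimmedCopy() {
            return new SharedString(Arrays.copyOfRange(value, offset, offset + count), 0, count);
        }

        @Override
        public String toString() {
            return new String(value, offset, count);
        }
    }

    public static void main(String[] args) {
        SharedString parent = SharedString.of("Some longish string...");
        SharedString sub = parent.substring(5, 9);
        SharedString copy = sub.trimmedCopy();
        System.out.println(sub);               // long
        System.out.println(sub.value.length);  // 22 (still the parent's full array)
        System.out.println(copy.value.length); // 4
    }
}
```

In this model, sub keeps all 22 chars of the parent reachable even if parent itself is discarded, whereas copy retains only 4.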

