Wednesday, September 30, 2009

String.format vs. MessageFormat.format vs. String.+ vs. StringBuilder.append

Usually I'm not a performance hacker, i.e. I seldom use the final keyword, and I dare to use List.size() in loop expressions. But I'm a little bit sensitive when it comes to string operations, as they can cause severe performance problems, and sometimes security issues as well, e.g., if an SQL query is to be created (ok, bad example, one can use PreparedStatement instead of hacking an SQL string). For logging I often have to concatenate strings, such as
if (log.isLoggable(Level.INFO)) {
    log.info("Command created - cmd=" + cmd); //$NON-NLS-1$
}
Actually I'm not writing lines like this myself, I let log4e do the job -- I really love that tool! But sometimes these string concatenations are hard to read; using String.format or MessageFormat.format would make them much easier to read. Here are some examples:
foo(String.format("Iteration %d, result '%s'", i, value[i%2]));

foo(MessageFormat.format("Iteration {0}, result ''{1}''", i, value[i%2]));

// single line:
foo( "Iteration " + i + ", result '" + value[i%2] + "'");

// multi line:
String s;
s = "Iteration ";
foo(s); s+= i; foo(s); s+= ",\nresult '";
foo(s); s+= largeString(i); foo(s); s+= "', '"; 
foo(s); s+= largeString(i+1); foo(s); s+= "'";

foo(new StringBuilder("Iteration ")
    .append(i).append(", result '").append(value[i%2]).append("'")
    .toString());
(Update: the multi-line test was added after daObi and Laurent commented on the first version.) Hmm... maybe the "+" isn't that bad ;-) I do not understand why String.format and MessageFormat.format use such different notations, and frankly I seldom get the format string right on the first try.

So, aesthetics aside, what about performance? What would you guess? My guess was that StringBuilder would be the fastest, followed by the format methods, and that the String operator "+" would be the slowest, as new Strings have to be created all along. I ran a little test (calling each of these variants 100,000 times; foo() does nothing and was added, among other things, to keep the compiler from optimizing too much away, although I have no idea whether that worked ;-) ). Note that the test concatenates only a few strings, but I think that's a typical example; it is the setting in which I was considering format methods instead of string concatenation (and it's quite similar to many toString methods). I was surprised by the result:
String.format: 757 ms
MessageFormat.format: 1078 ms
String +: 61 ms
String + (multi line): 162 ms
StringBuilder: 74 ms
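For readers who want to reproduce numbers like these, here is a minimal sketch of what such a test loop might look like. It is not the original test code: the class name, the VALUE array, the sink field and the timeMillis helper are all assumptions, and the multi-line variant is left out for brevity. It also uses {0,number,#} in the MessageFormat pattern so that large iteration counts are not printed with a grouping separator.

```java
import java.text.MessageFormat;

// Hypothetical reconstruction of the test loop; all names are assumptions.
class FormatBenchmark {
    static final String[] VALUE = {"foo", "bar"};
    static int sink; // updated by foo() so the built strings are actually used

    static void foo(String s) {
        sink += s.length();
    }

    static String viaStringFormat(int i) {
        return String.format("Iteration %d, result '%s'", i, VALUE[i % 2]);
    }

    static String viaMessageFormat(int i) {
        // '' is an escaped single quote; {0,number,#} avoids output like 1,000
        return MessageFormat.format("Iteration {0,number,#}, result ''{1}''", i, VALUE[i % 2]);
    }

    static String viaPlus(int i) {
        return "Iteration " + i + ", result '" + VALUE[i % 2] + "'";
    }

    static String viaStringBuilder(int i) {
        return new StringBuilder("Iteration ")
            .append(i).append(", result '").append(VALUE[i % 2]).append("'")
            .toString();
    }

    interface Variant {
        String build(int i);
    }

    // Runs one variant n times and returns the elapsed wall-clock time in ms.
    static long timeMillis(Variant variant, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            foo(variant.build(i));
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 100000;
        System.out.println("String.format: " + timeMillis(FormatBenchmark::viaStringFormat, n) + " ms");
        System.out.println("MessageFormat.format: " + timeMillis(FormatBenchmark::viaMessageFormat, n) + " ms");
        System.out.println("String +: " + timeMillis(FormatBenchmark::viaPlus, n) + " ms");
        System.out.println("StringBuilder: " + timeMillis(FormatBenchmark::viaStringBuilder, n) + " ms");
    }
}
```

A loop like this has no warm-up phase and measures each variant only once, so the absolute numbers will vary from machine to machine and from run to run.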
Gosh! The "+" operator clearly is the winner! It is more than 10 (ten!) times faster than the format methods. 10 times, can you believe that? I was really surprised, and frankly, I couldn't believe it. (Update: daObi and Laurent explain it, see their comments below.) So I changed the test, assuming the compiler had tricked me, but I always got more or less the same results. I figured it might be the size of the strings, so I changed that as well. That is, I used large strings in the concatenation (70,000 characters each; the method returns one of two strings in order to avoid compiler optimization):
for (int i = 0; i < n; i++) {
    s = MessageFormat.format("Iteration {0,number,#},\nresult ''{1}'', ''{2}''", i,
        largeString(i), largeString(i+1));
}

for (int i = 0; i < n; i++) {
    s = "Iteration " + i + ",\nresult '" + largeString(i) + "', '" + largeString(i+1) + "'";
}
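The largeString() method is not shown in the post. A minimal sketch of how it could be written, assuming it simply alternates between two pre-built strings of 70,000 characters each (the class name, the LARGE array and the filled helper are hypothetical):

```java
import java.text.MessageFormat;
import java.util.Arrays;

// Hypothetical stand-in for the long-string test; all names are assumptions.
class LargeStringTest {
    // Two pre-built strings of 70,000 characters each.
    static final String[] LARGE = {filled('a', 70000), filled('b', 70000)};

    static String filled(char c, int length) {
        char[] chars = new char[length];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Returns one of the two strings, depending on i.
    static String largeString(int i) {
        return LARGE[i % 2];
    }

    static String viaMessageFormat(int i) {
        return MessageFormat.format("Iteration {0,number,#},\nresult ''{1}'', ''{2}''",
            i, largeString(i), largeString(i + 1));
    }

    static String viaPlus(int i) {
        return "Iteration " + i + ",\nresult '" + largeString(i) + "', '" + largeString(i + 1) + "'";
    }

    public static void main(String[] args) {
        String s = viaPlus(42);
        System.out.println(s.length());                     // 140027 for a two-digit i
        System.out.println(s.equals(viaMessageFormat(42))); // true
    }
}
```

The fixed text contributes 25 characters and the two large strings 140,000, so a two-digit iteration counter gives a length of 140027.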
Actually, the test is a little bit unfair, as in all cases at least the result string has to be created (i.e. the format methods have to create strings as well), but at least the string is rather long. The result string s has a length of 140027 characters, so this is a pretty long string, isn't it? Here are the results of the long-string test:
String.format: 584 ms
MessageFormat.format: 816 ms
String +: 621 ms
String + (multi line): 867 ms
StringBuilder: 555 ms
Yeah! Now we have the order I expected in the first place: StringBuilder.append is the winner, followed by String.format. MessageFormat.format has a little problem and is even slower than the single-line String concatenation. But compared to the results of the first test, all methods now run at roughly the same speed.

So, what did I learn? First, I will use plain and simple string concatenation more often in the future, as in many cases it is easier to write than some weird format string. I will use String.format only if I will probably need to translate my messages, since it is easier to translate a format string than to change the code (if the order of the sentence components changes in another language). And, of course, I will use StringBuilder if large strings have to be concatenated, but I think I will use it less in toString methods. And maybe I'm going to read a book about performance hacking -- do you have any recommendations? Anyway, this test also teaches me (again) that good performance is not (only) a question of single statements, but a question of good algorithms and good design in general.

Note: Actually this is not the first time I was surprised by the result of a performance test. I once did a test comparing the performance of different class library designs for a math library for matrices and vectors. I have documented the test in the LWJGL forum, and its results influenced the design of the GEF3D geometry package (see package documentation).

14 comments:

Manuel Woelker said...

Interesting results. For logging in particular, I found slf4j to offer a good solution to this problem, e.g.

logger.debug("Temperature set to {}. Old temperature was {}.", t, oldT);

This style omits the String.format call, and supports formatting right out of the box. The other nice thing about it is that the string isn't created if the message isn't logged. This also makes the if clause checking isLoggable() obsolete.
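The same deferred-construction idea can be sketched with plain JDK logging instead of slf4j, assuming Java 8 or later, where java.util.logging.Logger accepts a Supplier<String>. The class name, the counter and the describe helper below are hypothetical and only serve to show when the message is actually built:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch using JDK logging (Java 8+), not slf4j; all names are assumptions.
class LazyLogging {
    static final Logger LOG = Logger.getLogger(LazyLogging.class.getName());
    static int built; // counts how often the message string was actually built

    static String describe(Object cmd) {
        built++;
        return "Command created - cmd=" + cmd;
    }

    // Logs one INFO message at the given logger level and reports
    // how many times the message string had to be constructed.
    static int logAt(Level threshold, Object cmd) {
        built = 0;
        LOG.setLevel(threshold);
        LOG.info(() -> describe(cmd)); // supplier runs only if INFO is loggable
        return built;
    }

    public static void main(String[] args) {
        System.out.println(logAt(Level.WARNING, "cmd1")); // 0: message never built
        System.out.println(logAt(Level.INFO, "cmd2"));    // 1: message built once
    }
}
```

With the supplier form there is no need for an explicit isLoggable() check around the call.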

Thomas Hallgren said...

I bet you could make the StringBuilder outperform the + operator by reusing the StringBuilder instance and just resetting it with setLength(0) at the beginning of each iteration.
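A minimal sketch of that reuse, assuming the same two-element value array as in the post (the class and method names are hypothetical). StringBuilder has no clear() method; setLength(0) empties the builder while keeping its allocated buffer:

```java
// Sketch of reusing one StringBuilder across iterations; names are assumptions.
class ReusedBuilder {
    static final String[] VALUE = {"foo", "bar"};

    static String[] buildAll(int n) {
        String[] out = new String[n];
        StringBuilder sb = new StringBuilder(64); // one instance for all iterations
        for (int i = 0; i < n; i++) {
            sb.setLength(0); // drop the old content, keep the internal buffer
            sb.append("Iteration ").append(i)
              .append(", result '").append(VALUE[i % 2]).append("'");
            out[i] = sb.toString(); // toString() still copies into a new String
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : buildAll(3)) {
            System.out.println(s);
        }
    }
}
```

Whether this beats the "+" operator in practice would have to be measured; the sketch only shows the mechanics.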

Jens v.P. said...

@Manuel:
Looks nice. Is slf4j in Eclipse Orbit, i.e. is it IP approved (the crucial question when it comes to Eclipse projects)? I used Jakarta Commons Logging in the past, but then I switched to JDK logging, as it comes out of the box (I do not have to use another jar) and it is well supported by log4e.

@Thomas:
That certainly is true, Thomas. On the other hand, the loop is only there for the test; in a "real" application, e.g., a toString() method, we usually have to create a new StringBuffer, unless we cache its instance. If the StringBuffer instance is cached in a class in order to speed up the toString() method, it will probably slow down the whole application, as we need more memory for all those cached StringBuffers (I'm exaggerating here, I know ;-) ). Of course, we could use a StringBuffer pool, but then we get synchronization problems and that stuff, and maybe this is too much overhead for a simple toString() method. As a matter of fact, we are using a pool for matrices and vectors in GEF3D's geometry package, in order to avoid using static members as found in GEF. It's working pretty well, but this is another topic (and maybe subject of a future blog entry ;-) ).

daObi said...

"String +" and StringBuilder are equivalent.
E.g. JDK 1.5 compiler translates

"Iteration " + i + ", result '" + value[i%2] + "'"

to

new StringBuilder("Iteration ").append(i).append(", result '").append(value[i%2]).append("'").toString()!

Laurent Goubet said...

Even though you did try to avoid compiler optimisation, the "+" you use here is equivalent to a StringBuilder in Java 5 and above, as it is on a single line.

Usually in a Java program, what we use is more like this (yes, I know this is exaggerated, we would inline some of these ;)):

String s = "Iteration ";
s += i;
s += ", result '";
s += value[i%2];
s += "'";

A more accurate test you could do is something like this :

String s = "";
for (int i = 0; i < 1000000; i++)
    s = s + value[i%2];

against

StringBuilder buffer = new StringBuilder();
for (int i = 0; i < 1000000; i++)
    buffer.append(value[i%2]);

which is more like "everyday use" and which indeed shows the problem with Strings: the first one would be translated by the compiler so that it is equivalent to:

String s = "";
for (int i = 0; i < 1000000; i++)
    s = new StringBuilder(s).append(value[i%2]).toString();

how much of a performance bottleneck would that be? :p

Jens v.P. said...

Oh, thanks for that information, daObi and Laurent. I didn't know that, but as I wrote, I'm not a performance freak and no VM specialist ;-)

And of course, if a string has to be created as in your example, Laurent, I will always use a StringBuilder. In my "test" I focused on typical "toString()" methods and what I'd need there.

I have added a multi-line string concatenation test in order to demonstrate how this performs (in a toString()-like case).

Thomas Hallgren said...

@Jens:
My point was that in general you use StringBuilder because you can optimize things like loops, where the buffers used for building the string can be reused. You can't do that when you use the '+' operator. Instead you have to rely on the compiler doing that for you. So StringBuilders give you more control, albeit at the cost of readability.

MessageFormat and String.format become very useful when you add NLS support to your app. It's not uncommon that the parameters to the format string need a different order in different languages. That's hard to accomplish without a proper formatter.

R.J. said...

Hi Jens,

One other comment - it looks like the loop example you have added is still being optimized by the compiler. If you look at Laurent's example, the big difference between his and yours is that he is modifying a string declared outside of the loop, whereas your example simply builds the string inside the loop. Here is a loop that shows where the compiler can fall down.

String s = "";
for (int i = 0; i < 15000; i++) {
    s += "Hello";
}

Here is the corresponding StringBuffer version:

StringBuffer s = new StringBuffer();
for (int i = 0; i < 15000; i++) {
    s.append("Hello");
}

The StringBuffer version on my current machine took less than 20 milliseconds even if I moved the number up to 150,000. The concatenation version takes over 2 seconds, and churns GOBs of memory.
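The two loops can be put side by side in a small runnable sketch. The class and method names are hypothetical, StringBuilder is used in place of StringBuffer, and the timings it prints will of course differ from the numbers quoted above:

```java
// Side-by-side sketch of the two loops; names are assumptions.
class LoopConcat {
    static String viaPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "Hello"; // builds a new String from the old one on every pass
        }
        return s;
    }

    static String viaBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("Hello"); // appends into one growing buffer
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 15000;
        long t0 = System.nanoTime();
        String a = viaPlus(n);
        long t1 = System.nanoTime();
        String b = viaBuilder(n);
        long t2 = System.nanoTime();
        System.out.println("String +=:     " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("same result:   " + a.equals(b));
    }
}
```

Because each "+=" copies everything accumulated so far, the total work of the first loop grows roughly quadratically with n, while the builder version grows roughly linearly.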

daObi said...

If you want to know what the compiler does, you could use the Bytecode Outline plugin.
http://andrei.gmxhome.de/bytecode/index.html
You need a little effort to understand the VM instructions, but the "Bytecode Reference" view documents them quite well, also with examples.

Eric Rizzo said...

Maybe I'm focusing on the wrong thing, but I cringe whenever I see logging code that is wrapped in an if statement that checks the logging level. What that does is take a line of logging code, which is noise that we want to ignore 99% of the time, and turn it into 3 lines. It's ugly, ugly, UGLY.
The slf4j approach is much better, and also pretty much eliminates the need to worry about the performance of the string creation since it only happens when you really need it, and you really need it infrequently if you're logging sensibly (DEBUG and INFO log levels should not usually be enabled in production and WARN/ERROR shouldn't be frequent enough to impact performance).
So by using the slf4j approach, this whole topic becomes mostly moot.

Oh, and in case you didn't realize it by now, I find micro benchmarks like this to be of dubious value, especially with the intelligence of the modern compilers and JVMs. Micro benchmarking is difficult and dangerous (in the sense that it can eat up large amounts of time and still produce deceiving results); much better use of time and money is to benchmark real function points that are measured to be a bottleneck.
Furthermore, some (many?) performance experts have said that garbage collection often outweighs application code in terms of performance problems anyway.

Anonymous said...

How about the memory usage of each of them?

Benjamin Carlyle said...

I think you'll find the difference between String.format and MessageFormat to be related to internationalisation concerns. Often it is necessary to reorder terms within a message to localise for a different language. While String.format appears to be derived from the C "printf" function the {0}, {1}, etc in MessageFormat are a little more polymorphic and allow for any reordering that may be required.

Thomas Hallgren said...

String.format() allows 'argument_index' to be specified and is thus capable of reordering:

"Second = %2$s, First = %1$s"

will reverse the order of two string arguments.
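Both formatters can reorder their arguments, as this small sketch shows (the class and method names are made up for the example):

```java
import java.text.MessageFormat;

// Sketch comparing argument reordering in both formatters; names are assumptions.
class Reorder {
    static String withStringFormat(String first, String second) {
        // %2$s refers to the second argument, %1$s to the first
        return String.format("Second = %2$s, First = %1$s", first, second);
    }

    static String withMessageFormat(String first, String second) {
        // MessageFormat indices are zero-based: {1} is the second argument
        return MessageFormat.format("Second = {1}, First = {0}", first, second);
    }

    public static void main(String[] args) {
        System.out.println(withStringFormat("a", "b"));  // Second = b, First = a
        System.out.println(withMessageFormat("a", "b")); // Second = b, First = a
    }
}
```

Note the different numbering: String.format argument indices start at 1, MessageFormat indices at 0.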

David Liszewski said...

I ran a test using the Caliper micro-benchmarking framework from Google and found StringBuilder and the concatenation operator to be similar, and both formatters to be at least an order of magnitude slower.

See http://unhillbilly.blogspot.com/2012/12/another-micro-benchmark-formatting.html