Kazuho's Weblog: Making the HTTP server run 20% faster by using qrintf-gcc; a sprintf-optimizing preprocessor / compiler

Tuesday, October 7, 2014

tl;dr

This is the initial release announcement of qrintf, a preprocessor that optimizes calls to sprintf and snprintf, and of qrintf-gcc, its compiler frontend. C programs that call these functions may run faster when built with the preprocessor / compiler in place of the standard compiler toolchain.

github.com/kazuho/qrintf

Background

sprintf (snprintf) is a great function for converting data to strings. The downside is that it is slow. Recently these functions have become one of the bottlenecks of H2O, which is a high-performance HTTP server / library with support for HTTP/1.x and HTTP/2.

The function is slow not because stringification in general is a slow operation. It is slow mainly due to its design.

The function takes a format string (e.g. "my name is %s") and parses it at runtime, every time the function gets called. The design must have been a good choice when the C programming language was designed. Until the 1990s, every byte of memory was a precious resource, so a memory-conscious approach (e.g. a state machine for formatting strings) was a logical choice. But nowadays, memory is hundreds of thousands of times cheaper than it used to be. On the other hand, the relative cost of running a state machine for building formatted strings has become large, since it causes lots of branch mispredictions that stall the CPU pipeline.
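To make the runtime cost concrete, here is a minimal sketch of a formatter that interprets its format string on every call. It is a hypothetical, heavily simplified illustration (the names mini_snprintf and put_char are made up here, and only %u, %s and %% are handled); real libc implementations are far more elaborate, but they share the same shape: a loop that branches on each character of the format.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: store ch if it fits (one byte is reserved for the
 * terminating NUL) and always advance the logical output offset. */
static size_t put_char(char *buf, size_t size, size_t off, char ch)
{
    if (off + 1 < size)
        buf[off] = ch;
    return off + 1;
}

/* Hypothetical, simplified snprintf look-alike. The format string is
 * re-parsed from scratch on every call. */
int mini_snprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    size_t off = 0;

    va_start(args, fmt);
    for (; *fmt != '\0'; ++fmt) {
        if (*fmt != '%') {              /* branch: literal or directive? */
            off = put_char(buf, size, off, *fmt);
            continue;
        }
        switch (*++fmt) {               /* branch: which conversion? */
        case 'u': {
            unsigned v = va_arg(args, unsigned);
            char tmp[sizeof(unsigned) * 3 + 1];
            int n = 0;
            do {                        /* digits come out in reverse order */
                tmp[n++] = (char)('0' + v % 10);
            } while ((v /= 10) != 0);
            while (n > 0)
                off = put_char(buf, size, off, tmp[--n]);
            break;
        }
        case 's': {
            const char *s = va_arg(args, const char *);
            for (; *s != '\0'; ++s)
                off = put_char(buf, size, off, *s);
            break;
        }
        case '%':
            off = put_char(buf, size, off, '%');
            break;
        case '\0':
            --fmt;                      /* trailing '%': let the loop stop */
            break;
        default:                        /* unsupported: copy it verbatim */
            off = put_char(buf, size, off, *fmt);
            break;
        }
    }
    va_end(args);

    if (size != 0)
        buf[off < size ? off : size - 1] = '\0';
    return (int)off;                    /* length the full output needs */
}
```

Calling mini_snprintf(buf, sizeof(buf), "%u.%u.%u.%u", a, b, c, d) walks the same seven format characters and takes the same data-dependent branches on every call, even though the format never changes between calls.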

How it works

This understanding led me to write qrintf, which works as a wrapper around the C preprocessor. It precompiles invocations of sprintf (and snprintf) that use a constant format string (which is mostly the case) into optimized forms.

The example below illustrates the conversion performed by the qrintf preprocessor. As can be seen, the call to snprintf is precompiled into a series of function calls specialized for each portion of the format (and the C compiler may inline-expand the specialized calls). In this example, the optimized code runs more than 10 times faster than the original form.

// source code
snprintf(
    buf,
    sizeof(buf),
    "%u.%u.%u.%u",
    (addr >> 24) & 0xff,
    (addr >> 16) & 0xff,
    (addr >> 8) & 0xff,
    addr & 0xff);

// after preprocessing by qrintf
_qrintf_chk_finalize(
    _qrintf_chk_u(
        _qrintf_chk_c(
            _qrintf_chk_u(
                _qrintf_chk_c(
                    _qrintf_chk_u(
                        _qrintf_chk_c(
                            _qrintf_chk_u(
                                _qrintf_chk_init(buf, sizeof(buf)),
                                (addr >> 24) & 0xff),
                            '.'),
                        (addr >> 16) & 0xff),
                    '.'),
                (addr >> 8) & 0xff),
            '.'),
        addr & 0xff));

$ gcc -Wall -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m2.475s
user 0m2.470s
sys 0m0.002s
$ qrintf-gcc -Wall -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m0.215s
user 0m0.211s
sys 0m0.002s
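The chained form above can be sketched with a handful of small helpers. The code below is only an illustration of the technique under stated assumptions: the names (fmt_ctx, fmt_init, fmt_u, fmt_c, fmt_finalize, format_ipv4) are hypothetical and are not qrintf's actual definitions. It assumes a small context struct passed by value through the chain, which gives the compiler a chance to inline every step and keep the context in registers.

```c
#include <stddef.h>

/* Hypothetical context threaded through the chain; not qrintf's type. */
typedef struct {
    char *buf;    /* destination buffer */
    size_t off;   /* length the full output needs so far */
    size_t size;  /* capacity of buf, including the terminating NUL */
} fmt_ctx;

static inline fmt_ctx fmt_init(char *buf, size_t size)
{
    fmt_ctx ctx = {buf, 0, size};
    return ctx;
}

/* Append one character, keeping one byte free for the NUL. */
static inline fmt_ctx fmt_c(fmt_ctx ctx, char ch)
{
    if (ctx.off + 1 < ctx.size)
        ctx.buf[ctx.off] = ch;
    ++ctx.off;
    return ctx;
}

/* Append an unsigned decimal number; no format parsing is involved. */
static inline fmt_ctx fmt_u(fmt_ctx ctx, unsigned v)
{
    char tmp[sizeof(unsigned) * 3 + 1];
    int n = 0;
    do {                          /* digits come out in reverse order */
        tmp[n++] = (char)('0' + v % 10);
    } while ((v /= 10) != 0);
    while (n > 0)
        ctx = fmt_c(ctx, tmp[--n]);
    return ctx;
}

/* Terminate the string and return the untruncated length, like snprintf. */
static inline int fmt_finalize(fmt_ctx ctx)
{
    if (ctx.size != 0)
        ctx.buf[ctx.off < ctx.size ? ctx.off : ctx.size - 1] = '\0';
    return (int)ctx.off;
}

/* Hand-expanded equivalent of snprintf(buf, size, "%u.%u.%u.%u", ...). */
int format_ipv4(char *buf, size_t size, unsigned long addr)
{
    return fmt_finalize(
        fmt_u(
            fmt_c(
                fmt_u(
                    fmt_c(
                        fmt_u(
                            fmt_c(
                                fmt_u(
                                    fmt_init(buf, size),
                                    (unsigned)((addr >> 24) & 0xff)),
                                '.'),
                            (unsigned)((addr >> 16) & 0xff)),
                        '.'),
                    (unsigned)((addr >> 8) & 0xff)),
                '.'),
            (unsigned)(addr & 0xff)));
}
```

With this sketch, format_ipv4(buf, sizeof(buf), 1234567890) writes "73.150.2.210" into buf, the same result as in the runs above. The point of the design is that the decision "which conversion comes next" is made once, at compile time, instead of being rediscovered from the format string on each call.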

Performance impact on H2O

H2O uses sprintf in three places: building the HTTP status line, building the ETag, and building the access log.

By using qrintf-gcc in place of gcc, the performance of H2O improves by about 20% (from 82,900 reqs/sec. to 99,200 reqs/sec.) in a benchmark that serves tiny files with access logging enabled.

This not only makes me happy (as the developer of qrintf and H2O) but also shows that optimizing sprintf at compile time does have benefits for real-world programs.

So if you are interested in using qrintf or H2O, please give them a try. Thank you for reading.