Showing posts with label qrintf. Show all posts
Showing posts with label qrintf. Show all posts

Monday, May 11, 2015

Clangに対応し、より高速になったqrintf version 0.9.2をリリースしました

ひさしぶりにqrintf関連の作業を行い、バージョン0.9.2をリリースしました。

ご存知のように、qrintfは、ccachedistccと同様の仕組みでCコンパイラのラッパーとして動作する、sprintf(とsnprintf)の最適化フィルターです。

qrintfを利用することで、整数や文字列をフォーマットするsprintfやsnprintfは最大10倍高速化され、また、H2Oのようなhttpdが20%程度高速化することが知られています。

今回のリリースは、0.9.1以降に行われた以下の改善を含んでいます。


以下の実行例からも、GCCでもClangでもIPv4アドレスの文字列化ベンチマークにおいて、qrintfを利用することで10倍以上の高速化が実現できていることが分かります。
$ qrintf --version
v0.9.2
$ gcc -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m2.512s
user 0m2.506s
sys 0m0.003s
$ qrintf gcc -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m0.173s
user 0m0.170s
sys 0m0.002s
$ clang -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m2.487s
user 0m2.479s
sys 0m0.004s
$ qrintf clang -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m0.220s
user 0m0.214s
sys 0m0.002s

今後は、qrintfをH2Oにバンドルすることで、qrintfの恩恵をより多くの利用者に届けて行きたいと考えています。同様のことは、他のソフトウェアプロジェクトでも可能なのではないでしょうか。

それでは、have fun!

Tuesday, November 4, 2014

The internals H2O (or how to write a fast server)

Yesterday, I had the opportunity to speak at the HTTP2Conference held in Tokyo.

Until now, I have rather been silent about H2O since I think it is still premature for wild use. But you do not need to worry about such thing in a technology conference, so I happily talked about the motives behind H2O, and its implementation.

IMO the talk should be interesting not only to people working with HTTP server implementations but also to those interested in programming in general, since it covers the common performance pitfalls (or misconceptions as I see) in many server implementations.

Below are the slides and the recorded video of my presentation.

Enjoy!

PS. Please forgive me for the provocative attitude in the questions raised in the last part of the slides. They were intentionally written as such in order to spark discussion at the hall.



my talk starts a 3h15m (in Japanese)

Related links:

Tuesday, October 7, 2014

Making the HTTP server run 20% faster by using qrintf-gcc; a sprintf-optimizing preprocessor / compiler

tl;dr

This is an initial release announcement of qrintf, a preprocessor (and qrintf-gcc is the compiler frontend) that optimizes calls to sprintf and snprintf. C programs calling the functions may run faster by using the preprocessor / compiler in place of the standard compiler toolchain.

github.com/kazuho/qrintf

Background

sprintf (snprintf) is a great function for converting data to strings. The downside is that it is slow. Recently the functions have been one of the bottlenecks of H2O, which is a high performance HTTP server / library with support for HTTP/1.x and HTTP/2.

The function is slow not because stringification in general is a slow operation. It is slow mainly due to its design.

The function takes a format string (e.g. "my name is %s") and parse it at runtime, every time the function gets called. The design must have been a good choice when the C programming language was designed. Until the 1990s, every byte of memory was a precious resource. Using a memory-conscious approach (e.g. a state-machine for formatting strings) was a logical choice. But nowadays, the cost of memory is hundreds-of-thousand times cheaper than it used to be. On the other hand, the relative cost of running a state machine for building formatted strings has become large since it causes lots of branch mis-predictions that stall the CPU pipeline.

How it works

The understanding has led me to write qrintf, which works as a wrapper against the C preprocessor. It precompiles invocations of sprintf (and snprintf) with a constant format string (which is mostly the case) into optimized forms.

The example below illustrates the conversion performed by the qrintf preprocessor. As can be seen, the call to sprintf is precomplied into a series of function calls specialized for each portion of the format (and the C compiler may inline-expand the specialized calls). In the case of the example, the optimized code runs more than 10 times faster than the original form.
// source code
snprintf(
    buf,
    sizeof(buf),
    "%u.%u.%u.%u",
    sizeof(buf),
    (addr >> 24) & 0xff,
    (addr >> 16) & 0xff,
    (addr >> 8) & 0xff,
    addr & 0xff);

// after preprocessed by qrintf
_qrintf_chk_finalize(
    _qrintf_chk_u(
        _qrintf_chk_c(
            _qrintf_chk_u(
                _qrintf_chk_c(
                    _qrintf_chk_u(
                        _qrintf_chk_c(
                            _qrintf_chk_u(
                                _qrintf_chk_init(buf, sizeof(buf)),
                                (addr >> 24) & 0xff),
                            '.'),
                        (addr >> 16) & 0xff),
                    '.'),
                (addr >> 8) & 0xff),
            '.'),
        addr & 0xff));
$ gcc -Wall -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m2.475s
user 0m2.470s
sys 0m0.002s
$ qrintf-gcc -Wall -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m0.215s
user 0m0.211s
sys 0m0.002s

Performance impact on H2O

H2O uses sprintf in three parts: building the HTTP status line, building ETag, and building the access log.

By using qrintf-gcc in place of gcc, the performance of H2O jumps up by 20% in a benchmark that sends tiny files and with access logging enabled (from 82,900 reqs/sec. to 99,200 reqs/sec.).

The fact not only makes my happy (as the developer of qrintf and H2O) but also shows that optimizing sprintf at compile-time do have benefit for real world programs.

So if you are interested in using qrintf or H2O, please give it a try. Thank you for reading.

Thursday, October 2, 2014

sprintf を最大10倍以上高速化するプリプロセッサ「qrintf」を作った

最近H2OというHTTPサーバを書いているのですが、プロファイルを取ってみるとsprintfが結構な時間を食っていて不満に感じていました。実際、sprintfは数値や文字列をフォーマットするのに十徳ナイフ的に便利なので、HTTPサーバに限らず良く使われる(そしてCPU時間を消費しがちな)関数です。

では、sprintfを最適化すれば、様々なプログラムが より高速に動作するようになるのではないでしょうか。ということで作ったのが、qrintfです。

qrintfは、Cプリプロセッサのラッパーとしてソースコードに含まれるsprintfの呼出フォーマットを解析し、フォーマットにあわせたコードに書き換えることで、sprintfを高速化します。

たとえば、以下のようなIPv4アドレスを文字列化するコード片を
sprintf(
    buf,
    "%d.%d.%d.%d",
    (addr >> 24) & 0xff,
    (addr >> 16) & 0xff,
    (addr >> 8) & 0xff,
    addr & 0xff);
以下のようにコンパイル時に自動的に書き換え、実行時にフォーマット文字列の解析する負荷を省くことで大幅な高速化を実現しています。
((int)(_qrintf_d(
    _qrintf_c(
        _qrintf_d(
            _qrintf_c(
                _qrintf_d(
                    _qrintf_c(
                        _qrintf_d(
                            _qrintf_init(buf),
                            (addr >> 24) & 0xff),
                        '.'),
                    (addr >> 16) & 0xff),
                '.'),
            (addr >> 8) & 0xff),
        '.'),
    addr & 0xff)
).off);

たとえば上記のコードの場合、以下のように、qrintfを用いることで10倍以上文字列化処理が高速化することを確認しています。
$ gcc -O2 examples/ipv4addr.c
$ time ./a.out 1234567890
result: 73.150.2.210

real    0m2.602s
user    0m2.598s
sys 0m0.003s
$ ./qrintf-gcc -O2 examples/ipv4addr.c
$ time ./a.out 1234567890
result: 73.150.2.210

real    0m0.196s
user    0m0.192s
sys 0m0.003s

sprintfを呼んでいるコードを書き換えずに高速化できて、ハッピーですね!