Kazuho's Weblog: October 2014

Friday, October 10, 2014

[メモ] root権限でrsyncする方法

サーバの移転作業時など、rootしかアクセスできない設定ファイルやアクセス権を保ったままrsyncしたいことってありませんか？

そういった際には、sudo の -A オプションと rsync の --rsync-path オプションを使うと良いようです。

まず、リモートサーバに、パスワードを標準出力に出力するスクリプトファイルを配置します（ファイルのパーミッションを厳しくするのを忘れずに）。

% cat > bin/askpass
#! /bin/sh
echo "{{my_password}}"
% chmod 700 bin/askpass
%

そして、rsync を実行する際には --rsync-path オプションを使い、リモートサーバの rsync を sudo -A 経由で起動するようにすれば良いのです。

% sudo rsync -avz -e ssh \
    --rsync-path='SUDO_ASKPASS=/home/remote-user/bin/askpass sudo -A rsync' \
    remote-user@remote-host:remote-dir local-dir

これは簡単！

sudo -Aオプションはパスワードを記述したファイルを設置する必要があるという点でNOPASSWD同様セキュリティの懸念はありますが、使いどころを誤らなければ便利だなと思いました。

Tuesday, October 7, 2014

Making the HTTP server run 20% faster by using qrintf-gcc; a sprintf-optimizing preprocessor / compiler

tl;dr

This is an initial release announcement of qrintf, a preprocessor (and qrintf-gcc is the compiler frontend) that optimizes calls to sprintf and snprintf. C programs calling the functions may run faster by using the preprocessor / compiler in place of the standard compiler toolchain.

github.com/kazuho/qrintf
Background

sprintf (snprintf) is a great function for converting data to strings. The downside is that it is slow. Recently the functions have been one of the bottlenecks of H2O, which is a high performance HTTP server / library with support for HTTP/1.x and HTTP/2.

The function is slow not because stringification in general is a slow operation. It is slow mainly due to its design.

The function takes a format string (e.g. "my name is %s") and parse it at runtime, every time the function gets called. The design must have been a good choice when the C programming language was designed. Until the 1990s, every byte of memory was a precious resource. Using a memory-conscious approach (e.g. a state-machine for formatting strings) was a logical choice. But nowadays, the cost of memory is hundreds-of-thousand times cheaper than it used to be. On the other hand, the relative cost of running a state machine for building formatted strings has become large since it causes lots of branch mis-predictions that stall the CPU pipeline.

How it works

The understanding has led me to write qrintf, which works as a wrapper against the C preprocessor. It precompiles invocations of sprintf (and snprintf) with a constant format string (which is mostly the case) into optimized forms.

The example below illustrates the conversion performed by the qrintf preprocessor. As can be seen, the call to sprintf is precomplied into a series of function calls specialized for each portion of the format (and the C compiler may inline-expand the specialized calls). In the case of the example, the optimized code runs more than 10 times faster than the original form.

// source code
snprintf(
    buf,
    sizeof(buf),
    "%u.%u.%u.%u",
    sizeof(buf),
    (addr >> 24) & 0xff,
    (addr >> 16) & 0xff,
    (addr >> 8) & 0xff,
    addr & 0xff);

// after preprocessed by qrintf
_qrintf_chk_finalize(
    _qrintf_chk_u(
        _qrintf_chk_c(
            _qrintf_chk_u(
                _qrintf_chk_c(
                    _qrintf_chk_u(
                        _qrintf_chk_c(
                            _qrintf_chk_u(
                                _qrintf_chk_init(buf, sizeof(buf)),
                                (addr >> 24) & 0xff),
                            '.'),
                        (addr >> 16) & 0xff),
                    '.'),
                (addr >> 8) & 0xff),
            '.'),
        addr & 0xff));

$ gcc -Wall -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m2.475s
user 0m2.470s
sys 0m0.002s
$ qrintf-gcc -Wall -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m0.215s
user 0m0.211s
sys 0m0.002s

Performance impact on H2O

H2O uses sprintf in three parts: building the HTTP status line, building ETag, and building the access log.

By using qrintf-gcc in place of gcc, the performance of H2O jumps up by 20% in a benchmark that sends tiny files and with access logging enabled (from 82,900 reqs/sec. to 99,200 reqs/sec.).

The fact not only makes my happy (as the developer of qrintf and H2O) but also shows that optimizing sprintf at compile-time do have benefit for real world programs.

So if you are interested in using qrintf or H2O, please give it a try. Thank you for reading.

Thursday, October 2, 2014

sprintf を最大10倍以上高速化するプリプロセッサ「qrintf」を作った

最近H2OというHTTPサーバを書いているのですが、プロファイルを取ってみるとsprintfが結構な時間を食っていて不満に感じていました。実際、sprintfは数値や文字列をフォーマットするのに十徳ナイフ的に便利なので、HTTPサーバに限らず良く使われる（そしてCPU時間を消費しがちな）関数です。

では、sprintfを最適化すれば、様々なプログラムがより高速に動作するようになるのではないでしょうか。ということで作ったのが、qrintfです。

qrintfは、Cプリプロセッサのラッパーとしてソースコードに含まれるsprintfの呼出フォーマットを解析し、フォーマットにあわせたコードに書き換えることで、sprintfを高速化します。

たとえば、以下のようなIPv4アドレスを文字列化するコード片を

sprintf(
    buf,
    "%d.%d.%d.%d",
    (addr >> 24) & 0xff,
    (addr >> 16) & 0xff,
    (addr >> 8) & 0xff,
    addr & 0xff);

以下のようにコンパイル時に自動的に書き換え、実行時にフォーマット文字列の解析する負荷を省くことで大幅な高速化を実現しています。

((int)(_qrintf_d(
    _qrintf_c(
        _qrintf_d(
            _qrintf_c(
                _qrintf_d(
                    _qrintf_c(
                        _qrintf_d(
                            _qrintf_init(buf),
                            (addr >> 24) & 0xff),
                        '.'),
                    (addr >> 16) & 0xff),
                '.'),
            (addr >> 8) & 0xff),
        '.'),
    addr & 0xff)
).off);

たとえば上記のコードの場合、以下のように、qrintfを用いることで10倍以上文字列化処理が高速化することを確認しています。

$ gcc -O2 examples/ipv4addr.c
$ time ./a.out 1234567890
result: 73.150.2.210

real    0m2.602s
user    0m2.598s
sys 0m0.003s
$ ./qrintf-gcc -O2 examples/ipv4addr.c
$ time ./a.out 1234567890
result: 73.150.2.210

real    0m0.196s
user    0m0.192s
sys 0m0.003s

sprintfを呼んでいるコードを書き換えずに高速化できて、ハッピーですね！