Kazuho's Weblog: May 2015

Wednesday, May 27, 2015

C言語で「1時間以内に解けなければプログラマ失格となってしまう5つの問題が話題に」の5問目を解いてみた

「Java8で「ソフトウェアエンジニアならば1時間以内に解けなければいけない5つの問題」の5問目を解いてみた」と「Perl6で「ソフトウェアエンジニアならば1時間以内に解けなければいけない5つの問題」の5問目を解いてみた」経由。

以下のような問題ですね。

1,2,…,9の数をこの順序で、”+”、”-“、またはななにもせず結果が100となるあらゆる組合せを出力するプログラムを記述せよ。例えば、1 + 2 + 34 – 5 + 67 – 8 + 9 = 100となる

とてもいい問題だと思うし、一方で上の回答例がeval的な手法を使っていたので、そういうズルをせずに解いたらどうなるだろう、ということでCで書いてみた。

正解が出るようになるまでの所要時間、約30分。なんとかプログラマ合格のようです。

#include <stdio.h>

#define MAX_POS 9
#define EXPECTED 100

static char buf[32];

static void doit(int pos, int sum, char *p, int sign)
{
    int i, n, s;

    *p++ = sign == 1 ? '+' : '-';
    for (i = pos, n = 0; i <= MAX_POS; ++i, n *= 10) {
        *p++ = '0' + i;
        n += i;
        s = sum + sign * n;
        if (i == MAX_POS) {
            if (s == EXPECTED) {
                *p = '\0';
                printf("%s = %d\n", buf + 1, s);
            }
        } else {
            doit(i + 1, s, p, 1);
            doit(i + 1, s, p, -1);
        }
    }
}   

int main(void)
{
    doit(1, 0, buf, 1);
    return 0;
}

Thursday, May 21, 2015

How to properly spawn an external command in C (or not use posix_spawn)

When spawning an external command, as a programmer, you would definitely want to determine if you have succeeded in doing so.

Unfortunately, posix_spawn (and posix_spawnp) does not provide such a feature. To be accurate, there is no guaranteed way to synchronously determine if the function has succeeded in spawning the command synchronously.

In case of Linux, the function returns zero (i.e. success) even if the external command does not exist.

The document suggests that if the function succeeded in spawning the command should be determined asynchronously by checking the exit status of waitpid. But such approach (that waits for the termination of the sub-process) cannot be used if your intension is to spawn a external command that is going to run continuously.

Recently I have faced the issue while working on H2O, and have come up with a solution; a function that spawns an external command that synchronously returns an error if it failed to do so.

What follows is the core logic I implemented. It is fairly simple; it uses the traditional approach of spawning an external command: fork and execvp. And at the same time uses a pipe with FD_CLOEXEC flag set to detect the success of execvp (the pipe gets closed), which is also used for returning errno in case the syscall fails.

pid_t safe_spawnp(const char *cmd, char **argv)
{
    int pipefds[2] = {-1, -1}, errnum;
    pid_t pid;
    ssize_t rret;

    /* create pipe, used for sending error codes */
    if (pipe2(pipefds, O_CLOEXEC) != 0)
        goto Error;

    /* fork */
    if ((pid = fork()) == -1)
        goto Error;

    if (pid == 0) {
        /* in child process */
        execvp(cmd, argv);
        errnum = errno;
        write(pipefds[1], &errnum, sizeof(errnum));
        _exit(127);
    }

    /* parent process */
    close(pipefds[1]);
    pipefds[1] = -1;
    errnum = 0;
    while ((rret = read(pipefds[0], &errnum, sizeof(errnum))) == -1
           && errno == EINTR)
        ;
    if (rret != 0) {
        /* spawn failed */
        while (waitpid(pid, NULL, 0) != pid)
            ;
        pid = -1;
        errno = errnum;
        goto Error;
    }

    /* spawn succeeded */
    close(pipefds[0]);
    return pid;

Error:
    errnum = errno;
    if (pipefds[0] != -1)
        close(pipefds[0]);
    if (pipefds[1] != -1)
        close(pipefds[1]);
    errno = errnum;
    return -1;
}

The actual implementation used in H2O does more; it has a feature to remap the file descriptors so that the caller can communicate with the spawned command via pipes. You can find the implementation here.

I am not sure if this kind of workaround is also needed for other languages, but I am afraid it might be the case.

Anyways I wrote this blogpost as a memo for myself and hopefully others. Happy hacking!

Monday, May 18, 2015

benchart - ベンチマークを記録、表示するプログラムを書いた

速度重要なプログラムを書いていると、継続的にベンチマークを記録し、いつでも参照可能にしておくことは重要。だけど、そのためにExcelを起動するのは面倒だし、だいたい、ベンチマークを測定するためのコマンドを覚えていられないので、benchartというコマンドを作った。

github.com/kazuho/benchart
やってくれることは、以下の３つです。

ベンチマーク結果を保存

ベンチマーク測定に使用したコマンドを保存し、再実行

ベンチマーク結果をグラフにして表示

以下、使用イメージ。

たとえば、qrintfのベンチマークを取ることを考えてみると、examples/ipv4addr.cをコンパイルして実行し、time(1)の値を記録したい。

$ bin/qrintf gcc -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m0.176s
user 0m0.170s
sys 0m0.003s

こんな感じ。

このコマンドをbenchartに引数として渡してやると、コマンドを実行し、その結果をbenchart.xmlというファイルに保存してくれる。sh -c をつけてるのは、その引数をサブシェルでハンドリングするためだし、timeに-pオプションをつけてるのは、空白区切の単位なしの出力にするため。

benchart record -- sh -c 'bin/qrintf gcc -O2 examples/ipv4addr.c && /usr/bin/time -p ./a.out 1234567890 2>&1'
Following scores were recorded under name: 283a25e.

    real: 0.17
    user: 0.17
    sys: 0.00

If the results look unapropriate, run `/usr/local/bin/benchart pop` to pop the result.

で、次にv0.9.2でもベンチマークを記録したいので、git checkoutして、今度は引数なしでbenchart recordを実行すると、前回と同じコマンドを実行して、ベンチマークをとってくれる。

$ git checkout v0.9.2
$ benchart record
Following scores were recorded under name: v0.9.2.

    real: 0.17
    user: 0.17
    sys: 0.00

If the results look unapropriate, run `/usr/local/bin/benchart pop` to pop the result.

更に前のバージョンをチェックアウトしてベンチマークを取ろうとすると、エラーが出た。

$ git checkout v0.9.1
$ benchart record
re-running benchmark command: sh -c bin/qrintf gcc -O2 examples/ipv4addr.c && /usr/bin/time -p ./a.out 1234567890 2>&1
sh: bin/qrintf: No such file or directory
benchmark script failed with exit status:32512

「そうだ、コマンド名が変わったんだった！」

というわけで、旧形式のコマンドを指定して再実行。

$ benchart record -- sh -c 'bin/qrintf-gcc -O2 examples/ipv4addr.c && /usr/bin/time -p ./a.out 1234567890 2>&1'
Following scores were recorded under name: v0.9.1.

    real: 0.21
    user: 0.20
    sys: 0.00

If the results look unapropriate, run `/usr/local/bin/benchart pop` to pop the result.

ついでに、もう１個古いバージョンも記録。

$ git checkout v0.9.0
$ benchart record
re-running benchmark command: sh -c bin/qrintf-gcc -O2 examples/ipv4addr.c && /usr/bin/time -p ./a.out 1234567890 2>&1
Following scores were recorded under name: v0.9.0.

    real: 0.20
    user: 0.19
    sys: 0.00

If the results look unapropriate, run `/usr/local/bin/benchart pop` to pop the result.

で、測定結果をグラフ表示するには、benchart showコマンドを実行。

$ benchart show

すると、ウェブブラウザでこんな感じでチャートが表示されます。

ベンチマークを記録するのに使ったコマンドはbenchart list-commandsで一覧表示することができ、benchart record --reuse=nameコマンドで、任意の測定コマンドを再実行可能。

自分用にでっちあげたものだけど、これでベンチマークを取る苦痛が減ったらいいなと思ってる。

Thursday, May 14, 2015

jailing - chroot jailを構築・運用するためのスクリプトを書いた

個人サーバで外部に公開するサービスを動かすときには、chrootを使うにこしたことはないわけです。サービス毎にchrootしてあれば、サーバソフトウェアにセキュリティホールがあっても、他の情報が漏洩したりする可能性をぐっとおさえることができるわけですから。

でも、そのためだけにVPSにdockerとかコンテナを入れて使うってのは、構築も運用もめんどくさいし、ディスク容量食うし、やりたくない。systemd-nspawnも割と重たい雰囲気だし、LTSなubuntuだとそもそもsystemd入ってないし…

俺たちがほしいのは、ホストの環境の一部のみにアクセスできる、手軽なjailだー！　ってわけで、ざっくり書いたのが、jailing。

github.com/kazuho/jailing
/usr/bin等、OS由来のディレクトリをchroot環境にread-onlyでエクスポートしつつ、指定されたコマンドを、そのchroot環境で動かすスクリプトです。

/usr/localや/homeといったディレクトリはエクスポートしないので、chroot環境下のソフトウェアにセキュリティホールがあって侵入されたとしても、（カーネルにバグがなければ）chroot環境外の情報が漏洩することはありません。

ホストとchroot環境でディレクトリを共有するためには、--bindオプションを使います。

たとえば、/usr/local/apache下にインストールしたApacheをchroot環境下で動かしたいなって時、jailingを使えば、以下のようにコマンド一発でchroot環境を作成して実行できます。

% sudo jailing --root=/var/httpd-jail \
    --bind /usr/local/apache \
    -- \
    /usr/local/apache/bin/httpd \
    -c /usr/local/apache/conf/httpd.conf

あるいは、/usr/local/h2o下にインストールしたH2Oをchroot環境下で動かす場合は、こんな感じ。

% sudo jailing --root /var/h2o-jail \
    --bind /usr/local/h2o \
    -- \
    /usr/local/h2o/bin/h2o \
    -m daemon \
    -c /usr/local/h2o/etc/h2o.conf

あるいは、jail内に入るには、

% sudo jailing --root /var/h2o-jail \
    -- \
    bash

とかやればいいわけです。

簡単ですね！

詳しくはman jailingしたりしてください。それでは〜

Monday, May 11, 2015

Clangに対応し、より高速になったqrintf version 0.9.2をリリースしました

ひさしぶりにqrintf関連の作業を行い、バージョン0.9.2をリリースしました。

ご存知のように、qrintfは、ccacheやdistccと同様の仕組みでCコンパイラのラッパーとして動作する、sprintf(とsnprintf)の最適化フィルターです。

qrintfを利用することで、整数や文字列をフォーマットするsprintfやsnprintfは最大10倍高速化され、また、H2Oのようなhttpdが20%程度高速化することが知られています。

今回のリリースは、0.9.1以降に行われた以下の改善を含んでいます。

数値の変換速度の改善（@imasahiro氏による）#9 #10
Clangへの暫定対応と、それにともなうコマンドラインインターフェイスの変更 #13 #16
-DQRINTF_NO_AUTO_INCLUDEオプションによる、自動#includeの抑止 #14
%.*sへの対応 #7

以下の実行例からも、GCCでもClangでもIPv4アドレスの文字列化ベンチマークにおいて、qrintfを利用することで10倍以上の高速化が実現できていることが分かります。

$ qrintf --version
v0.9.2
$ gcc -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m2.512s
user 0m2.506s
sys 0m0.003s
$ qrintf gcc -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m0.173s
user 0m0.170s
sys 0m0.002s
$ clang -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m2.487s
user 0m2.479s
sys 0m0.004s
$ qrintf clang -O2 examples/ipv4addr.c && time ./a.out 1234567890
result: 73.150.2.210

real 0m0.220s
user 0m0.214s
sys 0m0.002s

今後は、qrintfをH2Oにバンドルすることで、qrintfの恩恵をより多くの利用者に届けて行きたいと考えています。同様のことは、他のソフトウェアプロジェクトでも可能なのではないでしょうか。

それでは、have fun！