Monday, September 8, 2014

The reasons I stopped using libuv for H2O

Libuv is a great cross-platform library that abstracts various types of I/O by using callbacks.

So when I started writing H2O - a high-performance HTTP server / library implementation with support for HTTP/1, HTTP/2 and WebSocket - using libuv seemed like a good idea. But recently I stopped using it, for several reasons. This blog post explains them.

■No Support for TLS

Although libuv provides a unified interface for various types of streams, it does not provide access to a TLS stream through that interface, nor does it provide a way for application developers to implement their own types of streams.

This has been a disappointment to me. Initially I had created a tiny library called uvwslay that binds libuv streams to wslay - a WebSocket protocol implementation. But since libuv does not provide support for TLS, it was impossible to support SSL within that tiny library.

To support the wss protocol (i.e. WebSocket over TLS) I needed to reimplement the WebSocket binding: stop using libuv as the stream abstraction layer, and switch to an abstraction layer, defined on top of libuv, that hides the differences between libuv streams and the I/O API provided by the SSL library being used (in my case, OpenSSL).

It would be great if libuv provided direct support for TLS in the future (or a way for applications to implement their own types of libuv streams), so that code calling the libuv API directly could support various types of streams.

■Memory Usage is not Optimal

Libuv provides a callback-based API for writes, and I love the idea: it hides the complexity of implementing non-blocking I/O operations. The problem is that libuv does not call the completion callback right after a write succeeds. Instead, the callbacks are called only after all pending I/O has been performed. This means that if you have 1,000 connections each sending 64KB of data, your code first allocates a 64KB buffer 1,000 times (64MB in total) and only then frees all of those memory chunks, even when none of the network operations block. IMO, libuv should call the completion callback as soon as the application returns control to the library after calling uv_write, so that the memory allocation pattern could instead be a repetition of malloc-then-free, 1,000 times over.
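The difference between the two allocation patterns can be illustrated with a small simulation (a conceptual sketch, not libuv code; the buffer sizes and connection count are taken from the example above):

```python
# Conceptual sketch (not libuv code): compare peak memory held in write
# buffers when frees are deferred to the end of the I/O cycle vs. when
# each buffer is freed as soon as its non-blocking write succeeds.

BUF = 64 * 1024   # 64KB per connection
CONNS = 1000

def peak_deferred_free():
    # libuv-style: every buffer stays allocated until all I/O is done
    live = 0
    peak = 0
    for _ in range(CONNS):
        live += BUF              # uv_write() keeps the buffer alive
        peak = max(peak, live)
    live = 0                     # completion callbacks free everything at once
    return peak

def peak_immediate_free():
    # proposed: free each buffer right after the write succeeds
    peak = 0
    for _ in range(CONNS):
        live = BUF               # malloc
        peak = max(peak, live)
        live = 0                 # free immediately
    return peak

print("peak with deferred frees:", peak_deferred_free(), "bytes")
print("peak with immediate frees:", peak_immediate_free(), "bytes")
```

Under the deferred policy the peak equals the sum of all buffers; under the immediate policy it stays at a single buffer.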

■No Support for Delayed Tasks

Libuv does not provide an interface for scheduling actions to run just before the I/O polling method is called.

Such an interface is necessary if you need to aggregate the results of multiple I/O operations and send them to a single client. In the case of HTTP/2, a number of HTTP requests need to be multiplexed over a single TCP (or TLS) connection.

Lacking support for delayed tasks, you would need to use uv_timer with a zero-second timeout to emulate them.

This works fine if the intention of the application developer is to aggregate read operations, but not if the intention is to aggregate the results of write operations, since in libuv the write completion callbacks are called after the timers.
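The ordering problem can be modeled with a toy event loop (an illustrative sketch only; libuv's real loop is far more involved): if each iteration runs due timers before it delivers write completions, a task scheduled via a zero-second timer cannot observe writes completed in that same iteration.

```python
# Toy event loop (illustrative only): each iteration runs timers first,
# then write-completion callbacks, mirroring the ordering described above.
log = []

timers = []            # callbacks scheduled with a 0-second timeout
completed_writes = []  # writes whose completion callbacks are pending

def schedule_timer(cb):
    timers.append(cb)

def finish_write(cb):
    completed_writes.append(cb)

def run_one_iteration():
    # phase 1: due timers
    while timers:
        timers.pop(0)()
    # phase 2: write completions (delivered during the polling phase)
    while completed_writes:
        completed_writes.pop(0)()

finish_write(lambda: log.append("write done"))
schedule_timer(lambda: log.append("timer fired"))
run_one_iteration()
print(log)  # the timer runs before the write completion
```

So a zero-timeout timer works for aggregating reads, but fires too early to see the outcome of the writes issued in the same iteration.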

■Synchronous File I/O is Slow

Synchronous file I/O through libuv seems to be slow compared to calling the POSIX API directly.

■Network I/O has Some Overhead

After I found the issues mentioned above, I started to wonder how much overhead libuv imposes on network I/O.

So I stripped the libuv-dependent parts off the stream abstraction layer of H2O (which, as described above, hides the difference between TCP and TLS) and replaced them with direct calls to POSIX APIs and various polling methods (i.e. select, epoll, kqueue).

The chart below shows the benchmark results taken on my development environment running Ubuntu 14.04 on VMware. H2O became 14% to 29% faster by not depending on libuv.

■The Decision

Considering all of these facts, I have decided to stop using libuv in H2O and to use the direct binding that I wrote and used for the benchmark, at least until the performance penalty vanishes and TLS support gets integrated.

OTOH I will continue to use libuv in my other projects, and to recommend it to others, since I still think it does an excellent job of hiding the messy details of platform-specific asynchronous APIs with adequate performance.

PS. Please do not blame me for not reporting these issues as pull requests to libuv. I might do so in the future, but such a task is too heavy for me right now. I am writing this post in the hope that it will help people (including myself) understand libuv better, and as an answer to @saghul.

Tuesday, July 15, 2014

On issuing server certificates for smartphone apps that talk to your own servers (rethinking SSL pinning and private CAs)





■The Pros and Cons of SSL Pinning

■Two Kinds of SSL Pinning



Tuesday, July 1, 2014

The JSON SQL Injection Vulnerability


Many SQL query builders written in Perl do not provide mitigation against JSON SQL injection vulnerability.

Developers should not forget to either type-check the input values taken from JSON (or any other hierarchical data structure) before passing them to the query builders, or should better consider migrating to query builders that provide API immune to such vulnerability.

Note: a Japanese explanation by the discoverer of the issue can be found here.


Traditionally, web applications have been designed to take HTML FORMs as their input. But today, more and more web applications are starting to receive their input as JSON or other kinds of hierarchically-structured data, thanks to the popularization of XMLHttpRequest and smartphone apps.

Designed in those old days, a number of Perl modules, including SQL::Maker, have been using unblessed references to define SQL expressions. The following example illustrates how operators are specified within the user's code. The first query being generated consists of an equality match; the second is generated through the use of a hashref to specify the operator used for comparison.

use SQL::Maker;
my $maker = SQL::Maker->new(…);

# generates: SELECT * FROM `user` WHERE `name`=?
$maker->select('user', ['*'], {name => $query->param('name')}); 

# generates: SELECT * FROM `fruit` WHERE `price`<=?
$maker->select('fruit', ['*'], {price => {'<=', $query->param('max_price')}});

This approach raised no security concerns at the time it was invented, when the only source of input was HTML FORMs, since a FORM is represented as a set of key-value pairs in which every value is a scalar. In other words, it is impossible to inject SQL expressions via HTML FORMs, because the query parser guarantees that the value of foo (i.e. $query->param('foo')) is never a hashref.

JSON SQL Injection

But the story has changed with JSON. JSON objects are represented as hashrefs in Perl, and thus similar code receiving JSON is vulnerable to SQL operator injection.

Consider the code below.

use SQL::Maker;
my $maker = SQL::Maker->new(…);

# find a user with the given name
$maker->select('user', ['*'], {name => $json->{'name'}}); 

The intention of the developer is to execute an SQL query that fetches the user information by using an equality match. If the input is {"name": "John Doe"} the condition of the generated query would be name='John Doe', and a row related to the specified person would be returned.

But what happens if the name field of the JSON is an object? If the supplied input is {"name": {"!=": ""}}, then the query condition becomes name!='' and the database will return all rows with non-empty names. Technically speaking, SQL::Maker accepts any string supplied in the key part as the operator to be used (i.e. there is no whitelisting), so the attack is not limited to changing the operator. (EDIT: Jun 3 2014)

A similar problem exists with the handling of JSON arrays: if the name field of the JSON is an array, then the IN operator is used instead of the intended = operator.
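Both behaviours can be reproduced with a toy condition builder (a simplified sketch in Python, not SQL::Maker itself) that, like the Perl builders described above, switches on the type of the supplied value:

```python
# Toy condition builder (sketch, not SQL::Maker): like the Perl builders
# described above, it changes behaviour based on the value's type.
def build_condition(column, value):
    if isinstance(value, dict):          # hashref -> {operator: operand}
        op, operand = next(iter(value.items()))
        return f"`{column}`{op}?", [operand]
    if isinstance(value, list):          # arrayref -> IN (...)
        marks = ", ".join("?" * len(value))
        return f"`{column}` IN ({marks})", value
    return f"`{column}`=?", [value]      # scalar -> equality match

# intended use: an equality match
print(build_condition("name", "John Doe"))        # ('`name`=?', ['John Doe'])

# attacker-controlled JSON: {"name": {"!=": ""}}
print(build_condition("name", {"!=": ""}))        # ('`name`!=?', [''])

# attacker-controlled JSON: {"name": ["John", "Jane"]}
print(build_condition("name", ["John", "Jane"]))  # ('`name` IN (?, ?)', ['John', 'Jane'])
```

The attacker never touches the SQL string directly; supplying an object or array where a string was expected is enough to rewrite the condition.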

It should be said that the code snippet contains an operator injection vulnerability, referred to hereafter as JSON SQL injection. The possibility of an attacker changing the operator may not seem like a high-risk issue, but there are scenarios in which an unexpected result set leads to unintended information disclosure or other hazardous behaviour of the application.

To prevent such attacks, application developers should either assert that the values are not references (which represent arrays/hashes in JSON), or forcibly convert the values to scalars, as shown in the snippet below.

use SQL::Maker;
my $maker = SQL::Maker->new(…);

# find a user with the given argument, forcibly converted to a string
$maker->select('user', ['*'], {name => $json->{'name'} . ''}); 

Programmers Deserve a Better API

As explained, the programming interface provided by SQL builders such as SQL::Maker is as specified, and thus it is the responsibility of the users to assert the correctness of the types of the data being supplied.

But it should also be said that the programming interface is now inadequate, in the sense that it is prone to the vulnerability. It would be better if we had a safer way to build SQL queries.

To serve such purpose, we have done two things:

SQL::QueryMaker and the Strict Mode of SQL::Maker

SQL::QueryMaker is a module that we have developed and released just recently. It is not a fully-featured query builder but a module that concentrates on building query conditions. Instead of unblessed references, the module uses blessed references (i.e. objects) to represent SQL expressions, and exports global functions for creating such objects. Such objects are accepted as query conditions by the most recent versions of SQL::Maker.

Besides that, we have also introduced a strict mode to SQL::Maker. When operating in strict mode, SQL::Maker does not accept unblessed references as its arguments, so that it is impossible for attackers to inject SQL operators even if the application developer forgets to type-check the supplied JSON values.

Together, the two provide an interface immune to JSON SQL injection. The code snippet below is an example using these features. Please consult the documentation of the modules for more detail.

use SQL::Maker;
use SQL::QueryMaker;

my $maker = SQL::Maker->new(
   strict => 1,
);

# generates: SELECT * FROM `user` WHERE `name`=?
$maker->select('user', ['*'], {name => $json->{'name'}});

# generates: SELECT * FROM `fruit` WHERE `price`<=?
$maker->select('fruit', ['*'], {price => sql_le($json->{'max_price'})});

A Similar Problem may Exist in Other Languages / Libraries

I would not be surprised if the same proneness exists in other Perl modules or in similar libraries for other programming languages, since it seems natural from a programmer's standpoint to change the behaviour of the match condition based on the type of the supplied value.

Generally speaking, application developers should not expect that a value within JSON is of a certain type; always check the type before using it. OTOH we believe that library developers should provide programming interfaces immune to vulnerabilities, as we have done in the case of SQL::Maker and SQL::QueryMaker.
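In languages with a similar JSON-to-native type mapping, the defensive check amounts to validating the decoded value's type before it reaches the query builder. A minimal Python sketch (the helper name is hypothetical, chosen for illustration):

```python
import json

def get_scalar(obj, key):
    """Return obj[key] only if it decoded to a plain scalar (hypothetical helper)."""
    value = obj.get(key)
    if isinstance(value, (dict, list)):  # JSON object or array sneaked in
        raise TypeError(f"expected a scalar for {key!r}, got {type(value).__name__}")
    return value

safe = json.loads('{"name": "John Doe"}')
print(get_scalar(safe, "name"))          # John Doe

evil = json.loads('{"name": {"!=": ""}}')
try:
    get_scalar(evil, "name")
except TypeError as e:
    print("rejected:", e)
```

Rejecting containers at the boundary keeps the operator-selection logic of the builder out of the attacker's reach.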

Note: the issue was originally reported by Mr. Toshiharu Sugiyama, my colleague working at DeNA Co., Ltd.

Determining whether a function (or variable, or constant) is declared in C++11

As @mattn_jp pointed out in his blog entry, C/C++ allow the use of redundant parentheses in variable declarations. For example, both of the following statements conform to the language specification.

int i = 3; // declares variable i, initialized to 3
int (j) = 4; // declares variable j, initialized to 4

By using this feature, it is possible to create a macro that tests the existence of an arbitrary variable or constant.

The is_defined macro in the following example is one such implementation.

Compiling and running the code will likely print is NAN defined? yes. But if you remove #include <cmath>, the output will likely change to no, since NAN is a constant defined in cmath.

#include <iostream>
#include <cmath>

struct is_defined_t {
  static bool defined_;
  is_defined_t() { defined_ = false; }
  template <typename T> explicit is_defined_t(const T&) { defined_ = true; }
};

bool is_defined_t::defined_;

#define is_defined(arg) (([]() { is_defined_t(arg); return is_defined_t::defined_; })())

int main(int argc, char **argv)
{
  std::cout << "is NAN defined? " << (is_defined(NAN) ? "yes" : "no") << std::endl;
  return 0;
}
A shortcoming of the example is that the result can only be obtained at runtime. It would be useful if we could get the result at compile time; once we had such a thing, we could get rid of autoconf, CMake, etc.

Friday, June 27, 2014


Some information about the "home-grown" encryption scheme used by LINE has been disclosed (see: LINEの暗号化について « LINE Engineers' Blog), and there were some exchanges about it on Twitter as well, so let me put together a summary.


As for why they use one-pass message encryption rather than TLS, Adopting SPDY in LINE – Part 2: The Details « LINE Engineers' Blog contains the following passage, so the concern is presumably the added latency of using TLS:
3G mobile networks normally operate at slow speeds. (snip) If the connection changes modes when the user sends a message, we are forced to wait until we receive the server response before we can do anything else.


In addition to RSA, they appear to use a symmetric cipher with a 128-bit block size; presumably the symmetric key is encrypted with the server's RSA public key and sent together with the message encrypted under that symmetric key (see tweets 1, 2, 3, 4).

# generate an AES key (note 6)
AESKEY=`openssl rand -hex 16`

# generate an initialization vector
IV=`openssl rand -hex 16`

# encrypt the AES key with the server's public key (note 1) and send it as an HTTP request header
echo -n "$AESKEY" | openssl rsautl -encrypt -pubin -inkey server-public.pem -in /dev/stdin | openssl base64

# encrypt the body of the POST request using AES, and send it
echo -n "$content" | openssl enc -aes-128-cbc -in /dev/stdin -K "$AESKEY" -iv "$IV" | openssl base64


Cryptographic algorithms such as RSA and AES become vulnerable unless used correctly, and how LINE uses them has not been disclosed this time. That said, if well-known building blocks (PKCS #1 v1.5 and AES-CBC) are combined in a simple form like this, the chance of a problem is low (note 2). In fact, from the standpoint of confidentiality, the speculative example above looks fine to me; if there is a problem, please let me know (note 3).



As for message encryption schemes that are easy to use from web applications and the like, JSON Web Encryption is currently in the final stage of standardization at the IETF. It includes integrity protection, and its security has been reviewed by the standards body, so use that instead. For libraries that support it, JSON Web Token (JWT) - OAuth.jp may be a useful reference.

Note 1: this example uses PKCS #1 v1.5, but RSA-KEM might be preferable
Note 2: for how to use CBC mode, see the NIST guideline (NIST SP 800-38A) and the security evaluation by CRYPTREC
Note 3: countermeasures against replay attacks and integrity protection are not covered by this example
Note 4: see: LINE、改めて傍受を否定 「暗号化後データは独自形式、解読は不能」 - ITmedia ニュース
Note 5: see: 韓国国情院がLINE傍受:FACTA online
Note 6: fixed the length of the generated key and IV, which had mistakenly been 64 bits (11:21am)

Thursday, June 5, 2014



Privilege separation on Unix-like operating systems has traditionally been implemented by assigning a different user id to each user and performing access control based on it. For daemon processes, it also became common practice to assign each daemon a dedicated "user id", in order to avoid granting unnecessary privileges and to limit interference between daemon processes.


  • it has become common to store and use high-value information (such as online-banking passwords) inside the OS
  • it has become common to run programs downloaded from the Internet (not just software obtained through limited distribution channels)

For this reason, smartphone OSes such as iOS and Android assign a different "user id" to each program rather than to each user, so that programs cannot access each other's data.

On the other hand, for older Unix-like OSes such as OS X, migrating to such a scheme is not realistic; instead, the problem is mitigated by providing per-program access control only for highly confidential information, as Keychain does.


Friday, May 23, 2014

[Memo] Perl's closure creation is not slow



$ perl
         Rate  method closure
method  535/s      --    -12%
closure 609/s     14%      --
$ cat
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $COUNT = 1000;

sub doit_closed {
    my $cb = shift;
    $cb->();
}

sub doit_method {
    my $cb = shift;
    $cb->meth();
}

cmpthese(-1, {
    'closure' => sub {
        for my $i (1..$COUNT) {
            doit_closed(sub {
                $i;
            });
        }
    },
    'method' => sub {
        for my $i (1..$COUNT) {
            doit_method(Holder->new($i));
        }
    },
});

package Holder;

sub new {
    my ($class, $value) = @_;
    return bless {
        value => $value,
    }, $class;
}

sub meth {
    my $self = shift;
    return $self->{value};
}