Thursday, October 22, 2015

Performance improvements with HTTP/2 push and server-driven prioritization


HTTP/2 push only marginally improves web-site performance (even when it does). But it might provide better user experience over mobile networks with TCP middleboxes.


Push is an interesting feature of HTTP/2.

By using push, HTTP servers can start sending asset files that block rendering (e.g. CSS and script files) before the web browser issues requests for them. I have heard (and voiced myself) the expectation that doing so might cut down the response time of the Web.

CASPER (cache-aware server pusher)

Cache validation has so far been considered the biggest barrier to using HTTP/2 push.

For the server to start pushing asset files, it needs to be sure that the client does not already have the asset cached. You would never want to push an asset file that has already been cached by the client: doing so not only wastes bandwidth but also has a negative effect on response time. But how can a server determine the cache state of the client without spending an RTT asking the client?

That's where CASPER comes in.

CASPER (an abbreviation of cache-aware server pusher) is a function introduced in H2O version 1.5 that tracks the cache state of the web browser using a single cookie. The cookie contains a fingerprint of all the high-priority asset files cached by the browser, compressed as a Golomb-compressed set.

The cookie is attached to every request sent by the client [1]. So when the server receives a request for an HTML document, it can immediately determine whether or not the browser is in possession of the blocking assets (CSS and script files) required to render the requested HTML, and it can push only the files that are known not to be cached.
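To make the idea concrete, here is a minimal sketch of a Golomb-compressed set and the push decision built on it. It is not H2O's implementation or its cookie format: the hash function, the `RICE_BITS` and `CAPACITY` parameters, the bit-string representation, and all method names are assumptions chosen for illustration.

```ruby
require "digest"
require "set"

# Assumed parameters, not taken from H2O: fingerprints live in the range
# [0, CAPACITY * 2**RICE_BITS), giving roughly a 1-in-64 false-positive rate.
RICE_BITS = 6
CAPACITY  = 64

# Map an asset path to a small integer fingerprint.
def fingerprint(path)
  Digest::SHA1.digest(path).unpack1("N") % (CAPACITY << RICE_BITS)
end

# Encode fingerprints as a bit string: sort, take deltas, then write each
# delta as a unary quotient, a "0" terminator, and RICE_BITS remainder bits.
def gcs_encode(fingerprints)
  prev = 0
  fingerprints.sort.uniq.map { |v|
    delta = v - prev
    prev = v
    "1" * (delta >> RICE_BITS) + "0" +
      (delta & ((1 << RICE_BITS) - 1)).to_s(2).rjust(RICE_BITS, "0")
  }.join
end

# Reverse of gcs_encode: rebuild the sorted fingerprints from the bit string.
def gcs_decode(bits)
  values = []
  pos = 0
  prev = 0
  while pos < bits.length
    quotient = 0
    while bits[pos] == "1"
      quotient += 1
      pos += 1
    end
    pos += 1 # skip the "0" that terminates the unary part
    prev += (quotient << RICE_BITS) | bits[pos, RICE_BITS].to_i(2)
    pos += RICE_BITS
    values << prev
  end
  values
end

# Keep only the candidates whose fingerprint is absent from the set carried
# by the cookie, i.e. the assets that are not known to be cached.
def paths_to_push(candidates, cookie_bits)
  cached = gcs_decode(cookie_bits).to_set
  candidates.reject { |path| cached.include?(fingerprint(path)) }
end

cookie_bits = gcs_encode([fingerprint("/assets/style.css")])
puts paths_to_push(["/assets/style.css", "/search/jquery-1.9.1.min.js"], cookie_bits).inspect
```

Because membership is tested on fingerprints rather than full paths, a collision can make an uncached asset look cached. In that case the server simply skips the push and the browser requests the file as usual, so the cost of a false positive is one round trip rather than a broken page.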

With CASPER, it has now become practical to use HTTP/2 push for serving asset files.

Using HTTP/2 push in H2O

This week, I have started using CASPER on the tiny official site of H2O.

The configuration is shown below. The mruby handler initiates push of JavaScript and CSS files if the request is for the top page or one of the HTML documents, and then (by using the 399 status code) delegates the request to the next handler (defined by the file.dir directive), which actually returns a static file. The http2-casper directive turns CASPER on, so that the server discards push attempts initiated by the mruby handler for assets that are likely cached. http2-reprioritize-blocking-assets is a performance tuning option that raises the priority of blocking assets to the highest level for web browsers that do not do so themselves.
        mruby.handler: |
          lambda do |env|
            # collect the blocking assets to push for the top page and HTML documents
            push_paths = []
            if /(\/|\.html)$/.match(env["PATH_INFO"])
              push_paths << "/search/jquery-1.9.1.min.js"
              push_paths << "/search/oktavia-jquery-ui.js"
              push_paths << "/search/oktavia-english-search.js"
              push_paths << "/assets/style.css"
              push_paths << "/assets/searchstyle.css"
            end
            # 399 delegates the request to the next handler (file.dir)
            return [
              399,
              push_paths.empty? ?
                {} :
                {"link" => push_paths.map{|p| "<#{p}>; rel=preload"}.join("\n")},
              []
            ]
          end
        file.dir: /path/to/doc-root
    http2-casper: ON
    http2-reprioritize-blocking-assets: ON
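The handler's logic can also be exercised outside H2O to see what it hands to the server. The following is a sketch in plain Ruby rather than mruby; the constant name `PUSH_HANDLER` and the shortened asset list are assumptions made for illustration, and the empty body array follows the Rack-style response triple.

```ruby
# Same decision logic as the handler in the configuration, held in a
# constant (hypothetical name) so that it can be called directly.
PUSH_HANDLER = lambda do |env|
  push_paths = []
  # Only the top page and HTML documents trigger a push.
  if /(\/|\.html)$/.match(env["PATH_INFO"])
    push_paths << "/assets/style.css"
    push_paths << "/assets/searchstyle.css"
  end
  [
    399, # delegate to the next handler
    push_paths.empty? ? {} : {"link" => push_paths.map { |p| "<#{p}>; rel=preload" }.join("\n")},
    []
  ]
end

status, headers, _body = PUSH_HANDLER.call({"PATH_INFO" => "/index.html"})
puts status          # 399
puts headers["link"] # one "<path>; rel=preload" entry per line
```

A request for an asset itself, such as /assets/style.css, does not match the pattern, so the handler returns an empty header hash and nothing is pushed.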

The benchmark

With the settings above (with http2-reprioritize-blocking-assets both OFF and ON), I measured the unload, first-paint, and load timings [2] using Google Chrome 46, and the numbers are as follows. The RTT between the server and the client was ~25 milliseconds. The results were mostly reproducible across multiple attempts.

First, let's look at the first two rows, which have push turned off. It is evident that reprio:on [3] starts rendering the response 50 milliseconds (2 RTT) earlier. This is because, unless the option is turned on, the priority tree created by Chrome instructs the server to interleave responses containing CSS / JavaScript with those containing image files.

Next, let's compare the first two rows (push:off) with the latter two (push:on). It is interesting that the unload timings have moved to the right. This is because when push is turned on, the contents of the asset files are sent before the contents of the HTML. Since web browsers unload the previous page when they receive the first octets of the HTML file, using HTTP/2 push actually increases the time spent until the previous page is unloaded.

This has both positive and negative effects on user experience. The positive side is that the time the user sees a blank screen decreases substantially (the red section: the time spent after unload and before first-paint). The negative side is that users need to wait longer until they know that the server has responded (which they learn when the browser starts to render the next page).

It is not surprising that turning on push alone somewhat improves the first-paint timing compared to having both options turned off; the server is able to send more CSS and JavaScript before it receives requests for image files and starts interleaving the responses with them.

On the other hand, it might be surprising that using push together with reprioritization did not make any difference. The reason is simple: in this scenario, transferring the necessary assets and the <head> section of the HTML (about 320KB in total) required about 10 round trips (including the overhead required by TCP, TLS, and HTTP/2). With this many round trips, the merit of push can hardly be observed, considering that push is a technique for eliminating the one round trip the browser needs to issue requests for the blocking assets [4].
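The arithmetic behind that argument can be sketched with an idealized slow-start model. The segment size of 1460 bytes, the initial window of 10 segments, and the loss-free doubling are assumptions of this sketch, not measurements from the benchmark; the real count also includes the TCP, TLS and HTTP/2 overhead mentioned above, which this model leaves out.

```ruby
# Count the round trips an idealized TCP slow start needs to deliver `bytes`,
# doubling the congestion window every round trip and ignoring packet loss.
def slow_start_round_trips(bytes, mss: 1460, initial_window: 10)
  segments_left = (bytes.to_f / mss).ceil
  window = initial_window
  round_trips = 0
  while segments_left > 0
    segments_left -= window
    window *= 2
    round_trips += 1
  end
  round_trips
end

# Round trips spent on the ~320KB payload alone under the assumed model.
puts slow_start_round_trips(320 * 1024) # 5

# Upper bound on what push can save, using the figures quoted in the text.
RTT_MS = 25            # RTT of the benchmark setup
TOTAL_ROUND_TRIPS = 10 # round trips quoted for assets plus <head>
share = (100.0 / TOTAL_ROUND_TRIPS).round
puts "push saves at most #{RTT_MS} ms, about #{share}% of #{RTT_MS * TOTAL_ROUND_TRIPS} ms"
```

Even under these generous assumptions, the single round trip that push removes is a small fraction of the total, which is in line with the difference being hard to observe in the measurements.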


The benchmark reinforces the claims made by some that HTTP/2 push will have only a marginal effect on web performance [5]. The results are consistent with the expectation that push improves web performance by at most 1 RTT, and that the difference is hard to observe given the effect of TCP slow start and the number of round trips usually required to render a web page.

For users of H2O (which has reprioritization turned on by default), this means that they can expect near-best performance without using push.

On the other hand, we may still need to look at networks that have TCP proxies. As discussed in "Why TCP optimisation has become more important than content optimization", some mobile carriers do seem to have such middleboxes installed.

The existence of such devices is generally a good thing, since they not only reduce packet retransmissions but also improve TCP bandwidth during the slow-start phase. The downside is that they usually increase application-level RTT, because they expand the amount of data in flight, which has a negative impact on the responsiveness of HTTP/2. HTTP/2 push will be a good optimization under such network conditions.

1. the fingerprint contained in the Cookie header is efficiently compressed by HPACK
2. wpbench was used to collect the numbers; first-paint was calculated as max(head-parsed, css-loaded); in this benchmark, the DOMContentLoaded timing was no different from first-paint
3. starting from H2O version 1.5, http2-reprioritize-blocking-assets option is turned on by default
4. at 10 RTT it is unlikely that we have hit the maximum network bandwidth, which means that packets are received by the browser in a batch once per RTT
5. there are use cases for HTTP/2 push other than pushing asset files

Thursday, October 8, 2015


I got called out with a comment along those lines, went "uh, sorry...", and that is how I ended up reading the Ruby VM code.


1. Optimizing the hot path of object creation



* gc.c (newobj_of): divide fast path and slow path


optimize performance of `rb_str_resurrect` by kazuho · Pull Request #1050 · ruby/ruby


2. Stop sorting heap pages




3. Optimizing the sweep


optimize gc sweep by kazuho · Pull Request #1049 · ruby/ruby

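Combining these three looks likely to cut the running time of a real application such as rdoc by more than 5% on my machine! (Note 1) Being satisfied with that is my progress for the last two days!!!!!!! I have a feeling that all sorts of other things are falling behind, but sorraaoetnusaoeuh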


Note 1: Provided there are no bugs!!

Tuesday, October 6, 2015

[Memo] Sharing files over NFS from an OS X host to VMs

I normally work on OS X, and making my development directory accessible from Linux and FreeBSD guests running in VMs on OS X speeds up compatibility testing.


1. Match the user-id and group-id of the account normally used on the guest to those of the OS X account

2. Configure /etc/exports and /etc/nfs.conf on OS X as shown below, so that files are exported over TCP only to the VM's virtual network
/shared-dir -mapall=user:group -network netaddr -mask netmask

3. Set up the mount information in /etc/fstab on the guest
/shared-dir nfs rw,noatime 1 0

Note: With VMware Fusion 10, you need to create a new network in Preferences, uncheck "Use NAT", and use that network. Otherwise, the TCP source address seen by the host will not be an address on the local network.


Thursday, October 1, 2015

I built a tool that measures the time until a web page is rendered (first-paint), or: performance tuning in the HTTP/2 era




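Event: Meaning
unload: Leaving the current page. After this, the screen stays blank until first-paint
first-paint: Initial rendering of the web page (the latter part of the HTML and the images may not be present yet)
DOMContentLoaded: Layout of the web page is complete
load (onload): All data, including images, has been displayed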

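  • The primary goal is to minimize first-paint, so that users can start viewing the web page as early as possible (Note 1)
  • The secondary goal is to minimize load, so that all the data arrives as early as possible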

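Unfortunately, only some web browsers let you obtain the time to first-paint through an API (see "Webページ遷移時間のパフォーマンス「First Paint」を計測する方法"). In addition, there are cases where you do not want to modify a web page that is in production just to measure it.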






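Note 1: Because some browsers do not expose the first-paint timing, DOMContentLoaded is sometimes used instead, but this requires care: for example, DOMContentLoaded includes the time taken to load analytics scripts placed at the end of <body>. Also, for sites that cannot be viewed without their images, it is the above-the-fold timing that needs to be tuned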