Kazuho's Weblog: Optimizing performance of multi-tier web applications using HTTP/2 push

Thursday, December 3, 2015

Optimizing performance of multi-tier web applications using HTTP/2 push

Push is a feature of HTTP/2 that allows a server to speculatively send a response to a client, anticipating that the client will use it.

In my earlier blogpost, I wrote that HTTP/2 push does not have a significant effect on web performance when serving static files from a single HTTP/2 server. While that is true, push does improve performance by a noticeable margin in other scenarios. Let's look into one common case.

The Theory

Many, if not most, of today's web applications are multi-tiered. Typically, an HTTP request from a client is first accepted by an httpd (operated either by the provider of the web service or by a CDN operator). The httpd serves asset files by itself, while routing requests for HTML documents to an application server through FastCGI or HTTP/1.

It is when the response from the application server takes time that HTTP/2 push gives us a big performance boost.

The chart below shows why. With HTTP/2 push, a server can start sending the assets that are going to be referenced from the HTML before the generated HTML is returned from the application running behind it.

Figure 1. Timing sequence of a multi-tiered webapp
(RTT: 50ms, processing-time: 200ms)

It is not uncommon for a web application to spend hundreds of milliseconds processing an HTTP request, querying and updating the database. It is also common for a CDN edge server to wait hundreds of milliseconds while fetching an HTTP response from a web application server over an inter-continental connection.

In the case of the chart, the RTT between the httpd and the client is 50ms and the processing time is 200ms. Therefore, the server can spend 4 round-trips (or typically slightly above 200KB of bandwidth¹) pushing asset files before the HTML becomes ready to be served.
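The bandwidth estimate can be reproduced with a few lines of Python. This is a minimal sketch that assumes idealized TCP slow start (the congestion window doubles every round trip, with no loss and no receive-window limit), an initial window of 10 packets, and an MSS of 1,400 bytes. The function name `slow_start_budget` is made up for illustration.

```python
def slow_start_budget(round_trips: int, initcwnd: int = 10, mss: int = 1400) -> tuple[int, int]:
    """Return (packets, bytes) sendable in `round_trips` RTTs under ideal slow start."""
    packets = 0
    cwnd = initcwnd
    for _ in range(round_trips):
        packets += cwnd  # one full congestion window per round trip
        cwnd *= 2        # assumption: the window doubles each RTT, no loss
    return packets, packets * mss


rtt_ms = 50
processing_time_ms = 200
round_trips = processing_time_ms // rtt_ms  # 4 round trips fit into the wait

packets, size = slow_start_budget(round_trips)
print(packets, size)  # 150 210000
```

Real connections will deviate from this (packet loss, pacing, a different initial window), so the figure is best read as an order-of-magnitude budget.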

And thanks to the push transactions, the connection will be warm enough by the time the HTML becomes available to the web server that the server is more likely to be able to send the whole document at once.

Theoretically, the upper bound of the time that can be saved by the proposed approach (i.e. pushing assets until the main document becomes ready) is:

time_reduced_max = processing_time + 1 RTT

The additional 1 RTT appears if the HTML being delivered is so small that it cannot grow the send window in the pull case. time_reduced_min is obviously zero, which occurs when there is no resource that can be pushed.
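As a quick worked example, the bounds can be evaluated for the numbers used in Figure 1 (RTT of 50ms, processing time of 200ms). The helper name `time_reduced_bounds` is hypothetical and simply restates the formula.

```python
def time_reduced_bounds(processing_time_ms: float, rtt_ms: float) -> tuple[float, float]:
    """Return (time_reduced_min, time_reduced_max) in milliseconds."""
    lower = 0.0                                   # nothing can be pushed
    upper = float(processing_time_ms + rtt_ms)    # processing_time + 1 RTT
    return lower, upper


print(time_reduced_bounds(200, 50))  # (0.0, 250.0)
```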

Cache-aware Server Push

Even if you have a time window that can be used to push a few hundred kilobytes of data, you would definitely not want to waste bandwidth by pushing responses already cached by the client.

That is why a cache-aware server-pusher (CASPER) becomes important.

Initially implemented as an experimental feature in H2O HTTP/2 server version 1.5, CASPER tracks the cache state of the web browser using a single cookie². The cookie contains a fingerprint of all the high-priority asset files cached by the browser, compressed using Golomb-compressed sets. H2O updates the fingerprint every time it serves a high-priority asset file, and consults it to determine whether a given asset file should be pushed.
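To give a feel for how a Golomb-compressed set can act as a compact, probabilistic record of what has been cached, here is a minimal Python sketch. It is not H2O's actual encoding or cookie format: the hash function (SHA-1), the parameter `p_bits`, the bit-string representation, and all function names are assumptions chosen for readability.

```python
import hashlib


def hash_key(key: str, n: int, p_bits: int) -> int:
    """Map a key roughly uniformly into the range [0, n * 2**p_bits)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (n << p_bits)


def gcs_encode(keys: list[str], p_bits: int = 6) -> str:
    """Encode keys as a string of '0'/'1' characters using Golomb-Rice coding."""
    n = len(keys)
    values = sorted({hash_key(k, n, p_bits) for k in keys})
    out = []
    prev = 0
    for v in values:
        delta = v - prev  # store gaps between sorted hashes, not the hashes
        prev = v
        quotient = delta >> p_bits
        remainder = delta & ((1 << p_bits) - 1)
        out.append("1" * quotient + "0")              # unary-coded quotient
        out.append(format(remainder, f"0{p_bits}b"))  # fixed-width remainder
    return "".join(out)


def gcs_decode(bits: str, p_bits: int = 6) -> list[int]:
    """Recover the sorted hash values from the bit string."""
    values = []
    pos = 0
    prev = 0
    while pos < len(bits):
        quotient = 0
        while bits[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1  # skip the '0' that terminates the unary part
        remainder = int(bits[pos:pos + p_bits], 2)
        pos += p_bits
        prev += (quotient << p_bits) | remainder
        values.append(prev)
    return values


def probably_cached(bits: str, n: int, key: str, p_bits: int = 6) -> bool:
    """True if the key may be in the set; false positives are possible."""
    if n == 0:
        return False
    return hash_key(key, n, p_bits) in gcs_decode(bits, p_bits)


cached = ["/assets/style.css", "/assets/app.js"]
fingerprint = gcs_encode(cached)
print(all(probably_cached(fingerprint, len(cached), path) for path in cached))  # True
```

In this sketch a server would skip the push when `probably_cached` returns true and push otherwise. A key that was never added is reported as cached with a probability of at most about 1 in 2**p_bits, while a key that was added is never missed. Note that the decoder needs the item count `n`, so a real format would have to carry or fix that value.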

It should be noted that the current fingerprint maintained in the cookie is at best a poor estimate of what is being cached by the client. Without a way to peek into the web browser cache, we cannot update the fingerprint stored in the cookie to reflect evictions from the cache². Ideally, web browsers should calculate the fingerprint themselves and send the value to the server. But until then, we have to live with using cookies (or a ServiceWorker-based implementation that would give us freedom in implementing our own cache³) as a hacky workaround.

Benchmark

Let's move on to an experiment to verify whether the theory holds in practice.

For this purpose, I am using the top page of h2o.examp1e.net. The server (H2O version 1.6.0-beta2 with CASPER enabled; see configuration) is given 50ms of simulated latency using tc qdisc, and a web application that returns index.html with 200ms latency is placed behind the server. Google Chrome 46 is used as the test client.
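The post does not say how the slow backend was implemented. For readers who want to reproduce a similar setup, the following is a minimal sketch of such a backend using only the Python standard library. The handler name, the port, and the placeholder HTML body are all assumptions, not the application used in the benchmark.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

PROCESSING_TIME_SEC = 0.2  # 200ms, matching the benchmark's backend latency
INDEX_HTML = b"<!DOCTYPE html><title>demo</title><p>hello</p>"  # placeholder body


class SlowIndexHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(PROCESSING_TIME_SEC)  # simulate application processing time
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(INDEX_HTML)))
        self.end_headers()
        self.wfile.write(INDEX_HTML)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_backend(port: int = 8080) -> ThreadingHTTPServer:
    """Create (but do not start) the slow backend on the loopback interface."""
    return ThreadingHTTPServer(("127.0.0.1", port), SlowIndexHandler)
```

Calling `make_backend().serve_forever()` starts the backend; the front-end server would then be configured to proxy requests for the HTML document to it.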

FWIW, the sizes of the responses being served are as follows:

Figure 2. Size of the Files by Type
File type	Size
index.html	3,619 bytes
blocking assets	319,700 bytes (5 files)
non-blocking assets	415,935 bytes (2 files)

Blocking assets are CSS and JavaScript files that block the critical rendering path (i.e. the files that need to be obtained by the browser before rendering the webpage). Non-blocking assets are asset files that do not block the critical path (e.g. images).

The next two figures are the timing charts shown by Chrome's Developer Tools. In the former, none of the responses were pushed. In the latter, blocking assets were pushed using CASPER.

Figure 3. Chrome Timing Chart without Push

Figure 4. Chrome Timing Chart with Push⁴

As can be seen, both the DOMContentLoaded and load events are observed around 230 msec earlier when push is used, which matches the expectation that we would see an improvement of 200 msec to 250 msec.

Figure 5. Timing Improvements with Push
Event	Without Push (msec)	With Push (msec)	Delta (msec)	Gain
DOMContentLoaded	823	595	228	38%
load	1,010	775	235	30%
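The Delta and Gain columns can be recomputed from the two measured columns. The table does not define Gain; the numbers are consistent with the delta divided by the with-push time, and that reading is an inference used in the sketch below.

```python
# Measured times in msec, copied from Figure 5: (without push, with push)
measurements = {
    "DOMContentLoaded": (823, 595),
    "load": (1010, 775),
}


def improvement(without_push_ms: int, with_push_ms: int) -> tuple[int, int]:
    """Return (delta in msec, gain in percent relative to the with-push time)."""
    delta = without_push_ms - with_push_ms
    gain_pct = round(100 * delta / with_push_ms)  # assumed definition of Gain
    return delta, gain_pct


for event, (without_push, with_push) in measurements.items():
    print(event, *improvement(without_push, with_push))
# DOMContentLoaded 228 38
# load 235 30
```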

Conclusion

As shown in this blogpost, cache-aware server push can be used by a reverse proxy to push assets while waiting for the backend application server to provide dynamically generated content, effectively hiding the processing time of the application server. In the case of a CDN, it can be used to hide the latency between the edge server and the application server.

Considering how common it is for the processing time of a web application (or the RTT between an edge server and an application server) to be greater than the RTT between the client and the reverse proxy (or the edge server in the case of a CDN), we can expect cache-aware server push to provide a noticeable improvement to web performance in many deployments.

1: in the common case where INITCWND is 10 and MSS is around 1,400 bytes, it is possible to send 150 packets in 4 RTT, reaching 210KB in total
2: fortunately, the existence of false positives in the fingerprint is not a big issue, since the client can simply fall back to an ordinary GET request when a resource is not pushed
3: ongoing work is explained in HTTP/2 Push を Service Worker + Cache Aware Server Push で効率化したい話 ("Making HTTP/2 Push more efficient with Service Worker + Cache Aware Server Push", in Japanese) - Block Rockin' Codes
4: Chrome's timing chart shows pushed streams as being fetched at the moment they are adopted, which is after they have been received

EDIT: This blogpost was written as part of the http2 Advent Calendar 2015 (mostly in Japanese).