Of the new features you will find in 2.2, I would like to use this blog post to talk about our support for JSON logging.
Traditionally, the log file format of HTTP servers has followed the convention set by NCSA httpd more than twenty years ago. But the more ways we try to process the logs, the more it makes sense to use a standardized and extensible format so that we can apply existing tools to the logs being collected. Hence JSON.
Our support for JSON is a smooth evolution from NCSA- (and Apache-) style logging. A configuration for JSON logging looks like this:

    access-log:
      path: /path/to/access-log.json
      format: '{"remote": "%h:%{remote}p", "at": "%{%Y%m%d%H%M%S}t.%{msec_frac}t", "method": "%m", "path": "%U%q", "status": %s, "body-size": %b, "referer": "%{referer}i"}'
      escape: json

The template specified by the `format` attribute uses the exact same specifiers as we use in NCSA-style logging. The only differences are that the non-substituted part of the template is JSON, and that another attribute named `escape` is set to `json`. This attribute instructs the logger to emit things in a JSON-compatible manner.

Specifically, the behavior of the logger is changed as follows:
- strings are escaped in JSON style (i.e. `\u00nn`) instead of `\xnn`
- nulls are emitted as `null` instead of `-`
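
To make the two rules above concrete, here is a minimal Python sketch that contrasts the two escaping styles. It is an illustration only, not h2o's implementation: the function names are hypothetical, and the exact set of bytes that gets escaped (quotes, backslashes, control bytes and non-ASCII bytes) is an assumption.

```python
import json


def escape_json_style(raw: bytes) -> str:
    # One \u00nn escape per byte that cannot appear as-is in a JSON string.
    # Assumption: quotes, backslashes, control bytes and non-ASCII bytes are escaped.
    out = []
    for b in raw:
        if b in (0x22, 0x5C) or b < 0x20 or b >= 0x7F:
            out.append(f"\\u00{b:02x}")
        else:
            out.append(chr(b))
    return "".join(out)


def escape_ncsa_style(raw: bytes) -> str:
    # Same byte selection, but using the \xnn form of NCSA-style logs.
    out = []
    for b in raw:
        if b in (0x22, 0x5C) or b < 0x20 or b >= 0x7F:
            out.append(f"\\x{b:02x}")
        else:
            out.append(chr(b))
    return "".join(out)


def emit(value, style):
    # Render one substituted value; None stands for a missing value.
    if value is None:
        return "null" if style == "json" else "-"
    if style == "json":
        return escape_json_style(value)
    return escape_ncsa_style(value)


sample = 'café "x"'.encode("utf-8")
print(emit(sample, "json"))
print(emit(sample, "ncsa"))
print(emit(None, "json"), emit(None, "ncsa"))

# The JSON-style result can be wrapped in quotes and parsed by any JSON parser.
json.loads('"' + emit(sample, "json") + '"')
```

The point of the JSON-style form is that the output stays parseable by a standard JSON parser, which the `\xnn` form is not.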
The format may seem a bit verbose, but gives you the power to name the elements of a JSON object as you like, and to choose whatever format you want to use for compound values (e.g. the date, as shown in the example above).
When a client accesses the server, a log line like the following is emitted for the above configuration.

    {"remote": "192.0.2.1:54389", "at": "20170322161623.023495", "method": "GET", "path": "/index.html", "status": 200, "body-size": 239, "referer": null}

One thing you may notice is that the value of the `referer` element is emitted as `null` without the surrounding double quotes that existed in the specified format. When escaping in JSON style, h2o removes the surrounding quotes if the sole content of the string literal is a single format specifier (i.e. `%...`) and if the format specifier evaluates to null. In other words, `"%foo"` evaluates to either a string literal or null, while `%foo` evaluates to a number or null.

If a string literal contains anything more than a single format specifier, the values are concatenated as strings to form a string literal. So `"abc%foo"` will evaluate to `"abcnull"` if `%foo` is null.

The other thing worth noting is that the substituted values will always be escaped as ISO-8859-1. It is the responsibility of the user to convert the string literals found in the log to the correct character encoding. Such a conversion cannot be done at the HTTP server level since it requires knowledge of the application being run. I would like to thank @nalsh for suggesting the approach.
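
On the consuming side, the conversion described above might look like the following Python sketch. It assumes the application serves UTF-8 paths. The helper name `reinterpret` and the log line are hypothetical: the line has the same shape as the example above, with a non-ASCII path substituted in.

```python
import json


def reinterpret(value, encoding="utf-8"):
    # json.loads turns each \u00nn escape into the code point U+00nn, so
    # encoding as ISO-8859-1 recovers the original bytes exactly. Those
    # bytes are then decoded using the encoding the application really uses.
    if value is None:
        return None
    return value.encode("iso-8859-1").decode(encoding)


# Hypothetical log line: same fields as the example above, but with a UTF-8 path.
line = (
    '{"remote": "192.0.2.1:54389", "at": "20170322161623.023495", '
    '"method": "GET", "path": "/caf\\u00c3\\u00a9", "status": 200, '
    '"body-size": 239, "referer": null}'
)

record = json.loads(line)
path = reinterpret(record["path"])
referer = reinterpret(record["referer"])  # null in the log stays None

# json.dumps escapes non-ASCII by default, so this prints safely on any terminal.
print(json.dumps({"path": path, "status": record["status"], "referer": referer}))
```

If the bytes are not valid in the assumed encoding, `decode` raises `UnicodeDecodeError`, so a real log processor would need to decide how to handle such records.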