Monday, December 15, 2014

PicoHTTPParser now has a chunked-encoding decoder

Today I added phr_decode_chunked, a function for decoding chunked-encoded input, to picohttpparser.

As the function's doc-comment (shown below) suggests, it is designed to decode the data in place. In other words, it is not copy-less.
/* the function rewrites the buffer given as (buf, bufsz) removing the chunked-
 * encoding headers. When the function returns without an error, bufsz is
 * updated to the length of the decoded data available. Applications should
 * repeatedly call the function while it returns -2 (incomplete) every time
 * supplying newly arrived data. If the end of the chunked-encoded data is
 * found, the function returns a non-negative number indicating the number of
 * octets left undecoded at the tail of the supplied buffer. Returns -1 on
 * error.
 */
ssize_t phr_decode_chunked(struct phr_chunked_decoder *decoder, char *buf,
                           size_t *bufsz);
This design is intentional.

Consider an input like the following. The example is more than 2MB long even though it contains only 2 bytes of data. The input conforms to the HTTP/1.1 specification, since the specification does not define a maximum length for chunk extensions and requires every conforming implementation to ignore unknown extensions.
1 very-very-veery long extension that lasts ...(snip) 1MB
a
1 very-very-veery long extension that lasts ...(snip) 1MB
a
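To make the size of such an input concrete, here is a minimal sketch that builds one. The function name build_bloated_chunked and its parameters are hypothetical and not part of picohttpparser. It assumes the ";" delimiter that the HTTP/1.1 grammar uses for chunk extensions, and pads each extension with 'x' characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of picohttpparser): builds `nchunks` chunks,
 * each carrying a single byte of data ('a') preceded by a chunk extension of
 * `ext_len` bytes, followed by the terminating "0\r\n\r\n". Returns a
 * malloc'ed buffer that the caller must free, storing its length in *out_len.
 * Returns NULL if the allocation fails. Size overflow is not checked. */
char *build_bloated_chunked(size_t nchunks, size_t ext_len, size_t *out_len)
{
    /* per chunk: "1" + ";" + extension + "\r\n" + "a" + "\r\n" */
    size_t per_chunk = 1 + 1 + ext_len + 2 + 1 + 2;
    size_t total = nchunks * per_chunk + 5; /* plus "0\r\n\r\n" */
    char *buf, *p;

    assert(out_len != NULL);
    if ((buf = malloc(total)) == NULL)
        return NULL;
    p = buf;
    for (size_t i = 0; i < nchunks; i++) {
        *p++ = '1';              /* chunk size: one byte of data */
        *p++ = ';';              /* start of the chunk extension */
        memset(p, 'x', ext_len); /* the bloated extension name */
        p += ext_len;
        memcpy(p, "\r\na\r\n", 5); /* end of chunk header, data, end of chunk */
        p += 5;
    }
    memcpy(p, "0\r\n\r\n", 5); /* last chunk and empty trailer section */
    *out_len = total;
    return buf;
}
```

Calling build_bloated_chunked(2, 1 << 20, &len) yields a buffer of 2 * (1048576 + 7) + 5 = 2097171 bytes, of which only 2 bytes are payload.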
To handle such input without exhausting memory, a decoder should either a) preserve only the decoded data (which requires a copy), or b) limit the size of the chunked-encoded data.

Option b might have been easier to implement, but such a limit might be difficult to administer. So I decided to take route a, and for simplicity implemented the decoder to always adjust the position of the data in place.

Always calling memmove to adjust the position might induce some overhead, but I assume it to be negligible for two reasons: both the source and the destination would be in the CPU cache, and the overhead of unaligned memory access is small on recent Intel CPUs.
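The idea behind route a can be illustrated with a simplified sketch. This is not the implementation in picohttpparser: the function decode_chunked_in_place below is hypothetical, keeps no state between calls, and assumes the whole chunked body is already in the buffer. It only shows how memmove compacts the chunk data toward the head of the buffer so that the chunk headers and extensions are discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h> /* ssize_t */

/* Returns the value of a hexadecimal digit, or -1 if c is not one. */
static int hex_value(char c)
{
    if ('0' <= c && c <= '9') return c - '0';
    if ('a' <= c && c <= 'f') return c - 'a' + 10;
    if ('A' <= c && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical one-shot decoder (a sketch, not picohttpparser's code).
 * Rewrites (buf, *bufsz) in place, keeping only the chunk data. On success,
 * *bufsz becomes the decoded length and the return value is the number of
 * bytes left after the last-chunk line (the trailer section is not parsed).
 * Returns -1 if the input is malformed or truncated. */
ssize_t decode_chunked_in_place(char *buf, size_t *bufsz)
{
    size_t src = 0, dst = 0, len;

    assert(buf != NULL && bufsz != NULL);
    len = *bufsz;
    for (;;) {
        size_t chunk = 0;
        int digits = 0, v;
        /* parse the chunk size (hexadecimal) */
        while (src < len && (v = hex_value(buf[src])) >= 0) {
            if (digits == (int)(sizeof(size_t) * 2))
                return -1; /* the size would overflow size_t */
            chunk = chunk * 16 + (size_t)v;
            ++digits;
            ++src;
        }
        if (digits == 0)
            return -1;
        /* skip the chunk extension, however long, up to the end of line */
        while (src < len && buf[src] != '\n')
            ++src;
        if (src == len)
            return -1;
        ++src;
        if (chunk == 0)
            break; /* last chunk */
        if (len - src < chunk)
            return -1;
        /* move the data over the headers that have already been consumed */
        memmove(buf + dst, buf + src, chunk);
        dst += chunk;
        src += chunk;
        /* each chunk's data is followed by CRLF */
        if (len - src < 2 || buf[src] != '\r' || buf[src + 1] != '\n')
            return -1;
        src += 2;
    }
    *bufsz = dst;
    return (ssize_t)(len - src);
}
```

For the input "4;ext=1\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n", the buffer starts with "Wikipedia" afterwards, *bufsz is 9, and the return value is 2 (the final CRLF). Because the destination never runs ahead of the source, the extensions are overwritten as decoding proceeds and the memory needed is bounded by the buffer itself.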

For ease-of-use, I have added examples to the README.
