The library now uses the PCMPESTRI instruction, which is part of SSE 4.2, and runs 68% to 90% faster as a result.
|Benchmark||clang -O3||clang -O3 -msse4.2||Improvement|
PCMPxSTRx is a SIMD instruction that can be used for parsing text. In _SIDD_CMP_RANGES mode, it checks up to 16 bytes at once to see whether each byte falls within a given set of ranges. Herumi and I have created a wrapper function for the instruction, named findchar_fast, that steps through a given buffer 16 bytes at a time to find the first occurrence of a byte within a set of given ranges.
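To make the idea concrete, here is a minimal sketch of what such a wrapper could look like. It is not the library's actual code: the names find_in_ranges, in_ranges and find_ctl are hypothetical, and the sketch assumes the ranges are passed as (low, high) byte pairs of at most 16 bytes in total. When the compiler does not define __SSE4_2__, the SIMD part is skipped and the scalar loop does all the work.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#ifdef __SSE4_2__
#include <nmmintrin.h> /* SSE 4.2 intrinsics, including _mm_cmpestri */
#endif

/* Hypothetical wrapper in the spirit of findchar_fast.
 * `ranges` holds (low, high) byte pairs; `ranges_size` is its length in bytes.
 * On a match, returns a pointer to the matching byte and sets *found to 1.
 * Otherwise returns the first byte it did not examine, with *found == 0. */
static const char *find_in_ranges(const char *buf, const char *buf_end,
                                  const char *ranges, size_t ranges_size, int *found)
{
    assert(ranges_size <= 16 && ranges_size % 2 == 0);
    *found = 0;
#ifdef __SSE4_2__
    if (buf_end - buf >= 16) {
        char padded[16] = {0};
        memcpy(padded, ranges, ranges_size); /* never read past a short ranges array */
        __m128i ranges16 = _mm_loadu_si128((const __m128i *)padded);
        size_t left = (size_t)(buf_end - buf) & ~(size_t)15; /* whole 16-byte blocks only */
        do {
            __m128i b16 = _mm_loadu_si128((const __m128i *)buf);
            /* index of the first byte inside any range, or 16 if there is none */
            int r = _mm_cmpestri(ranges16, (int)ranges_size, b16, 16,
                                 _SIDD_LEAST_SIGNIFICANT | _SIDD_CMP_RANGES | _SIDD_UBYTE_OPS);
            if (r != 16) {
                *found = 1;
                return buf + r;
            }
            buf += 16;
            left -= 16;
        } while (left != 0);
    }
#else
    (void)buf_end;
    (void)ranges;
#endif
    return buf;
}

/* Scalar test with the same meaning as the range pairs. */
static int in_ranges(unsigned char c, const char *ranges, size_t ranges_size)
{
    for (size_t i = 0; i + 1 < ranges_size; i += 2)
        if ((unsigned char)ranges[i] <= c && c <= (unsigned char)ranges[i + 1])
            return 1;
    return 0;
}

/* SIMD first, then byte-by-byte for the bytes that are left.
 * Returns the first control character other than HT, or NULL if there is none. */
static const char *find_ctl(const char *buf, const char *buf_end)
{
    static const char ranges1[] = "\0\010" "\012\037" "\177\177";
    int found;
    buf = find_in_ranges(buf, buf_end, ranges1, sizeof(ranges1) - 1, &found);
    if (found)
        return buf;
    for (; buf != buf_end; ++buf)
        if (in_ranges((unsigned char)*buf, ranges1, sizeof(ranges1) - 1))
            return buf;
    return NULL;
}
```

Calling find_ctl on a request line such as "GET /index.html HTTP/1.1\r\n" would return a pointer to the CR, while a buffer that contains only printable characters and tabs yields NULL. Copying the ranges into a padded 16-byte array is a choice made for this sketch, so that the 128-bit load never reads beyond a short ranges string.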
The function merges neatly into the parser. The code below first uses the SIMD function to look for a control character (the match condition is defined as ranges1), then falls back to the non-SIMD code to handle up to 15 remaining characters (which the SIMD function cannot process without reading out of bounds).
    #ifdef __SSE4_2__
        static const char ranges1[] = "\0\010"
            /* allow HT */
            "\012\037"
            /* allow SP and up to but not including DEL */
            "\177\177"
            /* allow chars w. MSB set */
            ;
        int found;
        buf = findchar_fast(buf, buf_end, ranges1, sizeof(ranges1) - 1, &found);
        if (found)
            goto FOUND_CTL;
    #endif
        /* code that handles the input byte-by-byte */

To summarize, PCMPxSTRx is an excellent instruction for performance that can be cleanly integrated into existing parsers (tokenizers). Hopefully we will see performance improvements in various parsers in the future through the use of the instruction.
This blog post has been written as part of the H2O Advent Calendar.