We rely heavily on arrow-json, and json decoding is often a performance-sensitive part of streaming pipelines. After doing some benchmarking, I was surprised to see that arrow-json was significantly slower than Jackson (a popular Java json library). After spending some time profiling, I found some easy wins in the TapeDecoder. Together, they amount to a ~32% average improvement for a diverse set of json documents, according to the benchmarks here.
Here's an example profile: a plurality of the time is spent in BufIter::advance_until, implemented in arrow-rs/arrow-json/src/reader/tape.rs (lines 643 to 655 at f6ac87e).
But BufIter wraps an Iterator, which makes several of its operations pretty slow (in particular, advance() has to call next() in a loop). Re-implementing BufIter directly on top of a buffer and an offset lets us implement all of these operations more efficiently, for an average 22% improvement.
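A minimal sketch of what the buffer-plus-offset shape could look like (the names and details are illustrative, not the actual patch):

```rust
/// Illustrative sketch: a buffer iterator backed by a slice and an
/// offset instead of a wrapped Iterator. Hypothetical; the real
/// TapeDecoder code may differ.
struct BufIter<'a> {
    buf: &'a [u8],
    offset: usize,
}

impl<'a> BufIter<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Self { buf, offset: 0 }
    }

    /// Advancing is a single clamped addition rather than a loop of
    /// `next()` calls.
    fn advance(&mut self, n: usize) {
        self.offset = (self.offset + n).min(self.buf.len());
    }

    /// Peek at the current byte without consuming it.
    fn peek(&self) -> Option<u8> {
        self.buf.get(self.offset).copied()
    }

    /// Consume bytes until the predicate matches, returning the span
    /// that was skipped.
    fn advance_until(&mut self, f: impl Fn(u8) -> bool) -> &'a [u8] {
        let start = self.offset;
        while let Some(&b) = self.buf.get(self.offset) {
            if f(b) {
                break;
            }
            self.offset += 1;
        }
        &self.buf[start..self.offset]
    }
}
```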
We can also improve one of the usages of advance_until, which finds the end of a string and is quite expensive for long strings as currently implemented (arrow-rs/arrow-json/src/reader/tape.rs, line 397 at f6ac87e):

let s = iter.advance_until(|b| matches!(b, b'\\' | b'"'));

By using memchr, a SIMD-optimized byte-search library, we can get an average 16% improvement.
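A hedged sketch of that change, using the memchr crate's memchr2 (which searches for either of two bytes with SIMD); the function name and the (buf, offset) shape around it are assumptions:

```rust
// Sketch: find the end of a JSON string body with memchr2 from the
// `memchr` crate. The function name and the BufIter-style
// (buf, offset) shape are hypothetical.
use memchr::memchr2;

/// Return the bytes before the first `\` or `"`, advancing `offset`
/// past them.
fn advance_to_string_end<'a>(buf: &'a [u8], offset: &mut usize) -> &'a [u8] {
    let start = *offset;
    let end = match memchr2(b'\\', b'"', &buf[start..]) {
        Some(i) => start + i,
        None => buf.len(),
    };
    *offset = end;
    &buf[start..end]
}

fn main() {
    let input = br#"hello world" : 1}"#;
    let mut offset = 0;
    assert_eq!(advance_to_string_end(input, &mut offset), b"hello world");
    assert_eq!(input[offset], b'"');
}
```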
Another big cost for string-heavy documents is UTF-8 validation, and we can get some quick wins there by using simdutf8 (which has already been discussed in other contexts; see #7014). This is good for about 5%.
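For illustration, simdutf8 is close to a drop-in replacement for std's validation (the call below is the crate's actual API; where it would slot into the TapeDecoder is an assumption):

```rust
// simdutf8::basic::from_utf8 validates a byte slice with SIMD and
// returns &str on success, mirroring std::str::from_utf8 but without
// detailed error positions (the simdutf8::compat module has those).
fn to_validated_str(bytes: &[u8]) -> Option<&str> {
    simdutf8::basic::from_utf8(bytes).ok()
}
```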
Altogether, these changes improve performance in my benchmarks by 25-39%, averaging 32%.
There are also some other opportunities to improve. More operations could be vectorized, in particular skipping whitespace (see the sketch below). Another major cost is pushing strings and numbers into the buffer one by one; it's much faster to copy the entire input into the buffer up front, although that costs extra memory to store whitespace and other tokens.
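One illustrative shape for the whitespace fast path: a branch-light scalar scan that an explicit SIMD search could later replace. The function name is hypothetical and this is not part of the measured changes above:

```rust
/// Skip JSON whitespace starting at `offset`, returning the index of
/// the first non-whitespace byte (or `buf.len()` if none remains).
fn skip_whitespace(buf: &[u8], offset: usize) -> usize {
    buf[offset..]
        .iter()
        .position(|&b| !matches!(b, b' ' | b'\t' | b'\n' | b'\r'))
        .map(|i| offset + i)
        .unwrap_or(buf.len())
}
```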