Implement memory-mapped IO and multi-threading for BLAKE3 hashing #12676
@silvanshade NixOS/nixpkgs#390458 (comment) I left instructions on how to backport things to Nixpkgs 24.11. Then we can bump the Nixpkgs in the flake (to a newer version of 24.11) and do things more simply here.
I've created
Do you by chance have some examples I could use to check the performance?
I added some patches that I needed to get this to build with current nixpkgs unstable. However, I think we are currently ending up with two tbb versions somehow? At least I now get crashes on macOS during early initialization (not just in this pull request but also on master with nixpkgs-unstable).
I just added a benchmarks section to the original post that gives some details on this. Performance is an estimate (since it's using the Rust numbers) but practically identical based on my testing locally and original testing upstream for the Also note that if you are testing on macOS, the difference likely won't be as significant due to the lower relative per-core performance of the NEON implementation versus the AVX implementation.
Please update to this nixpkgs revision once it's merged and in the channel: NixOS/nixpkgs#393691. Then we can get rid of our overrides.
9bad31c to 60b15a6
@Ericson2314 @Mic92 I've updated the
Reviving this PR now that I think most of the We want to rebase one more time with updated nixpkgs once the very latest 24.11 backported changes (with the MinGW, FreeBSD, and i686 fixes) hit the release channel.
Happy all those TBB build system changes are indeed no longer needed!
I updated the I believe this resolves all the known issues with failing builds, and this PR should be ready for final review/merge.
OK!
This pull request has been mentioned on NixOS Discourse. There might be relevant details there:
This PR implements memory-mapped IO and multi-threading for BLAKE3 hashing.
Performance with these changes is now on par with the proposed Rust interop: #12416.
Benchmarks
NOTE: The non-BLAKE3 results may be slightly faster with this PR than the benchmarks show, since they also benefit from the memory-mapping changes.
Config
CPU: AMD Ryzen 9 7950X 16-Core @ 5.88 GHz
RAM: 96GB @ 6400 MT/s
OS: CachyOS February 2025 release w/ bpfland scx
Input files created with:
example:
Benchmarks all used the following:
100K file
BLAKE3 (original)
BLAKE3 (memory-mapping + tbb)
SHA256
SHA512
10M file
BLAKE3 (original)
BLAKE3 (memory-mapping + tbb)
SHA256
SHA512
100M file
BLAKE3 (original)
BLAKE3 (memory-mapping + tbb)
SHA256
SHA512
300M file
BLAKE3 (original)
BLAKE3 (memory-mapping + tbb)
SHA256
SHA512
1G file
BLAKE3 (original)
BLAKE3 (memory-mapping + tbb)
SHA256
SHA512
20G file
BLAKE3 (original)
BLAKE3 (memory-mapping + tbb)
SHA256
SHA512
64G file
BLAKE3 (original)
BLAKE3 (memory-mapping + tbb)
SHA256
SHA512
Motivation
This PR adds functionality to the existing BLAKE3 implementation in nix to bring its performance on par with b3sum. The performance difference between the two is due to b3sum using the Rust BLAKE3 implementation, which uses both memory-mapped IO and multi-threading.
Until recently, multi-threading was not available for the C-based libblake3, but it is now supported in release 1.7.0.
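BLAKE3 splits its input into 1 KiB chunks arranged in a binary tree, so chunks can be hashed independently; that is what lets an implementation spread one file across several cores. The sketch below shows only that general chunk-parallel pattern using std::thread. It is not BLAKE3 and not how libblake3 or its TBB support is implemented: the FNV-1a function is a non-cryptographic stand-in, the flat combine step simplifies BLAKE3's tree, and the names fnv1a and parallelChunkHash are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Toy 64-bit FNV-1a hash used as a stand-in for a real chunk compression
// function. It is NOT cryptographic and NOT BLAKE3.
std::uint64_t fnv1a(const unsigned char * data, std::size_t len,
                    std::uint64_t h = 14695981039346656037ULL)
{
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical helper: hash fixed-size chunks independently on several
// threads, then combine the per-chunk digests in chunk order, so the result
// does not depend on how many threads were used.
std::uint64_t parallelChunkHash(const unsigned char * data, std::size_t len,
                                std::size_t chunkSize, unsigned numThreads)
{
    assert(chunkSize > 0);
    numThreads = std::max(numThreads, 1u);
    const std::size_t numChunks = (len + chunkSize - 1) / chunkSize;
    std::vector<std::uint64_t> digests(numChunks);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            // Thread t handles chunks t, t + numThreads, t + 2 * numThreads, ...
            // Each chunk writes only its own slot, so no locking is needed.
            for (std::size_t c = t; c < numChunks; c += numThreads) {
                const std::size_t off = c * chunkSize;
                digests[c] = fnv1a(data + off, std::min(chunkSize, len - off));
            }
        });
    }
    for (auto & w : workers) w.join();

    // Combine step: hash the concatenated chunk digests. BLAKE3 instead merges
    // chaining values pairwise in a binary tree; this flat fold is a simplification.
    return fnv1a(reinterpret_cast<const unsigned char *>(digests.data()),
                 digests.size() * sizeof(std::uint64_t));
}
```

Because every chunk's digest lands in a fixed slot, calling parallelChunkHash with 1 thread or 8 threads yields the same value; only the wall-clock time changes.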
Context
This PR is a follow up to #12379 (comment).
Related: NixOS/nixpkgs#390458
Design Considerations
This PR implements memory-mapped IO via boost::iostreams::mapped_file, which adds a dependency on the boost iostreams component.
Enabling multi-threading for libblake3 also adds a dependency on tbb, version 2021_11 or later.
Memory-mapping is performed in:
and a new optional parameter, memory_map, controls whether memory-mapping is skipped in favor of normal file reading. (If memory-mapping fails, normal file reading is also used as the fallback.)
This makes memory-mapping the default, which likely affects performance beyond hashing. I would expect it to often be faster than normal file reading given available resources and modern hardware, but I haven't tested beyond hashing.
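The memory_map behaviour described above (map by default, fall back to ordinary reading when mapping fails or is disabled) can be sketched as follows. This is a minimal sketch under stated assumptions: it calls POSIX mmap directly instead of boost::iostreams::mapped_file, which this PR actually uses, so it only builds on POSIX systems; the FileBytes class and its memoryMap constructor parameter are hypothetical names, not the code in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical illustration (the PR itself uses boost::iostreams::mapped_file):
// exposes a file's bytes either through a read-only mapping or a heap buffer.
class FileBytes {
public:
    FileBytes(const std::string & path, bool memoryMap)
    {
        if (memoryMap && tryMap(path)) return;
        // Opt-out or fallback path: ordinary buffered read into memory.
        std::ifstream in(path, std::ios::binary);
        if (!in) throw std::runtime_error("cannot open " + path);
        buffer_.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
    }

    ~FileBytes() { if (mapped_) munmap(mapped_, mappedSize_); }

    FileBytes(const FileBytes &) = delete;
    FileBytes & operator=(const FileBytes &) = delete;

    std::string_view view() const
    {
        if (mapped_) return {static_cast<const char *>(mapped_), mappedSize_};
        return {buffer_.data(), buffer_.size()};
    }

    bool isMapped() const { return mapped_ != nullptr; }

private:
    // Returns false (leaving the object unmapped) on any failure, so the
    // caller can fall back to normal reading.
    bool tryMap(const std::string & path)
    {
        assert(mapped_ == nullptr);
        int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0) return false;
        struct stat st;
        // Zero-length mappings are rejected by mmap, so empty files (and
        // pipes, which report size 0) take the fallback path.
        if (fstat(fd, &st) != 0 || st.st_size <= 0) { close(fd); return false; }
        const auto size = static_cast<std::size_t>(st.st_size);
        void * p = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd); // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) return false;
        mapped_ = p;
        mappedSize_ = size;
        return true;
    }

    void * mapped_ = nullptr;
    std::size_t mappedSize_ = 0;
    std::vector<char> buffer_;
};
```

A caller would pass view().data() and view().size() to the hasher either way. On the mapped path the kernel pages the file in on demand instead of the program copying it into a heap buffer first, which is where the savings for large inputs would come from.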
It may be appropriate to only enable memory-mapping when explicitly requested and/or gate memory-mapping behind an experimental feature. I can make those changes if requested.
@Ericson2314 @edolstra