Batched requests #53
base: main
Conversation
// we group the object mappings by the piece index
const PIECE_INDEX_KEY = 1
const nodes = groupBy(objectMappings, PIECE_INDEX_KEY)
This will work, and once I implement the first stage of autonomys/subspace#3316, it will re-use pieces most of the time.
But if an object crosses multiple pieces, the last piece will be downloaded twice (once at the end of the batch for the first piece, because that object needs data from both pieces, and again at the start of the batch for the last piece).
If you want to re-use even more pieces, you could batch groups with nearby piece indexes together. That way, you'll re-use the pieces in this situation as well:
- There are objects in nearby pieces, and one object crosses multiple pieces
- The first piece is re-used for all the objects in the batch in that piece
- The object that crosses multiple pieces re-uses the first piece, and downloads the later pieces (up to 5)
- The last piece is only downloaded once, and re-used for the next objects in the batch
You can actually combine as many nearby pieces as you like this way. So if you want the most download-efficient code, an alternative algorithm is:
- Sort the mappings by piece index
- Split the batch when the next object couldn't possibly share a piece with the last one
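The sort-and-split algorithm above could be sketched roughly like this. The `ObjectMapping` shape and the `splitIntoBatches` name are hypothetical, just for illustration; the real types in this repo will differ. This uses the simple 5 MB bound (mappings more than 5 piece indexes apart can't share a piece) as the split condition:

```typescript
// Hypothetical mapping shape; the real type in this repo may differ.
interface ObjectMapping {
  pieceIndex: number
}

// Objects are limited to 5 MB, so mappings whose piece indexes differ
// by more than this can't share any pieces.
const MAX_OBJECT_PIECES = 5

// Sort mappings by piece index, then start a new batch whenever the
// next object couldn't possibly share a piece with the previous one.
function splitIntoBatches(mappings: ObjectMapping[]): ObjectMapping[][] {
  const sorted = [...mappings].sort((a, b) => a.pieceIndex - b.pieceIndex)
  const batches: ObjectMapping[][] = []
  let current: ObjectMapping[] = []
  for (const mapping of sorted) {
    const last = current[current.length - 1]
    if (last !== undefined && mapping.pieceIndex - last.pieceIndex > MAX_OBJECT_PIECES) {
      batches.push(current)
      current = []
    }
    current.push(mapping)
  }
  if (current.length > 0) batches.push(current)
  return batches
}
```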
If the piece is already cached, the response will be almost instant, because it is just moving some data around, and doing one blake3 hash. (Or up to 4 hashes if the object crosses segments - but that's rare, and will only happen once per batch at most.)
Here is how you can work out if two objects could share a piece:
- Blocks and objects are limited to 5 MB. So if the difference between the piece indexes in two mappings is greater than 5, they can't share any pieces - and you can split the batch there.
- If you know the size of the object, you can calculate the number of pieces in the object as `(object_size + 100) / (2^15 * 31)`, rounded up. Then split the batch if the difference between the two piece indexes is greater than the number of pieces in the object.
(The calculation is a bit complicated to account for segment padding and headers, and the unused bytes in the cryptographic scalars we use to generate parity pieces.)
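The size-based check could look roughly like this; `maxPiecesForObject` and `couldSharePiece` are hypothetical names, and the constants just restate the formula above (2^15 scalars per piece, 31 usable bytes per scalar, plus 100 bytes of headroom for segment padding and headers):

```typescript
// Usable data bytes per piece: 2^15 scalars, 31 data bytes per 32-byte scalar.
const PIECE_DATA_BYTES = 2 ** 15 * 31

// Upper bound on the number of pieces an object of objectSize bytes spans.
// The +100 accounts for segment padding and headers, as described above.
function maxPiecesForObject(objectSize: number): number {
  return Math.ceil((objectSize + 100) / PIECE_DATA_BYTES)
}

// Split the batch if the gap between the two piece indexes is greater
// than the first object's piece count; otherwise they might share a piece.
function couldSharePiece(
  firstPieceIndex: number,
  firstObjectSize: number,
  secondPieceIndex: number,
): boolean {
  return secondPieceIndex - firstPieceIndex <= maxPiecesForObject(firstObjectSize)
}
```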
I implemented this change in my last commit. As you can see, no piece re-use is attempted when the objects' pieces are not consecutive.
This is because, although objects at the protocol level are limited to 5 MB, @autonomys/auto-dag-data
limits them to 64 KB, since that's the biggest a Bytes
input can be for an extrinsic (which is what System.remark
uses). So for two objects to share a piece, their pieces have to be consecutive.
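Under that 64 KB cap, an object spans at most two pieces, so the share check collapses to a consecutiveness test. A minimal sketch (the `canReusePiece` name is hypothetical):

```typescript
// With objects capped at 64 KB by @autonomys/auto-dag-data, an object
// spans at most two pieces, so two objects can only share a piece when
// their piece indexes are equal or consecutive.
function canReusePiece(pieceIndexA: number, pieceIndexB: number): boolean {
  return Math.abs(pieceIndexA - pieceIndexB) <= 1
}
```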
Thanks! That seems like an annoying limitation 🙂
Hopefully my next PR will help re-use pieces more. If we're splitting into 64 kB objects, that's a lot of small objects in a single piece for a 1 MB file.
…into batched-requests
… for file retrieval service
No description provided.