Batched requests #53

Open · wants to merge 16 commits into main

Conversation

Member

@clostao clostao commented Mar 14, 2025

No description provided.


// we group the object mappings by their piece index
const PIECE_INDEX_KEY = 1
const nodes = groupBy(objectMappings, PIECE_INDEX_KEY)
Member
This will work, and once I implement the first stage of autonomys/subspace#3316, it will re-use pieces most of the time.

But if an object crosses multiple pieces, the last piece will be downloaded twice (once at the end of the batch for the first piece, because that object needs data from both pieces, and again at the start of the batch for the last piece).

If you want to re-use even more pieces, you could batch groups with nearby piece indexes together. That way, you'll re-use the pieces in this situation as well:

  1. There are objects in nearby pieces, and one object crosses multiple pieces
  2. The first piece is re-used for all the objects in the batch in that piece
  3. The object that crosses multiple pieces re-uses the first piece, and downloads the later pieces (up to 5)
  4. The last piece is only downloaded once, and re-used for the next objects in the batch

You can actually combine as many nearby pieces as you like this way. So if you want the most download-efficient code, an alternative algorithm is:

  1. Sort the mappings by piece index
  2. Split the batch when the next object couldn't possibly share a piece with the last one
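A minimal sketch of that sort-and-split batching, assuming hypothetical names (an `ObjectMapping` with a numeric `pieceIndex` field, and a caller-chosen `maxPieceGap`):

```typescript
interface ObjectMapping {
  pieceIndex: number
}

// Sort mappings by piece index, then start a new batch whenever the next
// object couldn't possibly share a piece with the previous one (its piece
// index is more than maxPieceGap past the last mapping in the batch).
function splitIntoBatches(
  mappings: ObjectMapping[],
  maxPieceGap: number,
): ObjectMapping[][] {
  const sorted = [...mappings].sort((a, b) => a.pieceIndex - b.pieceIndex)
  const batches: ObjectMapping[][] = []
  for (const mapping of sorted) {
    const current = batches[batches.length - 1]
    if (
      current === undefined ||
      mapping.pieceIndex - current[current.length - 1].pieceIndex > maxPieceGap
    ) {
      batches.push([mapping]) // start a new batch
    } else {
      current.push(mapping) // same batch: pieces may be shared
    }
  }
  return batches
}
```

With the 5 MB protocol limit, a `maxPieceGap` of 5 is the conservative choice; a tighter value per object can be derived from the object size.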

If the piece is already cached, the response will be almost instant, because it is just moving some data around, and doing one blake3 hash. (Or up to 4 hashes if the object crosses segments - but that's rare, and will only happen once per batch at most.)

Here is how you can work out if two objects could share a piece:

  1. Blocks and objects are limited to 5 MB. So if the difference between the piece indexes in two mappings is greater than 5, they can't share any pieces - and you can split the batch there.
  2. If you know the size of the object, you can calculate the number of pieces in the object using: (object_size + 100) / (2^15 * 31), and rounding up. Then split the batch if the difference between the two piece indexes is greater than the number of pieces in the object.

(The calculation is a bit complicated to account for segment padding and headers, and the unused bytes in the cryptographic scalars we use to generate parity pieces.)
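For illustration, that piece-count formula can be written out directly (the constant and function names below are mine, not from the codebase):

```typescript
// Raw bytes stored per piece: 2^15 scalars with 31 usable bytes each.
const RAW_BYTES_PER_PIECE = 2 ** 15 * 31 // 1,015,808 bytes

// Approximate number of pieces an object occupies, per the formula above:
// (object_size + 100) / (2^15 * 31), rounded up. The +100 accounts for
// segment padding and headers.
function piecesInObject(objectSize: number): number {
  return Math.ceil((objectSize + 100) / RAW_BYTES_PER_PIECE)
}
```

So a 64 KB object fits within one piece's worth of raw data, while a full 5 MB object spans about six pieces.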

Member Author

@clostao clostao Apr 2, 2025

I implemented this change in my last commit. As you can see, piece re-use is only attempted when the objects' pieces are consecutive.

This is because, although objects at the protocol level are limited to 5 MB, @autonomys/auto-dag-data limits them to 64 KB, since that is the largest a Bytes input can be for an extrinsic (which is what System.remark uses). So for two objects to share a piece, their pieces have to be consecutive.
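Given that 64 KB cap, and ignoring the rare segment-crossing case mentioned above, an object spans at most two consecutive pieces, so the share check reduces to an adjacency test (a sketch with hypothetical names):

```typescript
// With objects capped at 64 KB (well under a piece's ~1 MB of raw data),
// an object can span at most two consecutive pieces. Two mappings can
// therefore only share a piece when their piece indexes are equal or
// adjacent.
function canSharePiece(pieceIndexA: number, pieceIndexB: number): boolean {
  return Math.abs(pieceIndexA - pieceIndexB) <= 1
}
```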

Member

Thanks! That seems like an annoying limitation 🙂

Hopefully my next PR will help re-use pieces more: if we're splitting into 64 kB objects, that's a lot of small objects in a single piece for a 1 MB file.

@autonomys autonomys deleted a comment from socket-security bot Apr 2, 2025