The lazy image puller, which pulls and extracts layers on first access (https://github.com/moby/buildkit/blob/v0.20.1/cache/refs.go#L1270-L1311), uses an errgroup in which pulling the blob and unlazying the parent run in parallel; once both complete, the blob can be unpacked. This allows unpacking to start while some blobs are still being pulled.
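To make the shape of that logic concrete, here is a minimal sketch. It is not the BuildKit code: `layer`, `recorder`, `unlazy`, and `chain` are hypothetical names, `sync.WaitGroup` stands in for the errgroup so the example needs only the standard library, and error handling is omitted.

```go
package main

import "sync"

// layer is a hypothetical stand-in for one image layer.
// parent is nil for the base layer; index 0 is the base.
type layer struct {
	index  int
	parent *layer
}

// recorder collects layer indexes in the order events happen.
type recorder struct {
	mu     sync.Mutex
	events []int
}

func (r *recorder) add(i int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.events = append(r.events, i)
}

// unlazy pulls l's blob and unlazies l's parent concurrently,
// then unpacks l once both have finished.
func unlazy(l *layer, pulled, unpacked *recorder) {
	var wg sync.WaitGroup
	if l.parent != nil {
		wg.Add(1)
		go func() {
			defer wg.Done()
			unlazy(l.parent, pulled, unpacked)
		}()
	}
	wg.Add(1)
	go func() {
		defer wg.Done()
		pulled.add(l.index) // stand-in for the blob download
	}()
	wg.Wait()
	unpacked.add(l.index) // stand-in for the unpack step
}

// chain builds n stacked layers (n >= 1) and returns the topmost one.
func chain(n int) *layer {
	var top *layer
	for i := 0; i < n; i++ {
		top = &layer{index: i, parent: top}
	}
	return top
}
```

Calling `unlazy(chain(5), pulled, unpacked)` always leaves `unpacked.events` as `0, 1, 2, 3, 4`, because each unpack waits for its parent. The order of `pulled.events` is up to the scheduler and can differ between runs.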
Looking at the OTEL traces, it seems that (because of the errgroup contention) the HTTP requests do not go out in a deterministic order, and never in the actual layer order that would be optimal for performance. Because the parallelization of requests is limited per registry, the download attempts for the first layer currently wait in the queue while later layers are being downloaded (assuming the image has more layers than the parallelization limit).
In practice, the order looks closer to the reverse of the image layer order. At least when testing with the `python` image, the HTTP request for the first layer was always either the last or the penultimate one to go out. Even if it is hard to guarantee the optimal order every time, I think this reverse-like order should be avoided.
It is not entirely clear what the best way to improve this would be. Adding more sequential logic to pulling blobs reduces the performance of the download phase. Some of it is also completely out of our control, e.g. most registries make multiple requests internally (on Docker Hub the first request goes to docker.io, which usually redirects to Cloudflare).
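One possible direction, offered only as a sketch and not as how BuildKit or its registry limiter actually works: keep the pulls parallel, but when a download slot frees up, hand it to the waiting request with the lowest layer index. `orderedLimiter`, `Request`, and `Release` are hypothetical names invented for this example.

```go
package main

import (
	"container/heap"
	"sync"
)

// waiter is a pending download request for the layer at index.
type waiter struct {
	index int
	ready chan struct{}
}

// waitHeap is a min-heap of waiters ordered by layer index.
type waitHeap []*waiter

func (h waitHeap) Len() int           { return len(h) }
func (h waitHeap) Less(i, j int) bool { return h[i].index < h[j].index }
func (h waitHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *waitHeap) Push(x any)        { *h = append(*h, x.(*waiter)) }
func (h *waitHeap) Pop() any {
	old := *h
	n := len(old)
	w := old[n-1]
	*h = old[:n-1]
	return w
}

// orderedLimiter caps concurrent downloads and grants freed slots
// to the lowest layer index among the requests currently waiting.
type orderedLimiter struct {
	mu      sync.Mutex
	free    int
	waiting waitHeap
}

func newOrderedLimiter(limit int) *orderedLimiter {
	return &orderedLimiter{free: limit}
}

// Request registers a download for the given layer index and returns
// a channel that is closed once a slot has been granted.
func (l *orderedLimiter) Request(index int) <-chan struct{} {
	l.mu.Lock()
	defer l.mu.Unlock()
	ready := make(chan struct{})
	if l.free > 0 {
		l.free--
		close(ready)
		return ready
	}
	heap.Push(&l.waiting, &waiter{index: index, ready: ready})
	return ready
}

// Release returns a slot, passing it straight to the lowest-index waiter if any.
func (l *orderedLimiter) Release() {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.waiting.Len() > 0 {
		w := heap.Pop(&l.waiting).(*waiter)
		close(w.ready)
		return
	}
	l.free++
}
```

A puller would do `<-lim.Request(i)` before starting the HTTP request for layer `i` and call `lim.Release()` when the download ends. This only reorders requests that are already queued; if the first layer's request is registered after all slots have been handed out, it still has to wait for one to free up, so it would reduce the reverse-like ordering rather than guarantee the optimal one.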
@sipsma