[BUG] BlobClient.DownloadTo does not download data using parallel requests #49241

Open
martinknafvework opened this issue Apr 4, 2025 · 2 comments
Labels
Client: This issue points to a problem in the data-plane of the library.
customer-reported: Issues that are reported by GitHub users external to the Azure organization.
needs-team-attention: Workflow: This issue needs attention from Azure service team or SDK team
question: The issue doesn't require a change to the product in order to be resolved. Most issues start as that
Service Attention: Workflow: This issue is responsible by Azure service team.
Storage: Storage Service (Queues, Blobs, Files)

martinknafvework commented Apr 4, 2025

Library name and version

Azure.Storage.Blobs 12.24.0

Describe the bug

According to the documentation, DownloadTo(Stream) should download a blob using parallel requests. This does not work. Instead of downloading parts of the blob in parallel, it downloads the parts one at a time sequentially.

If I use the DownloadToAsync function instead, parallel downloads are performed.

I am not familiar with the Azure.Storage.Blobs codebase, but it looks like DownloadTo calls StagedDownloadAsync with async = false. That value is passed on to PartitionedDownloader.DownloadToInternal, which contains the following line:

int effectiveWorkerCount = async ? _maxWorkerCount : 1;

effectiveWorkerCount is later used to control the number of parallel tasks.
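To illustrate why that single line matters, here is a minimal sketch of a worker count gating concurrency. It is not the SDK's code: WorkerCountSketch, RunChunksAsync and EffectiveWorkerCount are hypothetical names, the parameter is called isAsync rather than async, and Task.Delay stands in for an HTTP range request. It only shows that a worker count of 1 forces the chunks to run one at a time.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class WorkerCountSketch
{
    // Mirrors the quoted line: the synchronous path collapses to one worker.
    public static int EffectiveWorkerCount(bool isAsync, int maxWorkerCount)
    {
        return isAsync ? maxWorkerCount : 1;
    }

    // Hypothetical stand-in for a partitioned download: runs chunkCount
    // simulated chunk downloads with at most workerCount in flight and
    // returns the highest number of chunks in flight at the same time.
    public static async Task<int> RunChunksAsync(int chunkCount, int workerCount)
    {
        int inFlight = 0, peak = 0;
        object sync = new object();

        using (var gate = new SemaphoreSlim(workerCount))
        {
            async Task DownloadChunkAsync()
            {
                await gate.WaitAsync();
                try
                {
                    lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); }
                    await Task.Delay(20); // stands in for one HTTP range request
                    lock (sync) { inFlight--; }
                }
                finally
                {
                    gate.Release();
                }
            }

            await Task.WhenAll(
                Enumerable.Range(0, chunkCount).Select(_ => DownloadChunkAsync()));
        }

        return peak;
    }
}
```

Calling RunChunksAsync(8, EffectiveWorkerCount(false, 50)) should never have more than one chunk in flight, which matches the sequential requests I see with DownloadTo. Passing true instead allows up to 50 chunks in flight.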

Expected behavior

When downloading a 350 MB blob, I expect multiple HTTP requests to be sent in parallel (since the documentation mentions that the function "downloads a blob using parallel requests").

Actual behavior

The following happens:

  1. An HTTP request is performed to download the first 255MB.
    (255MB appears to be the default value for StorageTransferOptions.InitialTransferSize)
  2. An HTTP request is performed to download the next ~8MB.
  3. An HTTP request is performed to download the next ~8MB.
  4. The download continues until completion, one ~8MB chunk at a time.

These downloads happen sequentially and not in parallel.
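For a sense of scale, the request sequence above can be approximated with a short sketch. RangePlanSketch and PlanRanges are hypothetical names, not SDK APIs, and the sizes are the rounded figures I observed (350, 255 and 8, all in MB), so the exact count in a real trace may differ.

```csharp
using System;
using System.Collections.Generic;

public static class RangePlanSketch
{
    // Hypothetical helper: splits a blob of blobLength units into one initial
    // range followed by fixed-size chunks. The last chunk may be shorter.
    public static List<(long Offset, long Length)> PlanRanges(
        long blobLength, long initialSize, long chunkSize)
    {
        if (initialSize <= 0) throw new ArgumentOutOfRangeException(nameof(initialSize));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var ranges = new List<(long Offset, long Length)>();

        // First request: up to initialSize units from the start of the blob.
        long first = Math.Min(blobLength, initialSize);
        if (first > 0)
        {
            ranges.Add((0, first));
        }

        // Remaining requests: one per chunk until the blob is covered.
        for (long offset = first; offset < blobLength; offset += chunkSize)
        {
            ranges.Add((offset, Math.Min(chunkSize, blobLength - offset)));
        }

        return ranges;
    }
}
```

With the rounded figures, PlanRanges(350, 255, 8) yields one initial range plus twelve chunk ranges. Those twelve chunk requests are the ones I would expect to overlap but which are issued one after another.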

Reproduction Steps

  1. Upload a 350MB blob to Azure Blob Storage.
  2. Start Fiddler or some other software to monitor HTTP requests.
  3. Run the code below to download the blob.
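 using System.IO;
 using Azure.Storage.Blobs;

 // Connect to the container that holds the 350MB test blob.
 var blobContainerClient = new BlobContainerClient(
     "<my-connection-string>",
     "test-container");
 var blobClient = blobContainerClient.GetBlobClient("my-large-blob.txt");

 // Synchronous download: this is the call that issues its requests sequentially.
 using (var memoryStream = new MemoryStream())
 {
     blobClient.DownloadTo(memoryStream);
 }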

According to the documentation, data should be downloaded in parallel but that does not happen. The data in the blob is downloaded sequentially.

Even if I explicitly set concurrency, download does not happen in parallel.

 using (var memoryStream = new MemoryStream())
 {
    blobClient.DownloadTo(memoryStream, new BlobDownloadToOptions()
    {
       TransferOptions = new StorageTransferOptions()
       {
          InitialTransferSize = 1_000_000,
          MaximumConcurrency = 50,
          MaximumTransferSize = 1_000_000
       }
    });
 }

If I change the code to the following, downloads will happen in parallel.

using (var memoryStream = new MemoryStream())
{
   await blobClient.DownloadToAsync(memoryStream);
}

Environment

Windows 11, Visual Studio 2022, .NET Framework 4.8

github-actions bot added the Client, customer-reported, needs-team-attention, question, Service Attention, and Storage labels Apr 4, 2025

github-actions bot commented Apr 4, 2025

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @amnguye @jalauzon-msft @jaschrep-msft @nickliu-msft @seanmcc-msft.


github-actions bot commented Apr 4, 2025

Hello @martinknafvework. I'm an AI assistant for the azure-sdk-for-net repository. I have some suggestions that you can try out while the team gets back to you.

The team will get back to you shortly, hopefully this helps in the meantime.
