
Add content length to GCP multipart complete #7301


Closed
jkosh44 wants to merge 2 commits into main from gcp-multipart-complete

Conversation

jkosh44

@jkosh44 jkosh44 commented Mar 17, 2025

Which issue does this PR close?

Rationale for this change

This commit fixes the GCP multipart complete implementation by adding the Content-Length header to the XML request. According to the docs (https://cloud.google.com/storage/docs/xml-api/post-object-complete), this header is required but wasn't being set previously.

What changes are included in this PR?

Adding Content-Length header to GCP multipart complete XML request.

Are there any user-facing changes?

No
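As a rough illustration of the change (this is not the actual object_store code; all names here are hypothetical), the gist is to serialize the CompleteMultipartUpload XML body first and attach its byte length as an explicit Content-Length header:

```rust
// Illustrative sketch only: build the completion XML body and derive the
// Content-Length value from its serialized byte length. The function names
// are made up for this example and do not exist in object_store.

fn complete_multipart_xml(etags: &[&str]) -> String {
    // List each part's number and ETag in upload order.
    let mut body = String::from("<CompleteMultipartUpload>");
    for (i, etag) in etags.iter().enumerate() {
        body.push_str(&format!(
            "<Part><PartNumber>{}</PartNumber><ETag>{}</ETag></Part>",
            i + 1,
            etag
        ));
    }
    body.push_str("</CompleteMultipartUpload>");
    body
}

fn content_length_header(body: &str) -> (&'static str, String) {
    // Content-Length must be the body size in bytes, not characters.
    ("Content-Length", body.len().to_string())
}

fn main() {
    let body = complete_multipart_xml(&["\"etag-1\""]);
    let (name, value) = content_length_header(&body);
    println!("{name}: {value}");
    assert_eq!(value, body.len().to_string());
}
```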

@github-actions github-actions bot added the object-store Object Store Interface label Mar 17, 2025
@jkosh44
Author

jkosh44 commented Mar 17, 2025

It's very possible that I'm just holding this wrong, because I'm surprised that no one else has complained yet. However, when using this library with GCP for multipart uploads, I'm seeing errors of the form: error: Client error with status 411 Length Required.

@jkosh44 jkosh44 force-pushed the gcp-multipart-complete branch from 8c907b7 to 3c3cf1d Compare March 17, 2025 17:38
@jkosh44 jkosh44 marked this pull request as ready for review March 17, 2025 17:38
@jkosh44
Author

jkosh44 commented Mar 17, 2025

I haven't validated this fix yet; if there's some way to test this via an integration test, that would be helpful to know. Otherwise I can try running this against a real GCP bucket, but it will probably take some time to set up.

@alamb
Contributor

alamb commented Mar 18, 2025

I haven't validated this fix yet; if there's some way to test this via an integration test, that would be helpful to know. Otherwise I can try running this against a real GCP bucket, but it will probably take some time to set up.

This is what we use for GCP integration

# We are forced to use docker commands instead of service containers as we need to override the entrypoints
# which is currently not supported - https://github.com/actions/runner/discussions/1872
- name: Configure Fake GCS Server (GCP emulation)
# Custom image - see fsouza/fake-gcs-server#1164
run: |
echo "GCS_CONTAINER=$(docker run -d -p 4443:4443 tustvold/fake-gcs-server -scheme http -backend memory -public-host localhost:4443)" >> $GITHUB_ENV
# Give the container a moment to start up prior to configuring it
sleep 1
curl -v -X POST --data-binary '{"name":"test-bucket"}' -H "Content-Type: application/json" "http://localhost:4443/storage/v1/b"
echo '{"gcs_base_url": "http://localhost:4443", "disable_oauth": true, "client_email": "", "private_key": "", "private_key_id": ""}' > "$GOOGLE_SERVICE_ACCOUNT"

And then run with

      GOOGLE_BUCKET: test-bucket
      GOOGLE_SERVICE_ACCOUNT: "/tmp/gcs.json"

Then

cargo test --features=aws,azure,gcp,http

@alamb
Contributor

alamb commented Mar 18, 2025

Thank you very much for this report and proposed fix @jkosh44 -- it looks nice to me

In terms of testing, I think it would be ok if you:

  1. Manually validate that this fixes the issue with your actual GCP bucket
  2. Add some sort of unit test (that would fail if we broke this feature accidentally in a future refactor)

I don't think it is required to get this wired into the integration test harness as of now

@jkosh44
Author

jkosh44 commented Mar 18, 2025

This is what we use for GCP integration

I actually found the GCP integration test, but this comment made me think that we can't actually test the multipart uploads (because they're not implemented in the emulator):

// Fake GCS server does not yet implement XML Multipart uploads
// https://github.com/fsouza/fake-gcs-server/issues/852
stream_get(&integration).await;

I'll work on manually validating the fix, but bizarrely I stopped receiving 411s from GCP and started getting 200s. I'll try to create a new bucket and see if I can get the 411 again.

@alamb
Contributor

alamb commented Mar 18, 2025

I'll work on manually validating the fix, but bizarrely I stopped receiving 411s from GCP and started getting 200s. I'll try to create a new bucket and see if I can get the 411 again.

Weird!

Thank you.

@jkosh44
Author

jkosh44 commented Mar 18, 2025

So I created a brand new bucket and ran the following test locally about 1000 times on main, and it passed every time.

use std::fs::File;
use std::io::Read;

use object_store::gcp::GoogleCloudStorageBuilder;
use object_store::multipart::MultipartStore;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store = GoogleCloudStorageBuilder::new()
        .with_bucket_name("multipart-complete-header-test")
        .with_application_credentials("path/to/my/credentials.json")
        .build()
        .unwrap();
    let file_path = "path/to/some/local/file";
    let file_contents = {
        let mut buf = Vec::new();
        File::open(file_path)
            .unwrap()
            .read_to_end(&mut buf)
            .unwrap();
        buf
    };
    let blob_path = "multi-test/hello.txt".into();
    let id = store.create_multipart(&blob_path).await.unwrap();
    let mut part_ids = Vec::new();
    let part_id = store
        .put_part(&blob_path, &id, 0, file_contents.into())
        .await
        .unwrap();
    part_ids.push(part_id);
    store
        .complete_multipart(&blob_path, &id, part_ids)
        .await
        .unwrap();
    // Sleep to avoid rate limiting.
    tokio::time::sleep(tokio::time::Duration::from_millis(1000)).await;
    Ok(())
}

It does seem very odd that the docs clearly state that Content-Length is required, yet we were not setting it. My best guess as to why I was receiving a 411 is that GCP had a partial roll-out of new API servers that were validating Content-Length ... but that seems a bit far-fetched.

I did try the same test with this branch, and it also works fine. So I'll leave it up to you if you think it's worth merging. In terms of unit tests, is there somewhere where we are already unit testing multipart uploads? It seems difficult to test without refactoring such that we generate and execute the request in separate methods.

@alamb
Contributor

alamb commented Mar 18, 2025

I did try the same test with this branch, and it also works fine. So I'll leave it up to you if you think it's worth merging. In terms of unit tests, is there somewhere where we are already unit testing multipart uploads? It seems difficult to test without refactoring such that we generate and execute the request in separate methods.

I think s3 has multipart upload tests -- but we would have to dig around to be sure

I'll plan to merge this PR in over the next few days unless anyone else has an objection

(BTW @jkosh44 I wonder if you need to make large multi-part requests -- like actually try to upload 10 parts or something)

@jkosh44
Author

jkosh44 commented Mar 18, 2025

I think s3 has multipart upload tests -- but we would have to dig around to be sure

Thanks, I'll do some digging tomorrow.

I wonder if you need to make large multi-part requests -- like actually try to upload 10 parts or something

Good thinking, unfortunately I tried 50 parts each of size 10 MB and it still worked fine.

@tustvold
Contributor

FWIW in the past the choice of SSL library has had an impact on this behaviour; I seem to remember a bug like this only being reproducible when using openssl instead of rustls

@jkosh44
Author

jkosh44 commented Mar 19, 2025

FWIW I found an extremely similar issue in a similar repo: abdolence/gcloud-sdk-rs#121

@jkosh44
Author

jkosh44 commented Mar 19, 2025

FWIW in the past the choice of SSL library has had an impact on this behaviour; I seem to remember a bug like this only being reproducible when using openssl instead of rustls

I'm not entirely sure how to validate which SSL library I'm using, but I made the following changes and re-ran my test; it still worked (the rt-multi-thread feature was needed for my test to run):

diff --git a/object_store/Cargo.toml b/object_store/Cargo.toml
index 8370cd53..ed73aeef 100644
--- a/object_store/Cargo.toml
+++ b/object_store/Cargo.toml
@@ -53,13 +53,14 @@ hyper = { version = "1.2", default-features = false, optional = true }
 md-5 = { version = "0.10.6", default-features = false, optional = true }
 quick-xml = { version = "0.37.0", features = ["serialize", "overlapped-lists"], optional = true }
 rand = { version = "0.8", default-features = false, features = ["std", "std_rng"], optional = true }
-reqwest = { version = "0.12", default-features = false, features = ["rustls-tls-native-roots", "http2"], optional = true }
+reqwest = { version = "0.12", default-features = false, features = ["native-tls", "http2"], optional = true }
 ring = { version = "0.17", default-features = false, features = ["std"], optional = true }
 rustls-pemfile = { version = "2.0", default-features = false, features = ["std"], optional = true }
 serde = { version = "1.0", default-features = false, features = ["derive"], optional = true }
 serde_json = { version = "1.0", default-features = false, features = ["std"], optional = true }
 serde_urlencoded = { version = "0.7", optional = true }
-tokio = { version = "1.29.0", features = ["sync", "macros", "rt", "time", "io-util"] }
+tokio = { version = "1.29.0", features = ["sync", "macros", "rt", "time", "io-util", "rt-multi-thread"] }
+openssl = "0.10"

@jkosh44
Author

jkosh44 commented Mar 20, 2025

Hm, this is another interesting and potentially related issue from this repo: 50cf8bd8. Though, this happens when receiving a response. I wonder if there's some logic somewhere that optionally compresses the request and somehow causes issues with the content-length header?

@alamb
Contributor

alamb commented Mar 20, 2025

I think I am going to assume this is basically "we have no visibility into what is going on on the GCP side" and thus I am not going to worry about why the error happens intermittently and non-reproducibly.

Given that this change seems minimally risky and could plausibly fix an issue, I am just going to merge it in

Thank you for your diligence @jkosh44

@jkosh44
Author

jkosh44 commented Mar 20, 2025

I think I am going to assume this is basically "we have no visibility into what is going on on the GCP side" and thus I am not going to worry about why the error happens intermittently and non-reproducibly.

I agree with you there. I am still planning to track down that S3 test so we can add a similar one, but I got side-tracked with trying to repro the issue.

@jkosh44
Author

jkosh44 commented Mar 20, 2025

I have a repro! It turns out that I was looking at the wrong method this whole time. The error happens when the multipart upload is empty.

Here's a minimal repro, which reproduces the error on main:

use object_store::gcp::GoogleCloudStorageBuilder;
use object_store::multipart::MultipartStore;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let store = GoogleCloudStorageBuilder::new()
        .with_bucket_name("multipart-complete-header-test")
        .with_application_credentials("/path/to/credentials.json")
        .build()
        .unwrap();

    let blob_path = "multi-test/hello.txt".into();
    let id = store.create_multipart(&blob_path).await.unwrap();
    let part_ids = Vec::new();

    store
        .complete_multipart(&blob_path, &id, part_ids)
        .await
        .unwrap();

    Ok(())
}

Which results in the following error:

called `Result::unwrap()` on an `Err` value: Generic { store: "GCS", source: RetryError { method: PUT, uri: Some(https://storage.googleapis.com/multipart%2Dcomplete%2Dheader%2Dtest/multi%2Dtest%2Fhello%2Egzip), retries: 0, max_retries: 10, elapsed: 108.320687ms, retry_timeout: 180s, inner: Status { status: 411, body: Some("<!DOCTYPE html>\n<html lang=en>\n  <meta charset=utf-8>\n  <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n  <title>Error 411 (Length Required)!!1</title>\n  <style>\n    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n  </style>\n  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n  <p><b>411.</b> <ins>That’s an error.</ins>\n  <p>POST requests require a <code>Content-length</code> header.  <ins>That’s all we know.</ins>\n") } } }
stack backtrace:
   0: rust_begin_unwind
             at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/panicking.rs:692:5
   1: core::panicking::panic_fmt
             at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/panicking.rs:75:14
   2: core::result::unwrap_failed
             at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/result.rs:1704:5
   3: core::result::Result<T,E>::unwrap
             at /home/joe.koshakow/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:1109:23
   4: object_store::main::{{closure}}
             at ./src/main.rs:35:5
   5: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /home/joe.koshakow/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/park.rs:284:60
   6: tokio::task::coop::with_budget
             at /home/joe.koshakow/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/coop/mod.rs:167:5
   7: tokio::task::coop::budget
             at /home/joe.koshakow/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/task/coop/mod.rs:133:5
   8: tokio::runtime::park::CachedParkThread::block_on
             at /home/joe.koshakow/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/park.rs:284:31
   9: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /home/joe.koshakow/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/context/blocking.rs:66:9
  10: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
             at /home/joe.koshakow/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/multi_thread/mod.rs:87:13
  11: tokio::runtime::context::runtime::enter_runtime
             at /home/joe.koshakow/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/context/runtime.rs:65:16
  12: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at /home/joe.koshakow/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/scheduler/multi_thread/mod.rs:86:9
  13: tokio::runtime::runtime::Runtime::block_on_inner
             at /home/joe.koshakow/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/runtime.rs:370:45
  14: tokio::runtime::runtime::Runtime::block_on
             at /home/joe.koshakow/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.44.1/src/runtime/runtime.rs:342:13
  15: object_store::main
             at ./src/main.rs:40:5
  16: core::ops::function::FnOnce::call_once
             at /home/joe.koshakow/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Process finished with exit code 101

The method in question is here:

        if completed_parts.is_empty() {
            // GCS doesn't allow empty multipart uploads
            let result = self
                .request(Method::PUT, path)
                .idempotent(true)
                .do_put()
                .await?;

What's confusing is that this is a PUT, but the error message said "POST requests require a Content-length header", which threw me off.

Adding .header(&CONTENT_LENGTH, "0") resolves the issue. Though it's not clear to me why we have the PUT call, since we're about to delete the file via self.multipart_cleanup(path, multipart_id).await?;. Any ideas?
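To illustrate why the empty PUT trips GCP's frontend, here is a self-contained sketch (hypothetical names, not object_store internals): HTTP clients typically omit Content-Length when the body is empty unless it is set explicitly, and GCP rejects such requests with 411 Length Required.

```rust
// Illustrative sketch only: a toy request-head serializer that mimics a client
// which adds Content-Length automatically only for non-empty bodies. The fix
// discussed above is to set "Content-Length: 0" explicitly on the empty PUT.

fn request_head(method: &str, path: &str, extra_headers: &[(&str, &str)], body: &[u8]) -> String {
    let mut head = format!("{method} {path} HTTP/1.1\r\n");
    let mut has_len = false;
    for (name, value) in extra_headers {
        if name.eq_ignore_ascii_case("content-length") {
            has_len = true;
        }
        head.push_str(&format!("{name}: {value}\r\n"));
    }
    // Mimic clients that only add Content-Length for non-empty bodies.
    if !has_len && !body.is_empty() {
        head.push_str(&format!("Content-Length: {}\r\n", body.len()));
    }
    head.push_str("\r\n");
    head
}

fn main() {
    // Without the explicit header, the empty PUT carries no Content-Length at all,
    // which is the shape of request GCP answers with 411 Length Required.
    let bad = request_head("PUT", "/bucket/object", &[], b"");
    assert!(!bad.contains("Content-Length"));

    // The fix: set Content-Length: 0 explicitly on the empty-body request.
    let good = request_head("PUT", "/bucket/object", &[("Content-Length", "0")], b"");
    assert!(good.contains("Content-Length: 0"));
}
```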

@alamb
Contributor

alamb commented Mar 20, 2025

I have a repro! It turns out that I was looking at the wrong method this whole time. The error happens when the multipart upload is empty.

Super Sleuthing! 🕵️

@jkosh44
Author

jkosh44 commented Mar 20, 2025

@tustvold it looks like you actually added the empty upload handling, https://github.com/apache/arrow-rs/pull/5590/files. Any thoughts on why the PUT is needed? From some local testing, the following also seems to work (i.e. remove the PUT):

        if completed_parts.is_empty() {
            // GCS doesn't allow empty multipart uploads
            self.multipart_cleanup(path, multipart_id).await?;
            return Ok(PutResult{ e_tag: None, version: None });
        }

but I'm not really familiar enough with this API to know if that breaks something.

@tustvold
Contributor

That changes the behaviour from uploading an empty object to uploading nothing. The empty part is necessary because otherwise it gets rejected - see the linked issue apache/arrow-rs-object-store#91

This commit fixes the GCP multipart complete implementation by adding
the Content-Length header to XML requests.

According to the docs,
https://cloud.google.com/storage/docs/xml-api/post-object-complete,
this header is required in the complete POST request, but wasn't being
set previously. It seems like GCP doesn't actually validate this
header, but it's better to set it in case they validate in the future.

Additionally, GCP is strict about setting the Content-Length header on
requests with empty bodies, so we also update an empty PUT request,
https://cloud.google.com/storage/docs/xml-api/put-object-multipart,
with the header.
@jkosh44 jkosh44 force-pushed the gcp-multipart-complete branch from 3c3cf1d to ab06852 Compare March 20, 2025 16:52
@jkosh44
Author

jkosh44 commented Mar 20, 2025

That changes the behaviour from uploading an empty object to uploading nothing. The empty part is necessary because otherwise it gets rejected - see the linked issue apache/arrow-rs-object-store#91

My understanding is that the POST for complete (https://cloud.google.com/storage/docs/xml-api/post-object-complete) will get rejected if there are 0 parts, but the DELETE (https://cloud.google.com/storage/docs/xml-api/delete-multipart) will get accepted. The linked issue and PR were from before we were calling multipart_cleanup, which submits the DELETE; back then we were always submitting the POST. It would seem like a bug on the GCP side to force you to upload an empty part before allowing you to abort the upload, but it does seem reasonable to force you to upload at least one part to complete it successfully.

Still, the empty upload isn't hurting anything, so it's probably safer to just keep it for now. I've updated it to include the Content-Length header.

I've also kept the content length header in the POST, even though it seems clear to me now that it's never validated. The docs say it's required, so we might as well set it. I'm happy to revert though if people disagree.

EDIT: I just realized that apache/arrow-rs-object-store#91 is for AWS, not GCP. So are you saying that we'd like to keep the behavior consistent across cloud providers? It's already slightly different in that AWS completes empty uploads while GCP aborts them.

@tustvold
Contributor

The code is performing a regular PUT, i.e. uploading a single empty object not an upload part, and then aborting the multipart upload.

@@ -540,6 +541,7 @@ impl GoogleCloudStorageClient {
let response = self
.client
.request(Method::POST, &url)
.header(&CONTENT_LENGTH, data.len())
Contributor

@tustvold tustvold Mar 20, 2025


I think this should be unnecessary, and I believe it is potentially incorrect if the client decides to use some sort of transfer encoding

Author


Removed.

@@ -517,6 +517,7 @@ impl GoogleCloudStorageClient {
// GCS doesn't allow empty multipart uploads
Contributor


I wonder if we could perhaps make this more clear by simply calling self.put directly

Author


It's a good idea, but self.put also doesn't set Content-Length, so we'd have the same issue. I could update self.put to always set Content-Length when the payload is empty. That would have a larger blast radius though. Thoughts?

@alamb
Contributor

alamb commented Mar 20, 2025

Thank you for this PR. We are in the process of moving the object_store code to its own repository. Would it be possible for you to create a PR in that repository instead?

(we will handle moving all existing issues to the new repository)

@jkosh44
Author

jkosh44 commented Mar 20, 2025

The code is performing a regular PUT, i.e. uploading a single empty object not an upload part, and then aborting the multipart upload.

Oh I see now, so the end result is the file exists in GCP but is empty. That makes sense. Thanks for clarifying that for me.

@jkosh44
Author

jkosh44 commented Mar 20, 2025

Thank you for this PR. We are in the process of moving the object_store code to its own repository. Would it be possible for you to create a PR in that repository instead?

* See details on [[EPIC] Port object_store content from arrow-rs repository arrow-rs-object-store#2](https://github.com/apache/arrow-rs-object-store/issues/2)

* Backstory on [[DISCUSSION] Proposal move `object_store` to its own github repo? apache/arrow-rs#6183](https://github.com/apache/arrow-rs/issues/6183)

(we will handle moving all existing issues to the new repository)

Here is the new PR: apache/arrow-rs-object-store#257
