Skip to content

Port object_store integration tests, use github actions #2148

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Jul 25, 2022
114 changes: 114 additions & 0 deletions .github/workflows/object_store.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

---
name: "Object Store"

on:
pull_request:
paths:
# Only run when object store files or github workflows change
- object_store/**
- .github/**

jobs:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clippy:
name: Clippy
runs-on: ubuntu-latest
container:
image: amd64/rust
steps:
- uses: actions/checkout@v2
- name: Setup Rust toolchain with clippy
run: |
rustup toolchain install stable
rustup default stable
rustup component add clippy
- name: Run clippy
run: |
cargo clippy -p object_store --all-features

# test the crate
linux-test:
name: Emulator Tests
runs-on: ubuntu-latest
services:
fake-gcs:
image: fsouza/fake-gcs-server
ports:
- 4443:4443
localstack:
image: localstack/localstack:0.14.4
ports:
- 4566:4566
azurite:
image: mcr.microsoft.com/azure-storage/azurite
ports:
- 10000:10002
container:
image: amd64/rust
env:
# Disable full debug symbol generation to speed up CI build and keep memory down
# "1" means line tables only, which is useful for panic tracebacks.
RUSTFLAGS: "-C debuginfo=1"
# https://github.com/rust-lang/cargo/issues/10280
CARGO_NET_GIT_FETCH_WITH_CLI: "true"
RUST_BACKTRACE: "1"
# Run integration tests
TEST_INTEGRATION: 1
AWS_DEFAULT_REGION: "us-east-1"
AWS_ACCESS_KEY_ID: test
AWS_SECRET_ACCESS_KEY: test
AWS_ENDPOINT: http://localstack:4566
AZURE_USE_EMULATOR: "1"
AZURITE_BLOB_STORAGE_URL: "http://azurite:10000"
AZURITE_QUEUE_STORAGE_URL: "http://azurite:10001"
GOOGLE_SERVICE_ACCOUNT: "/tmp/gcs.json"
OBJECT_STORE_BUCKET: test-bucket

steps:
- uses: actions/checkout@v2

- name: Configure Fake GCS Server (GCP emulation)
run: |
curl --insecure -v -X POST --data-binary '{"name":"test-bucket"}' -H "Content-Type: application/json" "https://fake-gcs:4443/storage/v1/b"
echo '{"gcs_base_url": "https://fake-gcs:4443", "disable_oauth": true, "client_email": "", "private_key": ""}' > "$GOOGLE_SERVICE_ACCOUNT"

- name: Setup LocalStack (AWS emulation)
run: |
cd /tmp
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
./aws/install
aws --endpoint-url=http://localstack:4566 s3 mb s3://test-bucket

- name: Configure Azurite (Azure emulation)
# the magical connection string is from
# https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azurite?tabs=visual-studio#http-connection-strings
run: |
curl -sL https://aka.ms/InstallAzureCLIDeb | bash
az storage container create -n test-bucket --connection-string 'DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://azurite:10000/devstoreaccount1;QueueEndpoint=http://azurite:10001/devstoreaccount1;'

- name: Setup Rust toolchain
run: |
rustup toolchain install stable
rustup default stable

- name: Run object_store tests
run: |
# run tests
cargo test -p object_store --features=aws,azure,azure_test,gcp
23 changes: 21 additions & 2 deletions object_store/CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -90,5 +90,24 @@ $ cargo test --features azure

### GCP

We don't have a good story yet for testing the GCP integration locally. You will need to create a GCS bucket, a
service account that has access to it, and use this to run the tests.
To test the GCS integration, we use [Fake GCS Server](https://github.com/fsouza/fake-gcs-server)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added documentation on how to run these integration tests


Startup the fake server:

```shell
docker run -p 4443:4443 fsouza/fake-gcs-server
```

Configure the account:
```shell
curl --insecure -v -X POST --data-binary '{"name":"test-bucket"}' -H "Content-Type: application/json" "https://localhost:4443/storage/v1/b"
echo '{"gcs_base_url": "https://localhost:4443", "disable_oauth": true, "client_email": "", "private_key": ""}' > /tmp/gcs.json
```

Now run the tests:
```shell
TEST_INTEGRATION=1 \
OBJECT_STORE_BUCKET=test-bucket \
GOOGLE_SERVICE_ACCOUNT=/tmp/gcs.json \
cargo test -p object_store --features=gcp
```
50 changes: 49 additions & 1 deletion object_store/src/azure.rs
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ use futures::{
use snafu::{ResultExt, Snafu};
use std::collections::BTreeSet;
use std::{convert::TryInto, sync::Arc};
use url::Url;

/// A specialized `Error` for Azure object store-related errors
#[derive(Debug, Snafu)]
Expand Down Expand Up @@ -158,6 +159,18 @@ enum Error {
"Azurite (azure emulator) support not compiled in, please add `azure_test` feature"
))]
NoEmulatorFeature,

#[snafu(display(
"Unable parse emulator url {}={}, Error: {}",
env_name,
env_value,
source
))]
UnableToParseEmulatorUrl {
env_name: String,
env_value: String,
source: url::ParseError,
},
}

impl From<Error> for super::Error {
Expand Down Expand Up @@ -507,6 +520,21 @@ fn check_if_emulator_works() -> Result<()> {
Err(Error::NoEmulatorFeature.into())
}

/// Parses the contents of the environment variable `env_name` as a URL
/// if present, otherwise falls back to default_url
fn url_from_env(env_name: &str, default_url: &str) -> Result<Url> {
let url = match std::env::var(env_name) {
Ok(env_value) => {
Url::parse(&env_value).context(UnableToParseEmulatorUrlSnafu {
env_name,
env_value,
})?
}
Err(_) => Url::parse(default_url).expect("Failed to parse default URL"),
};
Ok(url)
}

/// Configure a connection to container with given name on Microsoft Azure
/// Blob store.
///
Expand All @@ -524,7 +552,27 @@ pub fn new_azure(

let (is_emulator, storage_account_client) = if use_emulator {
check_if_emulator_works()?;
(true, StorageAccountClient::new_emulator_default())
// Allow overriding defaults. Values taken from
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required to specify a URL different than 127.0.0.1 -- the github actions "services" run with some actual dns name rather than on the local loopback (e.g. you need to specify http://auzure:10000 rather than http://127.0.0.1:10000)

// from https://docs.rs/azure_storage/0.2.0/src/azure_storage/core/clients/storage_account_client.rs.html#129-141
let http_client = azure_core::new_http_client();
let blob_storage_url =
url_from_env("AZURITE_BLOB_STORAGE_URL", "http://127.0.0.1:10000")?;
let queue_storage_url =
url_from_env("AZURITE_QUEUE_STORAGE_URL", "http://127.0.0.1:10001")?;
let table_storage_url =
url_from_env("AZURITE_TABLE_STORAGE_URL", "http://127.0.0.1:10002")?;
let filesystem_url =
url_from_env("AZURITE_TABLE_STORAGE_URL", "http://127.0.0.1:10004")?;

let storage_client = StorageAccountClient::new_emulator(
http_client,
&blob_storage_url,
&table_storage_url,
&queue_storage_url,
&filesystem_url,
);

(true, storage_client)
} else {
(
false,
Expand Down
34 changes: 27 additions & 7 deletions object_store/src/gcp.rs
Original file line number Diff line number Diff line change
Expand Up @@ -502,9 +502,17 @@ fn reader_credentials_file(
pub fn new_gcs(
service_account_path: impl AsRef<std::path::Path>,
bucket_name: impl Into<String>,
) -> Result<GoogleCloudStorage> {
new_gcs_with_client(service_account_path, bucket_name, Client::new())
}

/// Configure a connection to Google Cloud Storage with the specified HTTP client.
pub fn new_gcs_with_client(
service_account_path: impl AsRef<std::path::Path>,
bucket_name: impl Into<String>,
client: Client,
) -> Result<GoogleCloudStorage> {
let credentials = reader_credentials_file(service_account_path)?;
let client = Client::new();

// TODO: https://cloud.google.com/storage/docs/authentication#oauth-scopes
let scope = "https://www.googleapis.com/auth/devstorage.full_control";
Expand Down Expand Up @@ -575,6 +583,18 @@ mod test {
service_account: String,
}

impl GoogleCloudConfig {
fn build_test(self) -> Result<GoogleCloudStorage> {
// ignore HTTPS errors in tests so we can use fake-gcs server
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is only done in the tests, whereas the original approach allowed it in production as well)

let client = Client::builder()
.danger_accept_invalid_certs(true)
.build()
.expect("Error creating http client for testing");

new_gcs_with_client(self.service_account, self.bucket, client)
}
}

// Helper macro to skip tests if TEST_INTEGRATION and the GCP environment variables are not set.
macro_rules! maybe_skip_integration {
() => {{
Expand Down Expand Up @@ -622,7 +642,7 @@ mod test {
#[tokio::test]
async fn gcs_test() {
let config = maybe_skip_integration!();
let integration = new_gcs(config.service_account, config.bucket).unwrap();
let integration = config.build_test().unwrap();

put_get_delete_list(&integration).await.unwrap();
list_uses_directories_correctly(&integration).await.unwrap();
Expand All @@ -633,7 +653,7 @@ mod test {
#[tokio::test]
async fn gcs_test_get_nonexistent_location() {
let config = maybe_skip_integration!();
let integration = new_gcs(config.service_account, &config.bucket).unwrap();
let integration = config.build_test().unwrap();

let location = Path::from_iter([NON_EXISTENT_NAME]);

Expand All @@ -650,7 +670,7 @@ mod test {
async fn gcs_test_get_nonexistent_bucket() {
let mut config = maybe_skip_integration!();
config.bucket = NON_EXISTENT_NAME.into();
let integration = new_gcs(config.service_account, &config.bucket).unwrap();
let integration = config.build_test().unwrap();

let location = Path::from_iter([NON_EXISTENT_NAME]);

Expand All @@ -668,7 +688,7 @@ mod test {
#[tokio::test]
async fn gcs_test_delete_nonexistent_location() {
let config = maybe_skip_integration!();
let integration = new_gcs(config.service_account, &config.bucket).unwrap();
let integration = config.build_test().unwrap();

let location = Path::from_iter([NON_EXISTENT_NAME]);

Expand All @@ -684,7 +704,7 @@ mod test {
async fn gcs_test_delete_nonexistent_bucket() {
let mut config = maybe_skip_integration!();
config.bucket = NON_EXISTENT_NAME.into();
let integration = new_gcs(config.service_account, &config.bucket).unwrap();
let integration = config.build_test().unwrap();

let location = Path::from_iter([NON_EXISTENT_NAME]);

Expand All @@ -700,7 +720,7 @@ mod test {
async fn gcs_test_put_nonexistent_bucket() {
let mut config = maybe_skip_integration!();
config.bucket = NON_EXISTENT_NAME.into();
let integration = new_gcs(config.service_account, &config.bucket).unwrap();
let integration = config.build_test().unwrap();

let location = Path::from_iter([NON_EXISTENT_NAME]);
let data = Bytes::from("arbitrary data");
Expand Down
4 changes: 2 additions & 2 deletions object_store/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -695,8 +695,8 @@ mod tests {
#[tokio::test]
async fn test_list_lifetimes() {
let store = memory::InMemory::new();
let stream = list_store(&store, "path").await.unwrap();
assert_eq!(stream.count().await, 0);
let mut stream = list_store(&store, "path").await.unwrap();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why, but locally the stream.count() construction started erroring like this for me:

cd /Users/alamb/Software/arrow-rs2 && RUST_LOG=trace RUST_BACKTRACE=1 CARGO_TARGET_DIR=/Users/alamb/Software/df-target2 cargo test -p object_store --features=aws,azure,azure_test,gcp
   Compiling object_store v0.3.0 (/Users/alamb/Software/arrow-rs2/object_store)
error[E0599]: `Pin<Box<dyn futures::Stream<Item = std::result::Result<ObjectMeta, Error>> + std::marker::Send>>` is not an iterator
   --> object_store/src/lib.rs:699:27
    |
699 |         assert_eq!(stream.count().await, 0);
    |                           ^^^^^ `Pin<Box<dyn futures::Stream<Item = std::result::Result<ObjectMeta, Error>> + std::marker::Send>>` is not an iterator
    |
   ::: /Users/alamb/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/pin.rs:408:1
    |
408 | pub struct Pin<P> {
    | ----------------- doesn't satisfy `_: Iterator`
    |
   ::: /Users/alamb/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.17/src/stream.rs:27:1
    |
27  | pub trait Stream {
    | ---------------- doesn't satisfy `_: Iterator`
    |
    = note: the following trait bounds were not satisfied:
            `Pin<Box<dyn futures::Stream<Item = std::result::Result<ObjectMeta, Error>> + std::marker::Send>>: Iterator`
            which is required by `&mut Pin<Box<dyn futures::Stream<Item = std::result::Result<ObjectMeta, Error>> + std::marker::Send>>: Iterator`
            `dyn futures::Stream<Item = std::result::Result<ObjectMeta, Error>> + std::marker::Send: Iterator`
            which is required by `&mut dyn futures::Stream<Item = std::result::Result<ObjectMeta, Error>> + std::marker::Send: Iterator`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like the StreamExt is not in scope for some reason

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I tried a few ways to get that to work, but rustc gave me the same error and told me that StreamExt was unused ...

assert!(stream.next().await.is_none());
}

// Tests TODO:
Expand Down