Skip to content

Migrate storage core and data_lake to pipeline architecture #584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

roeap
Copy link
Contributor

@roeap roeap commented Jan 9, 2022

related: #496

This PR ended up being much larger then I had hoped, but I still hope its a step in the right direction :).

I have been coordinating with @thovoll on improving the Azure integration of delta-rs. Over there we are greatly benefiting from the migration to the pipelines architecture.

All changes are modelled on the the list_databases and create_database function from the cosmos crate. Specifically using the into_stream and into_future pattern. This is where a lot of the changes in other storage crates come from. To get this to work, I had to remove some lifetimes form request properties used also in other storage crates. My hope is that once this is applied to the blob crate as well, it might solve #377 as well.

Aside from re-shaping all the requests / operations there are some notable changes to the clients as well.

Rather then always referencing the "parent" client, the data lake and file system clients now use the storage account client as inner client. This is also no longer an Arc, which is a pattern carried over from cosmos.

The AuthorizationPolicy from the datalake crate has been moved to the storage core crate, and extended to support more authorization schemes.

I am still fairly new to rust and reading this codebase has been a great learning experience! Hopefully I was able to produce some useful code changes that can eventually be merged :).

Any feedback is greatly appreciated!

@ghost
Copy link

ghost commented Jan 9, 2022

CLA assistant check
All CLA requirements met.

Copy link
Contributor

@rylev rylev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this. This PR is huge which makes it really hard to review properly. I wonder if breaking this up into smaller PRs might be a better approach. Thoughts?


match &self.storage_credentials {
StorageCredentials::Key(account, key) => {
let url = url::Url::parse(&request.uri().to_string()).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit awkward. I wonder if there's a better way to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is actually a point I stumbled across when developing. Right now the SAS authorization does not work via the authorization policy, since I could not figure out a straight forward way to mutate the url within the policy. I guess the same applies here. So any ideas to mitigate this would be great.

@@ -0,0 +1,256 @@
use crate::clients::{ServiceType, StorageCredentials};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume most of this is being worked on in #568, right? It makes it hard to review when the changes are mixed together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I only saw the PR after submitting this. WHile you are absolutely right, my main challenge was that I needed to be able to use master key auth to get the tests to work so I found no straight forward way to separate this when migrating the core crate. But we definitely need to align how that feature gets merged. (@thovoll)

}
}

/// Create a Pipeline from CosmosOptions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is wrong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix - even though most stuff is heavily inspired by cosmos :)

@roeap
Copy link
Contributor Author

roeap commented Jan 10, 2022

As for splitting up this PR, I'd be happy to. However not sure how :). The bulk of changes relates to how the clients are initialized and removing the lifetime identifiers from the request properties - once that is factored out I think its more managable :). But if there is some ideas how to split this up to make it more review friendly I'd be happy to do that.

@roeap
Copy link
Contributor Author

roeap commented Jan 10, 2022

Thinking about how to split this us up, am I even correct in assuming, that removing the livetime specifiers from request options is the only way to get into_stream and into_future to work? This relates mostly to the changes in sdk/core/src/request_options/prefix.rs. If we can keep these around, I would need some guidance an how to implement above mentioned functions, but a lot of changes could be reverted. If that is the case, then I could make "only" that change first, which should cover some 10s of files, but all changes look the same. The second large chunk is the changed initialization for the storage account client class - I could introduce an intermediate constructor to have the old signature still working. After that we would likely be left with just a handful of files to review.

But none of these options seem really attractive to me, but I'll follow your advice on this!

@roeap
Copy link
Contributor Author

roeap commented Jan 11, 2022

I factored out one part into another PR (#586) - its still a lot of files, but always the same change ...

@rylev
Copy link
Contributor

rylev commented Jan 11, 2022

I'm going to close this since @roeap is breaking this up into many pieces.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants