-
Notifications
You must be signed in to change notification settings - Fork 283
Migrate storage core and data_lake to pipeline architecture #584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. This PR is huge which makes it really hard to review properly. I wonder if breaking this up into smaller PRs might be a better approach. Thoughts?
|
||
match &self.storage_credentials { | ||
StorageCredentials::Key(account, key) => { | ||
let url = url::Url::parse(&request.uri().to_string()).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit awkward. I wonder if there's a better way to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is actually a point I stumbled across when developing. Right now the SAS authorization does not work via the authorization policy, since I could not figure out a straight forward way to mutate the url within the policy. I guess the same applies here. So any ideas to mitigate this would be great.
@@ -0,0 +1,256 @@ | |||
use crate::clients::{ServiceType, StorageCredentials}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume most of this is being worked on in #568, right? It makes it hard to review when the changes are mixed together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I only saw the PR after submitting this. WHile you are absolutely right, my main challenge was that I needed to be able to use master key auth to get the tests to work so I found no straight forward way to separate this when migrating the core crate. But we definitely need to align how that feature gets merged. (@thovoll)
} | ||
} | ||
|
||
/// Create a Pipeline from CosmosOptions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix - even though most stuff is heavily inspired by cosmos :)
As for splitting up this PR, I'd be happy to. However not sure how :). The bulk of changes relates to how the clients are initialized and removing the lifetime identifiers from the request properties - once that is factored out I think its more managable :). But if there is some ideas how to split this up to make it more review friendly I'd be happy to do that. |
Thinking about how to split this us up, am I even correct in assuming, that removing the livetime specifiers from request options is the only way to get But none of these options seem really attractive to me, but I'll follow your advice on this! |
I factored out one part into another PR (#586) - its still a lot of files, but always the same change ... |
I'm going to close this since @roeap is breaking this up into many pieces. |
related: #496
This PR ended up being much larger then I had hoped, but I still hope its a step in the right direction :).
I have been coordinating with @thovoll on improving the Azure integration of
delta-rs
. Over there we are greatly benefiting from the migration to the pipelines architecture.All changes are modelled on the the
list_databases
andcreate_database
function from the cosmos crate. Specifically using theinto_stream
andinto_future
pattern. This is where a lot of the changes in other storage crates come from. To get this to work, I had to remove some lifetimes form request properties used also in other storage crates. My hope is that once this is applied to the blob crate as well, it might solve #377 as well.Aside from re-shaping all the requests / operations there are some notable changes to the clients as well.
Rather then always referencing the "parent" client, the data lake and file system clients now use the storage account client as inner client. This is also no longer an
Arc
, which is a pattern carried over from cosmos.The
AuthorizationPolicy
from the datalake crate has been moved to the storage core crate, and extended to support more authorization schemes.I am still fairly new to rust and reading this codebase has been a great learning experience! Hopefully I was able to produce some useful code changes that can eventually be merged :).
Any feedback is greatly appreciated!