-
Notifications
You must be signed in to change notification settings - Fork 36
feat: support list with ListOpts which contains the offset, max_keys,… #324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I'll try to find some time to review this over the weekend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I have done a first pass of a review, but unfortunately have run out of time for today.
I wonder if we might be able to find someway to reduce the size of this diff, its really quite hard to review at the moment... Perhaps we could split adding list_opts and adding the max-keys into separate PRs?
I also think we should try to make this not a breaking change, as it is unlikely we're going to look to make a breaking release in the next 4 or so months.
src/aws/builder.rs
Outdated
@@ -891,6 +893,12 @@ impl AmazonS3Builder { | |||
self | |||
} | |||
|
|||
/// Set the max keys per list request. It's almost used for test paginated listing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This functionality never appears to be used or tested, is it necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i'm trying to test paginated listing, for example, limit n objects per list request instead of preparing more than 1k objects before testing in ut, but i found it's a little bit hard to add a general test case in integration.rs
because they used a common object store, e.g. AmazonS3, otherwise I need to create separate store and test cover it. It can be removed.
/// | ||
/// Prefixes are evaluated on a path segment basis, i.e. `foo/bar` is a prefix of `foo/bar/x` but not of | ||
/// `foo/bar_baz/x`. List is recursive, i.e. `foo/bar/more/x` will be included. | ||
fn list_opts( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This formulation is a breaking change, if we want to get this in any time in the next few months we probably want to instead provide a default implementation that calls through to list - erroring if not provided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for I didn't get "this is a breaking change" here, do you mean maybe some other crates import the object_store crate and implement the ObjectStore traits, they have to implement the list_opts
method if they upgrade the object_store version, right? If it is, it might be problem, provide a default implementation is a better option, but it also means we cannot remove other list methods to make this trait more concise.
I planed marked other list methods as deprecated, it seems that it's better to do it in a big version.
src/prefix.rs
Outdated
}; | ||
let s = self.inner.list_opts(Some(&prefix), opts); | ||
|
||
s.map_ok(move |lst| ListResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could encapsulate this, it is identical to list_with_delimiter below
if !k.as_ref().starts_with(prefix.as_ref()) { | ||
break; | ||
} | ||
|
||
if k <= offset { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we instead just start listing from offset
instead of prefix
if provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, maybe start listing from the bigger one between offset
and prefix
, because technically offset
could be less than prefix
.
src/memory.rs
Outdated
// Only objects in this base level should be returned in the | ||
// response. Otherwise, we just collect the common prefixes. | ||
let mut objects = vec![]; | ||
for (k, v) in self.storage.read().map.range((prefix)..) { | ||
if let Some(may_keys) = max_keys { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could lift the conditional out by doing let max_keys = max_keys.unwrap_or(usize::MAX)
.
src/lib.rs
Outdated
|
||
while let Some(result) = stream.next().await { | ||
let response = result?; | ||
common_prefixes.extend(response.common_prefixes.into_iter()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this potentially result in the same common_prefix appearing multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it could be, but the BTreeSet should deduplicate the common_prefixes.
src/memory.rs
Outdated
let objects = match max_keys { | ||
Some(max_keys) if max_keys < values.len() => { | ||
values.truncate(max_keys); | ||
values | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we instead use Stream::take
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will try
src/client/list.rs
Outdated
@@ -33,6 +32,8 @@ pub(crate) trait ListClient: Send + Sync + 'static { | |||
delimiter: bool, | |||
token: Option<&str>, | |||
offset: Option<&str>, | |||
max_keys: Option<usize>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps this could use ListOpts?
Thanks a lot for got lots of useful suggestions from you, i will split the this diff and also avoid the breaking change, thanks again. |
Hi @tustvold, I tried to reduce the scope of this PR, could you please help to review it when you free, thanks a lot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've done another pass, but again run out of time I have to spend on this.
As it stands this PR is really quite hard to review, it makes lots of changes to lots of places without it being very easy to connect things together.
I'll see if I can't find some time to bash out a smaller more incremental version of this PR in the next couple of weeks, but unfortunately my time is very limited at the moment.
src/lib.rs
Outdated
// Add the default implementation via calling list_with_offset methods avoid bring | ||
// a breaking change, the default implementation and other list methods will be removed | ||
// in the future. All list request should call this method. | ||
use std::collections::BTreeSet; | ||
|
||
let prefix = prefix.cloned().unwrap_or_default(); | ||
let offset = options.offset.unwrap_or(prefix.clone()); | ||
self.list_with_offset(Some(&prefix), &offset) | ||
.try_chunks(1000) | ||
.map(move |batch| { | ||
let batch = batch.map_err(|e| e.1)?; | ||
if options.delimiter { | ||
let mut common_prefixes = BTreeSet::new(); | ||
let mut objects = Vec::new(); | ||
for obj in batch.into_iter() { | ||
let location = obj.location.clone(); | ||
let mut parts = match location.prefix_match(&prefix) { | ||
Some(parts) => parts, | ||
None => continue, | ||
}; | ||
|
||
match parts.next() { | ||
None => {} | ||
Some(p) => match parts.next() { | ||
None => objects.push(obj), | ||
Some(_) => { | ||
let common_prefix = prefix.child(p); | ||
if common_prefix > offset { | ||
common_prefixes.insert(common_prefix); | ||
} | ||
} | ||
}, | ||
} | ||
drop(parts) | ||
} | ||
Ok(ListResult { | ||
common_prefixes: common_prefixes.into_iter().collect(), | ||
objects, | ||
}) | ||
} else { | ||
Ok(ListResult { | ||
common_prefixes: vec![], | ||
objects: batch, | ||
}) | ||
} | ||
}) | ||
.boxed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Add the default implementation via calling list_with_offset methods avoid bring | |
// a breaking change, the default implementation and other list methods will be removed | |
// in the future. All list request should call this method. | |
use std::collections::BTreeSet; | |
let prefix = prefix.cloned().unwrap_or_default(); | |
let offset = options.offset.unwrap_or(prefix.clone()); | |
self.list_with_offset(Some(&prefix), &offset) | |
.try_chunks(1000) | |
.map(move |batch| { | |
let batch = batch.map_err(|e| e.1)?; | |
if options.delimiter { | |
let mut common_prefixes = BTreeSet::new(); | |
let mut objects = Vec::new(); | |
for obj in batch.into_iter() { | |
let location = obj.location.clone(); | |
let mut parts = match location.prefix_match(&prefix) { | |
Some(parts) => parts, | |
None => continue, | |
}; | |
match parts.next() { | |
None => {} | |
Some(p) => match parts.next() { | |
None => objects.push(obj), | |
Some(_) => { | |
let common_prefix = prefix.child(p); | |
if common_prefix > offset { | |
common_prefixes.insert(common_prefix); | |
} | |
} | |
}, | |
} | |
drop(parts) | |
} | |
Ok(ListResult { | |
common_prefixes: common_prefixes.into_iter().collect(), | |
objects, | |
}) | |
} else { | |
Ok(ListResult { | |
common_prefixes: vec![], | |
objects: batch, | |
}) | |
} | |
}) | |
.boxed() | |
// Add the default implementation via calling list_with_offset methods avoid bring | |
// a breaking change, the default implementation will be removed, and default | |
// implementations added for the other list methods using it | |
futures::stream::once(futures::future::ready(Err(Error::NotImplemented))).boxed() |
I don't think there is a coherent way to provide a default implementation here, the below ignores extensions
and doesn't push down the delimiter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, throw NotImplemented
error is a simple way,
src/lib.rs
Outdated
@@ -1283,6 +1359,22 @@ pub struct PutResult { | |||
pub version: Option<String>, | |||
} | |||
|
|||
/// Options for [`ObjectStore::list_opts`] | |||
#[derive(Debug, Clone, Default)] | |||
pub struct ListOpts { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pub struct ListOpts { | |
pub struct ListOptions { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, followed the pattern of PutMultipartOpts
src/lib.rs
Outdated
/// | ||
/// They are also eclused from [`PartialEq`] and [`Eq`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// | |
/// They are also eclused from [`PartialEq`] and [`Eq`]. |
I appreciate this is copied, but it is also not true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, will remove the same comments
offset: &Path, | ||
) -> BoxStream<'static, Result<ObjectMeta>>; | ||
|
||
async fn list_with_delimiter(&self, prefix: Option<&Path>) -> Result<ListResult>; | ||
} | ||
|
||
#[async_trait] | ||
impl<T: ListClient + Clone> ListClientExt for T { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a list_opts impl here, to avoid duplicating the plumbing logic in the 3 cloud provider stores
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean add a list_opts
and then calling list_paginated
within list_opts
impl, and the 3 clould provider stores call self.client.list_opts(prefix, options)
? like this:
fn list_opts(
&self,
prefix: Option<&Path>,
opts: ListOpts,
) -> BoxStream<'static, Result<ListResult>> {
let client = self.clone();
client.list_paginated(prefix, opts)
}
The 3 clould provider are calling list_paginated
method directly during implementing list_opts
of ObjectStore, because the list_opts
of ListClientExt
will be very simple, just transfer request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure I follow what you are saying, but if list_paginated ends up with the same signature as list_opts, it might as well just get renamed to list_opts.
@@ -157,6 +158,93 @@ impl ObjectStore for HttpStore { | |||
.boxed() | |||
} | |||
|
|||
fn list_opts( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation seems over complicated given that webdav doesn't support pagination.
Could we perhaps reuse more of the existing logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i considered reuse the logic of list
and list_with_delimiter
, but since the response of list_opts
is a BoxStream with static lifecycle, so cannot reuse list_with_delimiter
directly(please correct me if misunderstand something).
@@ -960,6 +960,102 @@ pub async fn list_with_delimiter(storage: &DynObjectStore) { | |||
assert!(content_list.is_empty()); | |||
} | |||
|
|||
/// Tests listing with composite conditions by [`ObjectStore::list_opts`] | |||
pub async fn list_with_composite_conditions(storage: &DynObjectStore) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't appear to be wired into all of the integration tests?
src/lib.rs
Outdated
/// Options for [`ObjectStore::list_opts`] | ||
#[derive(Debug, Clone, Default)] | ||
pub struct ListOpts { | ||
/// The listed objects whose locations are greater than `offset` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// The listed objects whose locations are greater than `offset` | |
/// Only list objects greater than `offset` |
src/lib.rs
Outdated
pub struct ListOpts { | ||
/// The listed objects whose locations are greater than `offset` | ||
pub offset: Option<Path>, | ||
/// True means returns common prefixes (directories) in addition to object in [`ListResult`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// True means returns common prefixes (directories) in addition to object in [`ListResult`] | |
/// Only list up to the next path delimiter `/` [`ListResult`] | |
/// | |
/// See [`ObjectStore::list_with_delimiter`] |
I'm so sorry that this PR has taken up too much of your time. At the same time, I'm extremely grateful for your very constructive suggestions. I have already revised the code and made the entire PR simplify as much as possible. I hope this can reduce the difficulty of the review. Thanks again for your help. |
#371 contains an alternative proposal to add support for this |
… delimiter
Which issue does this PR close?
Closes #295 and #291.
Rationale for this change
What changes are included in this PR?
list_opts
method inObjectStore
traits and implement thelist_opts
method in each implementation.Are there any user-facing changes?
deprecated