Skip to content

feat: support list with ListOpts which contains the offset, max_keys,… #324

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

stayrascal
Copy link

@stayrascal stayrascal commented Apr 7, 2025

… delimiter

Which issue does this PR close?

Closes #295 and #291.

Rationale for this change

What changes are included in this PR?

  • add a list_opts method in ObjectStore traits and implement the list_opts method in each implementation.
  • the ListOptions support offset, delimiter, extension(maybe we can include the prefix as a option into ListOptions, but in order to keep the style from other methods, keep prefix as a alone parameter currently)
  • add a list_with_composite_conditions test case by compose prefix, offset, delimiter

Are there any user-facing changes?

  • TBD: whether mark these methods: list, list_with_offset, list_with_delimiter as deprecated

@stayrascal
Copy link
Author

Hi @tustvold, I have submitted a PR about #295, but it seems that the CI/workflow doesn't been triggered, need ti be approved, could you please help to take a look, and help to review this PR? thx.

@tustvold
Copy link
Contributor

tustvold commented Apr 8, 2025

I'll try to find some time to review this over the weekend

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have done a first pass of a review, but unfortunately have run out of time for today.

I wonder if we might be able to find someway to reduce the size of this diff, its really quite hard to review at the moment... Perhaps we could split adding list_opts and adding the max-keys into separate PRs?

I also think we should try to make this not a breaking change, as it is unlikely we're going to look to make a breaking release in the next 4 or so months.

@@ -891,6 +893,12 @@ impl AmazonS3Builder {
self
}

/// Set the max keys per list request. It's almost used for test paginated listing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functionality never appears to be used or tested, is it necessary?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i'm trying to test paginated listing, for example, limit n objects per list request instead of preparing more than 1k objects before testing in ut, but i found it's a little bit hard to add a general test case in integration.rs because they used a common object store, e.g. AmazonS3, otherwise I need to create separate store and test cover it. It can be removed.

///
/// Prefixes are evaluated on a path segment basis, i.e. `foo/bar` is a prefix of `foo/bar/x` but not of
/// `foo/bar_baz/x`. List is recursive, i.e. `foo/bar/more/x` will be included.
fn list_opts(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This formulation is a breaking change, if we want to get this in any time in the next few months we probably want to instead provide a default implementation that calls through to list - erroring if not provided

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for I didn't get "this is a breaking change" here, do you mean maybe some other crates import the object_store crate and implement the ObjectStore traits, they have to implement the list_opts method if they upgrade the object_store version, right? If it is, it might be problem, provide a default implementation is a better option, but it also means we cannot remove other list methods to make this trait more concise.

I planed marked other list methods as deprecated, it seems that it's better to do it in a big version.

src/prefix.rs Outdated
};
let s = self.inner.list_opts(Some(&prefix), opts);

s.map_ok(move |lst| ListResult {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could encapsulate this, it is identical to list_with_delimiter below

if !k.as_ref().starts_with(prefix.as_ref()) {
break;
}

if k <= offset {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we instead just start listing from offset instead of prefix if provided?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, maybe start listing from the bigger one between offset and prefix, because technically offset could be less than prefix.

src/memory.rs Outdated
// Only objects in this base level should be returned in the
// response. Otherwise, we just collect the common prefixes.
let mut objects = vec![];
for (k, v) in self.storage.read().map.range((prefix)..) {
if let Some(may_keys) = max_keys {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could lift the conditional out by doing let max_keys = max_keys.unwrap_or(usize::MAX).

src/lib.rs Outdated

while let Some(result) = stream.next().await {
let response = result?;
common_prefixes.extend(response.common_prefixes.into_iter());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this potentially result in the same common_prefix appearing multiple times?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could be, but the BTreeSet should deduplicate the common_prefixes.

src/memory.rs Outdated
Comment on lines 440 to 444
let objects = match max_keys {
Some(max_keys) if max_keys < values.len() => {
values.truncate(max_keys);
values
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we instead use Stream::take?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will try

@@ -33,6 +32,8 @@ pub(crate) trait ListClient: Send + Sync + 'static {
delimiter: bool,
token: Option<&str>,
offset: Option<&str>,
max_keys: Option<usize>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this could use ListOpts?

@stayrascal
Copy link
Author

So I have done a first pass of a review, but unfortunately have run out of time for today.

I wonder if we might be able to find someway to reduce the size of this diff, its really quite hard to review at the moment... Perhaps we could split adding list_opts and adding the max-keys into separate PRs?

I also think we should try to make this not a breaking change, as it is unlikely we're going to look to make a breaking release in the next 4 or so months.

Thanks a lot for got lots of useful suggestions from you, i will split the this diff and also avoid the breaking change, thanks again.

@stayrascal
Copy link
Author

Hi @tustvold, I tried to reduce the scope of this PR, could you please help to review it when you free, thanks a lot.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've done another pass, but again run out of time I have to spend on this.

As it stands this PR is really quite hard to review, it makes lots of changes to lots of places without it being very easy to connect things together.

I'll see if I can't find some time to bash out a smaller more incremental version of this PR in the next couple of weeks, but unfortunately my time is very limited at the moment.

src/lib.rs Outdated
Comment on lines 734 to 780
// Add the default implementation via calling list_with_offset methods avoid bring
// a breaking change, the default implementation and other list methods will be removed
// in the future. All list request should call this method.
use std::collections::BTreeSet;

let prefix = prefix.cloned().unwrap_or_default();
let offset = options.offset.unwrap_or(prefix.clone());
self.list_with_offset(Some(&prefix), &offset)
.try_chunks(1000)
.map(move |batch| {
let batch = batch.map_err(|e| e.1)?;
if options.delimiter {
let mut common_prefixes = BTreeSet::new();
let mut objects = Vec::new();
for obj in batch.into_iter() {
let location = obj.location.clone();
let mut parts = match location.prefix_match(&prefix) {
Some(parts) => parts,
None => continue,
};

match parts.next() {
None => {}
Some(p) => match parts.next() {
None => objects.push(obj),
Some(_) => {
let common_prefix = prefix.child(p);
if common_prefix > offset {
common_prefixes.insert(common_prefix);
}
}
},
}
drop(parts)
}
Ok(ListResult {
common_prefixes: common_prefixes.into_iter().collect(),
objects,
})
} else {
Ok(ListResult {
common_prefixes: vec![],
objects: batch,
})
}
})
.boxed()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Add the default implementation via calling list_with_offset methods avoid bring
// a breaking change, the default implementation and other list methods will be removed
// in the future. All list request should call this method.
use std::collections::BTreeSet;
let prefix = prefix.cloned().unwrap_or_default();
let offset = options.offset.unwrap_or(prefix.clone());
self.list_with_offset(Some(&prefix), &offset)
.try_chunks(1000)
.map(move |batch| {
let batch = batch.map_err(|e| e.1)?;
if options.delimiter {
let mut common_prefixes = BTreeSet::new();
let mut objects = Vec::new();
for obj in batch.into_iter() {
let location = obj.location.clone();
let mut parts = match location.prefix_match(&prefix) {
Some(parts) => parts,
None => continue,
};
match parts.next() {
None => {}
Some(p) => match parts.next() {
None => objects.push(obj),
Some(_) => {
let common_prefix = prefix.child(p);
if common_prefix > offset {
common_prefixes.insert(common_prefix);
}
}
},
}
drop(parts)
}
Ok(ListResult {
common_prefixes: common_prefixes.into_iter().collect(),
objects,
})
} else {
Ok(ListResult {
common_prefixes: vec![],
objects: batch,
})
}
})
.boxed()
// Add the default implementation via calling list_with_offset methods avoid bring
// a breaking change, the default implementation will be removed, and default
// implementations added for the other list methods using it
futures::stream::once(futures::future::ready(Err(Error::NotImplemented))).boxed()

I don't think there is a coherent way to provide a default implementation here, the below ignores extensions and doesn't push down the delimiter.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, throw NotImplemented error is a simple way,

src/lib.rs Outdated
@@ -1283,6 +1359,22 @@ pub struct PutResult {
pub version: Option<String>,
}

/// Options for [`ObjectStore::list_opts`]
#[derive(Debug, Clone, Default)]
pub struct ListOpts {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub struct ListOpts {
pub struct ListOptions {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, followed the pattern of PutMultipartOpts

src/lib.rs Outdated
Comment on lines 1373 to 1374
///
/// They are also eclused from [`PartialEq`] and [`Eq`].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
///
/// They are also eclused from [`PartialEq`] and [`Eq`].

I appreciate this is copied, but it is also not true

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, will remove the same comments

offset: &Path,
) -> BoxStream<'static, Result<ObjectMeta>>;

async fn list_with_delimiter(&self, prefix: Option<&Path>) -> Result<ListResult>;
}

#[async_trait]
impl<T: ListClient + Clone> ListClientExt for T {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a list_opts impl here, to avoid duplicating the plumbing logic in the 3 cloud provider stores

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean add a list_opts and then calling list_paginated within list_opts impl, and the 3 clould provider stores call self.client.list_opts(prefix, options)? like this:

fn list_opts(
    &self,
    prefix: Option<&Path>,
    opts: ListOpts,
) -> BoxStream<'static, Result<ListResult>> {
    let client = self.clone();
    client.list_paginated(prefix, opts)
}

The 3 clould provider are calling list_paginated method directly during implementing list_opts of ObjectStore, because the list_opts of ListClientExt will be very simple, just transfer request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure I follow what you are saying, but if list_paginated ends up with the same signature as list_opts, it might as well just get renamed to list_opts.

@@ -157,6 +158,93 @@ impl ObjectStore for HttpStore {
.boxed()
}

fn list_opts(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation seems over complicated given that webdav doesn't support pagination.

Could we perhaps reuse more of the existing logic?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i considered reuse the logic of list and list_with_delimiter, but since the response of list_opts is a BoxStream with static lifecycle, so cannot reuse list_with_delimiter directly(please correct me if misunderstand something).

@@ -960,6 +960,102 @@ pub async fn list_with_delimiter(storage: &DynObjectStore) {
assert!(content_list.is_empty());
}

/// Tests listing with composite conditions by [`ObjectStore::list_opts`]
pub async fn list_with_composite_conditions(storage: &DynObjectStore) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear to be wired into all of the integration tests?

src/lib.rs Outdated
/// Options for [`ObjectStore::list_opts`]
#[derive(Debug, Clone, Default)]
pub struct ListOpts {
/// The listed objects whose locations are greater than `offset`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The listed objects whose locations are greater than `offset`
/// Only list objects greater than `offset`

src/lib.rs Outdated
pub struct ListOpts {
/// The listed objects whose locations are greater than `offset`
pub offset: Option<Path>,
/// True means returns common prefixes (directories) in addition to object in [`ListResult`]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// True means returns common prefixes (directories) in addition to object in [`ListResult`]
/// Only list up to the next path delimiter `/` [`ListResult`]
///
/// See [`ObjectStore::list_with_delimiter`]

@stayrascal
Copy link
Author

I've done another pass, but again run out of time I have to spend on this.

As it stands this PR is really quite hard to review, it makes lots of changes to lots of places without it being very easy to connect things together.

I'll see if I can't find some time to bash out a smaller more incremental version of this PR in the next couple of weeks, but unfortunately my time is very limited at the moment.

I'm so sorry that this PR has taken up too much of your time. At the same time, I'm extremely grateful for your very constructive suggestions. I have already revised the code and made the entire PR simplify as much as possible. I hope this can reduce the difficulty of the review. Thanks again for your help.

@tustvold
Copy link
Contributor

#371 contains an alternative proposal to add support for this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ObjectStore::list_opts
2 participants