Skip to content

Support async FSMap objects in zarr.open #2774

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 51 commits into from
Jun 16, 2025

Conversation

maxrjones
Copy link
Member

@maxrjones maxrjones commented Jan 28, 2025

Addresses #2706

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jan 28, 2025
@d-v-b
Copy link
Contributor

d-v-b commented Jan 28, 2025

I recall there being issues with fsmap before, but I confess I don't really know what an fsmap is -- can someone explain what an fsmap is, how it differs from an fsspec filesystem, and why people would use one over the other?

cc @martindurant

I have a vague feeling that it could be useful to have a Store class that wraps generic MutableMapping instances, and maybe fsmaps could go there, but that requires me knowing more about the user context for fsmap.

@martindurant
Copy link
Member

FSMap was created specifically for the needs of Zarr, and it could have been essentially the same as the v2 FSStore, but was much quicker to get out and working within dask/fsspec.

FSMap is a dict-compatible interface (mutable-mapping) on top of a FS instance, which zarr worked with since forever and ignores some FS functionality like the file-like API.

To make it work with v3 might be complex, because zarr makes async calls, and the FSMap interface is blocking, even if the underlying FS is async. That means that there will be sync() within sync(), which might still work as zarr maintains its own event loop in a thread separate from fsspec's.

@martindurant
Copy link
Member

To be clear: this PR does not use the mapper, but constructs a normal store from the mapper's attributes. I support this path.

@maxrjones
Copy link
Member Author

thanks for the questions and answers, Davis and Martin!

IIUC there are three cases that we'd need to account for:

  1. FSMap wraps an async instance of an async-compatible filesystem
    Solution - as implemented in this PR, extract the wrapped filesystem and use it to open an FsspecStore
  2. FSMap wraps a non-async instance of a non-async-compatible filesystem
    Solution option 1 - extract the wrapped filesystem and wrap in AsyncFileSystemWrapper to open an FsspecStore as in Wrap sync fs for xarray.to_zarr #2533
    Solution option 2 - if it's a "file" protocol, extract the wrapped filesystem and open a LocalStore rather than FsspecStore, for other protocols, wrap in AsyncFileSystemWrapper to open an FsspecStore
  3. FSMap wraps a non-async instance of an async-compatibile filesystem
    Solution - this is the case I'm not sure about. Is there a way to convert from a sync instance to an async instance without needing to wrap it in AsyncFileSystemWrapper? @martindurant could you please offer guidance here?

@martindurant
Copy link
Member

Is there a way to convert from a sync instance to an async instance without needing to wrap it in AsyncFileSystemWrapper

The instance has all the arguments it was made with as attributes, so you can make a new instance with asynchronous=True from those.

if it's a "file" protocol, extract the wrapped filesystem and open a LocalStore rather than FsspecStore

Is there a reason to bother doing this?

@maxrjones
Copy link
Member Author

maxrjones commented Jan 28, 2025

if it's a "file" protocol, extract the wrapped filesystem and open a LocalStore rather than FsspecStore

Is there a reason to bother doing this?

I was thinking based on https://filesystem-spec.readthedocs.io/en/latest/async.html#limitations that LocalStore may be faster since it was designed around providing an async interface.

@d-v-b
Copy link
Contributor

d-v-b commented Jan 28, 2025

IMO it's simpler if the path is always fsmap -> fsspecstore

@martindurant
Copy link
Member

LocalStore may be faster since it was designed around providing an async interface.

I doubt it. The disk is not really async (at least with the standard syscalls python uses), so none of the async code should make any difference for local reads at all.

@maxrjones
Copy link
Member Author

maxrjones commented Feb 7, 2025

I ran into #2808 when trying to wrap sync filesystems. My preference for this PR would be to bring the codecov up to 100% and request that this be considered for review, with the plan that the conversion of sync instances to async and the wrapping of sync filesystems could be handled separately. My reasoning is that some people could use this current implementation and I'm not sure how quickly I'll be able to find time to add the other missing features.

Update: I added conversion of sync instances to async

@maxrjones maxrjones changed the title WIP: Support fsspec mutable mapping objects in zarr.open Support async FSMap objects in zarr.open Feb 13, 2025
@maxrjones maxrjones marked this pull request as ready for review February 13, 2025 01:44
Copy link
Contributor

@d-v-b d-v-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks max, this looks great!

@maxrjones
Copy link
Member Author

@d-v-b what was your fix the last time you ran into the upstream fsspec <-> s3fs incompatibility issue?

@d-v-b
Copy link
Contributor

d-v-b commented Jun 3, 2025

see #3091, in that pr @dstansby speculated that one of the git commits in fsspec was tagged, which prevented the version from being indicated as a dev version. Maybe we should re-open that PR...

@d-v-b
Copy link
Contributor

d-v-b commented Jun 3, 2025

note that the problem went away without me doing anything, presumably due to some activity over in fsspec?

@maxrjones
Copy link
Member Author

note that the problem went away without me doing anything, presumably due to some activity over in fsspec?

Interesting, I don't even see the commit hash from the attempted install in the fsspec main branch's log. I also don't have permissions to rerun the failing job, so I'll just leave this PR as complete despite the failing test.

@maxrjones
Copy link
Member Author

@martindurant thanks for your earlier review of this PR. I just wanted to check if you want to take another look before it get merged?

Copy link
Contributor

@d-v-b d-v-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor questions about imports and re-assignment of object attributes, but otherwise this looks good

@maxrjones maxrjones requested a review from d-v-b June 9, 2025 21:41
@aufdenkampe
Copy link

aufdenkampe commented Jun 12, 2025

@maxrjones & @d-v-b, thanks for all your efforts on this fix!

Our team working with USGS is watching progress closely and look forward updating the software stack to Zarr3 once this (and other blockers such as #2964 ) are merged!

@rabernat
Copy link
Contributor

@aufdenkampe - can you explain how you are using FSMap today? There may be much simpler workarounds than the solution in this PR. For example, are you loading data from object storage? Do you know the bucket and path name of the store you are trying to open?

@aufdenkampe
Copy link

@rabernat, we encountered this issue in our work supporting pygeoapi deployments for USGS. Specifically, we wanted to update our deployment to use zarr-python v3 but ran into this issue: https://code.usgs.gov/wma/nhgf/pygeoapi/-/issues/56

We recognized that we could apply a fix to pygeoapi directly (#2774 (comment)), using suggestions from:

... but the process of getting a PR through the pygeoapi dev review and release takes time, so we were hoping this fix would get pygeoapi working out of the box.

@d-v-b d-v-b merged commit 11d488d into zarr-developers:main Jun 16, 2025
30 checks passed
@maxrjones maxrjones deleted the support-FSMap branch June 16, 2025 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants