Add Pyodide support and CI jobs for Zarr #1903

Open
wants to merge 81 commits into
base: main
81 commits
9f5a110
Add CI job to test out-of-tree Pyodide builds
agriyakhetarpal May 22, 2024
29282fc
Add `[msgpack]` dependency for `numcodecs`
agriyakhetarpal May 23, 2024
d465742
Bump to Pyodide 0.26.0, update comments
agriyakhetarpal May 27, 2024
cdf0bb2
Try to run tests without async
agriyakhetarpal May 27, 2024
dfe0321
Move shared file to rootdir, outside v2 and v3
agriyakhetarpal May 27, 2024
b100ec9
Move `fasteners` import inside ThreadSynchronizer
agriyakhetarpal May 27, 2024
b0dddca
Make the tests directory importable, fix `_shared`
agriyakhetarpal May 27, 2024
d227728
Import list of greetings from `numcodecs`
agriyakhetarpal May 27, 2024
fdb2bef
Skip some tests that use threading
agriyakhetarpal May 27, 2024
621077a
Skip some tests that use `fcntl`
agriyakhetarpal May 27, 2024
7ae9a97
Skip tests that require `dbm`
agriyakhetarpal May 27, 2024
22eb6da
Move `IS_WASM` logic to internal `zarr` API
agriyakhetarpal May 27, 2024
6836947
Skip a few tests trying to import `multiprocessing`
agriyakhetarpal May 27, 2024
fe3bf27
Skip tests that use async and threading code
agriyakhetarpal May 27, 2024
08997ec
Improve `asyncio_tests_wrapper`, fix test imports
agriyakhetarpal May 27, 2024
9bfc860
Skip entire `test_codecs.py` file
agriyakhetarpal May 27, 2024
9bcb350
Skip yet another test that requires threads
agriyakhetarpal May 27, 2024
9985abb
xfail test where array's fill values are different
agriyakhetarpal May 27, 2024
7ea12ef
xfail test because Emscripten FS
agriyakhetarpal May 27, 2024
a6565de
Skip last test that tries to run threads
agriyakhetarpal May 27, 2024
85f621c
Another test that tries to run threads
agriyakhetarpal May 27, 2024
1a64255
xfail another array's differing `fill_values` test
agriyakhetarpal May 27, 2024
c8cb38b
Skip entire sync file under WASM, no threading
agriyakhetarpal May 27, 2024
eb36d40
Restore pytest config options, remove when needed
agriyakhetarpal May 28, 2024
e3bf365
Merge main
agriyakhetarpal May 26, 2025
b3a5b8a
Bump Emscripten, Pyodide xbuildenv, Node.js versions
agriyakhetarpal May 26, 2025
42d2792
Running on `ubuntu-latest` should be fine
agriyakhetarpal May 26, 2025
27068e2
Don't persist credentials with git clone
agriyakhetarpal May 26, 2025
aff9b18
Don't pin the version of `pyodide-build`
agriyakhetarpal May 26, 2025
bb2c136
Use same xbuildenv for building and testing
agriyakhetarpal May 26, 2025
710195a
Temporarily build numcodecs for WASM as well
agriyakhetarpal May 26, 2025
07e5bc9
Use Pyodide 0.28.0a2 for now
agriyakhetarpal May 26, 2025
403fbb0
Use `fetch-depth: 0` to bring correct versions
agriyakhetarpal May 27, 2025
24dbc77
Skip `test_multiprocessing` for WASM
agriyakhetarpal May 27, 2025
ecea615
Skip all sync tests
agriyakhetarpal May 27, 2025
d919bd7
xfail `test_array_roundtrip` for now
agriyakhetarpal May 27, 2025
0bb7d47
Skip `test_group_members_performance[memory]` for now
agriyakhetarpal May 27, 2025
fb59eba
Set concurrency and max workers as 1
agriyakhetarpal May 27, 2025
4c6bed6
Update `zarr.config` tests to match
agriyakhetarpal May 27, 2025
6754131
Move WASM check to resolve circular import
agriyakhetarpal May 27, 2025
f426ed7
Mark Blosc `test_typesize` as a known failure case
agriyakhetarpal May 27, 2025
405d247
Skip `async.concurrency` config override test case
agriyakhetarpal May 27, 2025
e93073a
Mark some indexing tests as flaky on WASM
agriyakhetarpal May 27, 2025
d862953
Oops, fix a config test
agriyakhetarpal May 27, 2025
cbd1d4d
Fix another config test
agriyakhetarpal May 27, 2025
3e8bdef
Hook into Pyodide WebLoop
agriyakhetarpal May 27, 2025
4ee492e
Move `zarr.constants` to `zarr._constants`
agriyakhetarpal May 27, 2025
d94970e
Bump to Pyodide 0.28.0a3
agriyakhetarpal May 27, 2025
cd3424c
Fix typo
agriyakhetarpal May 27, 2025
5044a22
`asyncio_mode = "auto"` works now, clean it up
agriyakhetarpal May 28, 2025
e131867
Restore `test_group_members_performance`
agriyakhetarpal May 28, 2025
c7c22dc
Disable SIMD when building numcodecs
agriyakhetarpal May 28, 2025
e62c933
Oops, disable AVX2, SSE2 at the right place
agriyakhetarpal May 28, 2025
d3bcf56
Debug improper numcodecs version
agriyakhetarpal May 28, 2025
e4b7379
Debug numcodecs version again
agriyakhetarpal May 28, 2025
35cecc1
Fetch tags manually for now
agriyakhetarpal May 28, 2025
ceb70c7
Force Zarr to install
agriyakhetarpal May 28, 2025
1ea992f
Install `numcodecs` with `crc32c`
agriyakhetarpal May 28, 2025
f51ddd1
Install the rest of the missing dependencies
agriyakhetarpal May 28, 2025
0ec47b9
Escape wheel filename correctly
agriyakhetarpal May 28, 2025
e1617c0
Remove extra install line
agriyakhetarpal May 28, 2025
23e34bf
Fix misquoted end
agriyakhetarpal May 28, 2025
22e6795
Remove lenience for performance test, skip it instead
agriyakhetarpal May 29, 2025
da8bfc7
Undo async.concurrency to 1, improve performance
agriyakhetarpal May 29, 2025
81f5df3
Add `slow_wasm` marker, skip orthogonal indexing tests
agriyakhetarpal May 29, 2025
bad9920
Move WebLoop patch to `conftest.py`
agriyakhetarpal May 29, 2025
2217455
Mark more indexing tests as slow in WASM
agriyakhetarpal May 29, 2025
86f8785
Fix condition for slow WASM tests
agriyakhetarpal May 29, 2025
3230892
Clearer skip message for slow WASM tests
agriyakhetarpal May 29, 2025
07b5645
Merge main
agriyakhetarpal May 29, 2025
ef70cbd
Add release note for Pyodide/WASM support.
agriyakhetarpal May 29, 2025
9212c0e
Ignore WASM code paths Codecov doesn't know about
agriyakhetarpal May 29, 2025
86323eb
Bring back numcodecs version check
agriyakhetarpal May 29, 2025
9c6af32
Revert "Bring back numcodecs version check"
agriyakhetarpal May 29, 2025
090c62b
Add guidance highlighting JSPI requirement
agriyakhetarpal May 30, 2025
3f2d41d
Merge branch 'main' into emscripten
agriyakhetarpal May 30, 2025
b89f682
Drop `shutdown_asyncgens` fixture
agriyakhetarpal Jun 3, 2025
700aae4
Clarify docs for JSPI availability and usage
agriyakhetarpal Jun 3, 2025
93680d2
Merge branch 'main' into emscripten
agriyakhetarpal Jun 3, 2025
401311e
Fix lint error
agriyakhetarpal Jun 3, 2025
dd25a36
Update release note with JSPI info
agriyakhetarpal Jun 3, 2025
112 changes: 112 additions & 0 deletions .github/workflows/emscripten.yml
@@ -0,0 +1,112 @@
# Attributed to NumPy https://github.com/numpy/numpy/pull/25894
# https://github.com/numpy/numpy/blob/d2d2c25fa81b47810f5cbd85ea6485eb3a3ffec3/.github/workflows/emscripten.yml

name: Pyodide wheel

on:
  # TODO: refine after this is ready to merge
  [push, pull_request, workflow_dispatch]

env:
  FORCE_COLOR: 3
  PYODIDE_VERSION: 0.28.0a3
  # PYTHON_VERSION and EMSCRIPTEN_VERSION are determined by PYODIDE_VERSION.
  # The appropriate versions can be found in the Pyodide repodata.json
  # "info" field, or in Makefile.envs:
  # https://github.com/pyodide/pyodide/blob/main/Makefile.envs#L2
  PYTHON_VERSION: 3.13 # any 3.13.x version works
  EMSCRIPTEN_VERSION: 4.0.9
  NODE_VERSION: 22

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

permissions:
  contents: read

jobs:
  build_wasm_emscripten:
    name: Build and test Zarr for Pyodide
    runs-on: ubuntu-latest
    # To enable this workflow on a fork, comment out:
    # FIXME: uncomment after this is ready to merge
    # if: github.repository == 'zarr-developers/zarr-python'
    steps:
      - name: Checkout Zarr repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
          fetch-tags: true

      - name: Set up Python ${{ env.PYTHON_VERSION }}
        id: setup-python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ env.PYTHON_VERSION }}

      - name: Set up Emscripten toolchain
        uses: mymindstorm/setup-emsdk@v14
        with:
          version: ${{ env.EMSCRIPTEN_VERSION }}
          actions-cache-folder: emsdk-cache

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}

      - name: Install pyodide-build
        run: python -m pip install pyodide-build

      - name: Build Zarr for Pyodide
        run: |
          pyodide xbuildenv install ${{ env.PYODIDE_VERSION }}
          pyodide build

      ### (Temporarily) build numcodecs as well, as we have an older version in the Pyodide distribution (v0.13.1)

      - name: Clone numcodecs repository
        uses: actions/checkout@v4
        with:
          # See https://github.com/zarr-developers/numcodecs/pull/529
          repository: agriyakhetarpal/numcodecs
          ref: setup-emscripten-ci
          path: numcodecs-wasm
          submodules: recursive
          fetch-depth: 0
          fetch-tags: true

      # For some reason fetch-depth: 0 and fetch-tags: true aren't working...
      - name: Manually fetch tags for numcodecs
        working-directory: numcodecs-wasm
        run: git fetch --tags
Comment on lines +80 to +83 (author):

For unexplained reasons, even this doesn't help.


      - name: Build numcodecs for WASM
        run: pyodide build
        working-directory: numcodecs-wasm
        env:
          DISABLE_NUMCODECS_AVX2: 1
          DISABLE_NUMCODECS_SSE2: 1

      ### Back to Zarr repository to run tests

      - name: Run Zarr tests for Pyodide
        run: |
          # Set up Pyodide virtual environment and activate it
          pyodide venv .venv-pyodide
          source .venv-pyodide/bin/activate

          # Install numcodecs
          pip install $(ls numcodecs-wasm/dist/*.whl)"[crc32c]"

          # Install Zarr without dependencies until we can figure out the
          # numcodecs wheel versioning issue
          pip install dist/*.whl --no-deps
          pip install "packaging>=22.0" "numpy>=1.25" "typing_extensions>=4.9" "donfig>=0.8"

          # Install test dependencies
          pip install "coverage" "pytest" "pytest-asyncio" "pytest-cov" "pytest-accept" "rich" "mypy" "hypothesis"

          python -m pytest tests -v --cov=zarr --cov-config=pyproject.toml

6 changes: 6 additions & 0 deletions changes/1903.feature.rst
@@ -0,0 +1,6 @@
Added official support for the Pyodide/WebAssembly platform for using Zarr within browser-based environments.
The ``threading.max_workers`` parameter takes a default value of 1, and the ``zarr.sync`` interface is not
supported. At the moment, using Zarr requires the JavaScript Promise Integration (JSPI) WebAssembly feature
to be enabled with Pyodide and is hidden behind flags in web browsers to enable experimental support. See the
`JavaScript Promise Integration reference <https://github.com/WebAssembly/js-promise-integration>`_ and
`WebAssembly feature status <https://webassembly.org/features/>`_ pages for more details.
2 changes: 2 additions & 0 deletions pyproject.toml
@@ -410,13 +410,15 @@ filterwarnings = [
     "ignore:The dtype .* is currently not part in the Zarr format 3 specification.*:UserWarning",
     "ignore:Use zarr.create_array instead.:DeprecationWarning",
     "ignore:Duplicate name.*:UserWarning",
+    "ignore:Error cleaning up asyncio loop.*:RuntimeWarning", # appears in Pyodide/WASM as it uses its own browser-based event loop
     "ignore:The `compressor` argument is deprecated. Use `compressors` instead.:UserWarning",
     "ignore:Numcodecs codecs are not in the Zarr version 3 specification and may not be supported by other zarr implementations.:UserWarning",
     "ignore:Unclosed client session <aiohttp.client.ClientSession.*:ResourceWarning"
 ]
 markers = [
     "gpu: mark a test as requiring CuPy and GPU",
     "slow_hypothesis: slow hypothesis tests",
+    "slow_wasm: slow tests in Pyodide/WASM",
 ]

 [tool.repo-review]
9 changes: 9 additions & 0 deletions src/zarr/_constants.py
@@ -0,0 +1,9 @@
# This file only exists to not incur circular import issues
# TODO: find a better location for this or keep it here

from __future__ import annotations

import platform
import sys

IS_WASM: bool = sys.platform == "emscripten" or platform.machine() in ["wasm32", "wasm64"]
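
Editor's note: the check above can be exercised off-platform by factoring it into a pure function. The sketch below is only an illustration under that assumption; `is_wasm_platform` is a hypothetical helper name and is not part of this PR.

```python
import platform
import sys


def is_wasm_platform(sys_platform: str, machine: str) -> bool:
    """Hypothetical helper mirroring the IS_WASM expression in zarr._constants."""
    # Pyodide reports sys.platform == "emscripten"; the machine check covers
    # other WebAssembly targets that report wasm32/wasm64.
    return sys_platform == "emscripten" or machine in ("wasm32", "wasm64")


# Evaluated once at import time, as the constant in the diff is.
IS_WASM: bool = is_wasm_platform(sys.platform, platform.machine())
```

Passing the platform strings as arguments keeps the logic testable on a regular CPython interpreter, where the module-level constant always evaluates to False.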
14 changes: 6 additions & 8 deletions src/zarr/codecs/zstd.py
@@ -5,9 +5,7 @@
 from functools import cached_property
 from typing import TYPE_CHECKING

-import numcodecs
 from numcodecs.zstd import Zstd
-from packaging.version import Version

 from zarr.abc.codec import BytesBytesCodec
 from zarr.core.buffer.cpu import as_numpy_array_wrapper
@@ -44,12 +42,12 @@ class ZstdCodec(BytesBytesCodec):

     def __init__(self, *, level: int = 0, checksum: bool = False) -> None:
         # numcodecs 0.13.0 introduces the checksum attribute for the zstd codec
-        _numcodecs_version = Version(numcodecs.__version__)
-        if _numcodecs_version < Version("0.13.0"):
-            raise RuntimeError(
-                "numcodecs version >= 0.13.0 is required to use the zstd codec. "
-                f"Version {_numcodecs_version} is currently installed."
-            )
+        # _numcodecs_version = Version(numcodecs.__version__)
+        # if _numcodecs_version < Version("0.13.0"):
+        # raise RuntimeError(
+        # "numcodecs version >= 0.13.0 is required to use the zstd codec. "
+        # f"Version {_numcodecs_version} is currently installed."
+        # )
Comment on lines +45 to +50

Author: Temporary change; these lines will be uncommented when we either have a solution for the numcodecs version being fetched in the CI job or we create yet another alpha with an updated version of numcodecs.

Reviewer: How about skipping tests that require a newer version of numcodecs?

Author: Oh, no, this is only to disable the version check. numcodecs is being compiled from zarr-developers/numcodecs#529 with the latest version; it is just that the reported version is incorrect.


         level_parsed = parse_zstd_level(level)
         checksum_parsed = parse_checksum(checksum)
4 changes: 3 additions & 1 deletion src/zarr/core/config.py
@@ -33,6 +33,8 @@

 from donfig import Config as DConfig

+from zarr._constants import IS_WASM
+
 if TYPE_CHECKING:
     from donfig.config_obj import ConfigSet

@@ -107,7 +109,7 @@ def enable_gpu(self) -> ConfigSet:
                 },
             },
             "async": {"concurrency": 10, "timeout": None},
-            "threading": {"max_workers": None},
+            "threading": {"max_workers": 1 if IS_WASM else None},
             "json_indent": 2,
             "codec_pipeline": {
                 "path": "zarr.core.codec_pipeline.BatchedCodecPipeline",
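
Editor's note: a minimal sketch of how the platform-dependent default above resolves. The function name `threading_defaults` is hypothetical; it only mirrors the `"threading"` entry in the diff and is not the Zarr configuration API.

```python
from typing import Dict, Optional


def threading_defaults(is_wasm: bool) -> Dict[str, Optional[int]]:
    """Hypothetical mirror of the "threading" defaults entry in the diff."""
    # On WASM the default is pinned to a single worker; elsewhere None is
    # kept, which leaves the choice to whatever consumes the setting.
    return {"max_workers": 1 if is_wasm else None}
```

On CPython, `concurrent.futures.ThreadPoolExecutor` picks its own worker count when `max_workers` is `None`, which is why `None` remains the non-WASM default.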
49 changes: 49 additions & 0 deletions src/zarr/core/sync.py
@@ -6,10 +6,12 @@
 import os
 import threading
 from concurrent.futures import ThreadPoolExecutor, wait
+from textwrap import dedent
 from typing import TYPE_CHECKING, Any, TypeVar

 from typing_extensions import ParamSpec

+from zarr._constants import IS_WASM
 from zarr.core.config import config

 if TYPE_CHECKING:
@@ -133,6 +135,46 @@ def sync(
     --------
     >>> sync(async_function(), existing_loop)
     """
+    # WASM environments (like Pyodide) cannot start new threads, so we need to handle
+    # coroutines differently. We integrate with the existing Pyodide WebLoop which
+    # schedules tasks on the browser's event loop using setTimeout():
+    # https://developer.mozilla.org/en-US/docs/Web/API/setTimeout
+    if IS_WASM: # pragma: no cover
+        # This code path is covered in the Pyodide/WASM CI job.
+        current_loop = asyncio.get_running_loop()
+        result = current_loop.run_until_complete(coro)
+        # Check if run_until_complete actually executed the coroutine or just returned a task
+        # In browsers without JSPI, run_until_complete is a no-op that will return the task/future.
+        if isinstance(result, (asyncio.Task, asyncio.Future)):
+            raise RuntimeError(
+                dedent("""
+                    Cannot use synchronous zarr API in browser-based environments without JSPI.
+                    Zarr requires JavaScript Promise Integration (JSPI) to work in browsers,
+                    but JSPI is not enabled in your environment.
+
+                    The available solutions are to either use Zarr's async API instead with
+                    zarr.api.asynchronous, or if you want to use your existing code, follow
+                    these steps (all required):
+                    1. Enable JSPI in your Pyodide setup with
+                    `loadPyodide({ enableRunUntilComplete: true })` AND
+                    2. Use a JSPI-enabled website or browser configuration (for example, with
+                    --enable-features=WebAssemblyExperimentalJSPI for Google Chrome). If you
+                    are the owner of a website, you may sign up for an origin trial for JSPI.
+
+                    If you are using Node.js, pass the --experimental-wasm-jspi flag
+                    (available for v20+).
+
+                    Note: JSPI is experimental and not yet standardised across all browsers.
+                    See https://webassembly.org/features/ for more information and status,
+                    https://v8.dev/blog/jspi#how-can-i-use-jspi-today%3F for usage, and
+                    https://v8.dev/blog/jspi-ot for more information on origin trials.
+                """)
+            )
+        return result
+
+    # This code path is the original thread-based implementation
+    # for non-WASM environments; it creates a dedicated I/O thread
+    # with its own event loop.
     if loop is None:
         # NB: if the loop is not running *yet*, it is OK to submit work
         # and we will wait for it
@@ -170,6 +212,13 @@ def _get_loop() -> asyncio.AbstractEventLoop:

     The loop will be running on a separate thread.
     """
+    if IS_WASM: # pragma: no cover
+        # This case is covered in the Pyodide/WASM CI job.
+        raise RuntimeError(
+            "Thread-based event loop not available in WASM environment. "
+            "Use zarr.api.asynchronous or ensure sync() handles WASM case."
+        )
+
     if loop[0] is None:
         with _get_lock():
             # repeat the check just in case the loop got filled between the
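
Editor's note: the JSPI check in `sync()` relies on `run_until_complete` handing back a task or future when it cannot block. The sketch below reproduces that idea on regular CPython under stated assumptions: `run_blocking` and `NonBlockingLoop` are hypothetical names, and `NonBlockingLoop` only imitates the no-JSPI behaviour described in the code comments; it is not Pyodide's WebLoop.

```python
import asyncio


def run_blocking(loop, coro):
    """Run `coro` on `loop`; raise if the loop returned without blocking."""
    result = loop.run_until_complete(coro)
    # A loop that cannot suspend hands back the scheduled task or future
    # instead of the coroutine's return value.
    if isinstance(result, (asyncio.Task, asyncio.Future)):
        raise RuntimeError("run_until_complete() did not block; is JSPI enabled?")
    return result


class NonBlockingLoop:
    """Hypothetical stand-in imitating a loop without JSPI support."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()

    def run_until_complete(self, coro):
        coro.close()  # the coroutine is never driven in this stand-in
        return self._loop.create_future()  # pending future, never resolved

    def close(self) -> None:
        self._loop.close()


async def answer() -> int:
    return 42
```

With a real event loop, `run_blocking(loop, answer())` returns the coroutine's value; with the stand-in it raises `RuntimeError`, which is the situation the long error message in the diff explains to users.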
16 changes: 16 additions & 0 deletions tests/conftest.py
@@ -10,6 +10,7 @@
 from hypothesis import HealthCheck, Verbosity, settings

 from zarr import AsyncGroup, config
+from zarr._constants import IS_WASM
 from zarr.abc.store import Store
 from zarr.codecs.sharding import ShardingCodec, ShardingCodecIndexLocation
 from zarr.core.array import (
@@ -176,15 +177,30 @@ def pytest_addoption(parser: Any) -> None:
         default=False,
         help="run slow hypothesis tests",
     )
+    parser.addoption(
+        "--run-slow-wasm",
+        action="store_true",
+        default=False,
+        help="run slow tests only applicable to WASM",
+    )


 def pytest_collection_modifyitems(config: Any, items: Any) -> None:
     if config.getoption("--run-slow-hypothesis"):
         return
+    if config.getoption("--run-slow-wasm") and IS_WASM:
+        return
+
     skip_slow_hyp = pytest.mark.skip(reason="need --run-slow-hypothesis option to run")
+    skip_slow_wasm = pytest.mark.skip(
+        reason="need --run-slow-wasm option to run in WASM, or not running in WASM"
+    )
+
     for item in items:
         if "slow_hypothesis" in item.keywords:
             item.add_marker(skip_slow_hyp)
+        if "slow_wasm" in item.keywords and IS_WASM:
+            item.add_marker(skip_slow_wasm)


 settings.register_profile(
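
Editor's note: the skip logic in `pytest_collection_modifyitems` can be traced without pytest by restating it as a pure function. This is a sketch only; `markers_to_skip` is a hypothetical name, and it mirrors the control flow of the hook as written in the diff, including both early returns.

```python
from typing import Iterable, Set


def markers_to_skip(
    keywords: Iterable[str],
    *,
    is_wasm: bool,
    run_slow_hypothesis: bool,
    run_slow_wasm: bool,
) -> Set[str]:
    """Return which slow markers would cause a test to be skipped."""
    # Either early return disables *all* skipping, as in the hook above.
    if run_slow_hypothesis:
        return set()
    if run_slow_wasm and is_wasm:
        return set()

    keywords = set(keywords)
    skipped = set()
    if "slow_hypothesis" in keywords:
        skipped.add("slow_hypothesis")
    # slow_wasm tests are only skipped when actually running under WASM.
    if "slow_wasm" in keywords and is_wasm:
        skipped.add("slow_wasm")
    return skipped
```

One consequence visible in this form: because the early returns are shared, passing `--run-slow-hypothesis` under WASM also lets `slow_wasm` tests run, and `--run-slow-wasm` under WASM also lets `slow_hypothesis` tests run.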
5 changes: 5 additions & 0 deletions tests/test_array.py
@@ -18,6 +18,7 @@
 import zarr.api.asynchronous
 import zarr.api.synchronous as sync_api
 from zarr import Array, AsyncArray, Group
+from zarr._constants import IS_WASM
 from zarr.abc.store import Store
 from zarr.codecs import (
     BytesCodec,
@@ -1677,6 +1678,10 @@ def _index_array(arr: Array, index: Any) -> Any:
     return arr[index]


+@pytest.mark.skipif(
+    IS_WASM,
+    reason="can't start new processes in Pyodide",
+)
 @pytest.mark.parametrize(
     "method",
     [
2 changes: 2 additions & 0 deletions tests/test_codecs/test_blosc.py
@@ -6,6 +6,7 @@
 from packaging.version import Version

 import zarr
+from zarr._constants import IS_WASM
 from zarr.abc.store import Store
 from zarr.codecs import BloscCodec
 from zarr.core.buffer import default_buffer_prototype
@@ -58,6 +59,7 @@ async def test_blosc_evolve(store: Store, dtype: str) -> None:
     assert blosc_configuration_json["shuffle"] == "shuffle"


+@pytest.mark.xfail(IS_WASM, reason="Blosc size mismatch, known failure case for Pyodide/WASM")
 async def test_typesize() -> None:
     a = np.arange(1000000, dtype=np.uint64)
     codecs = [zarr.codecs.BytesCodec(), zarr.codecs.BloscCodec()]
3 changes: 2 additions & 1 deletion tests/test_config.py
@@ -10,6 +10,7 @@
 import zarr
 import zarr.api
 from zarr import zeros
+from zarr._constants import IS_WASM
 from zarr.abc.codec import CodecPipeline
 from zarr.abc.store import ByteSetter, Store
 from zarr.codecs import (
@@ -83,7 +84,7 @@ def test_config_defaults_set() -> None:
                 },
             },
             "async": {"concurrency": 10, "timeout": None},
-            "threading": {"max_workers": None},
+            "threading": {"max_workers": 1 if IS_WASM else None},
             "json_indent": 2,
             "codec_pipeline": {
                 "path": "zarr.core.codec_pipeline.BatchedCodecPipeline",