Ability to have the pip --extra-index-urls behaviour #223

lesteve opened this issue Mar 25, 2025 · 18 comments · May be fixed by #224

@lesteve

lesteve commented Mar 25, 2025

To reproduce

Following pyodide/pyodide#4898 (comment), I wanted to try out the CORS headers in anaconda.org so I have built a scikit-learn Pyodide wheel locally and uploaded it to anaconda.org. I was hoping to use the anaconda.org PyPI index https://pypi.anaconda.org/lesteve/simple like this:

import micropip
await micropip.install("scikit-learn", index_urls="https://pypi.anaconda.org/lesteve/simple", pre=True)

What I would have expected

scikit-learn dev wheel gets installed from my own index URL, but the dependencies (numpy, scipy, joblib, threadpoolctl) are installed from the lock-file

What happens instead

Looking at the browser console, it looks like it is trying to find dependencies in https://pypi.anaconda.org/lesteve/simple but this index only has scikit-learn so this fails.

Python traceback:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/lib/python3.12/site-packages/micropip/package_manager.py", line 133, in install
    return await install(
           ^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/micropip/install.py", line 53, in install
    await transaction.gather_requirements(requirements)
  File "/lib/python3.12/site-packages/micropip/transaction.py", line 55, in gather_requirements
    await asyncio.gather(*requirement_promises)
  File "/lib/python3.12/site-packages/micropip/transaction.py", line 62, in add_requirement
    return await self.add_requirement_inner(Requirement(req))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/micropip/transaction.py", line 155, in add_requirement_inner
    await self._add_requirement_from_package_index(req)
  File "/lib/python3.12/site-packages/micropip/transaction.py", line 214, in _add_requirement_from_package_index
    await self.add_wheel(wheel, req.extras, specifier=str(req.specifier))
  File "/lib/python3.12/site-packages/micropip/transaction.py", line 271, in add_wheel
    await self.gather_requirements(wheel.requires(extras))
  File "/lib/python3.12/site-packages/micropip/transaction.py", line 55, in gather_requirements
    await asyncio.gather(*requirement_promises)
  File "/lib/python3.12/site-packages/micropip/transaction.py", line 59, in add_requirement
    return await self.add_requirement_inner(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/micropip/transaction.py", line 155, in add_requirement_inner
    await self._add_requirement_from_package_index(req)
  File "/lib/python3.12/site-packages/micropip/transaction.py", line 196, in _add_requirement_from_package_index
    metadata = await package_index.query_package(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/micropip/package_index.py", line 308, in query_package
    metadata, headers = await fetch_string_and_headers(url, _fetch_kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python3.12/site-packages/micropip/_compat/_compat_in_pyodide.py", line 63, in fetch_string_and_headers
    response = await pyfetch(url, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lib/python312.zip/pyodide/http.py", line 449, in pyfetch
    raise AbortError(e) from None
pyodide.http.AbortError: NetworkError when attempting to fetch resource.

Part of the browser console that shows that micropip is trying to find dependencies in https://pypi.anaconda.org/lesteve/simple:

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://pypi.anaconda.org/lesteve/simple/threadpoolctl/. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). Status code: 404.

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://pypi.anaconda.org/lesteve/simple/joblib/. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). Status code: 404.

Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at https://pypi.anaconda.org/lesteve/simple/scipy/. (Reason: CORS header ‘Access-Control-Allow-Origin’ missing). Status code: 404.

More context

A simple work-around is to first install the dependencies without specifying index_urls, so that micropip uses the lock-file (I guess) or the PyPI index for pure-Python wheels:

import micropip
await micropip.install(["joblib", "threadpoolctl", "scipy"])
await micropip.install("scikit-learn", index_urls="https://pypi.anaconda.org/lesteve/simple", pre=True)

@agriyakhetarpal
Member

Hi @lesteve! I would rather say that this is the expected behaviour, as the index_urls argument means that we look at only this index – similar to how pip does it. If you were to install scikit-learn from SPNW, it should work as all dependencies are present on that index. Or, you can specify multiple indices to index_urls as a list of strings.

I think we could consider this a feature request to add support for the --extra-index-urls flag. I've encountered the same requirement in jupyterlite/pyodide-kernel#166. I'm also curious what @ryanking13 would say about it.

@hoodmane
Member

I agree that it would be great to support an extra_index_urls argument.

@agriyakhetarpal
Member

Or, if we were to make index_urls work like extra_index_urls, we should document it properly – it would be a breaking change.

@hoodmane
Member

I don't think we'd want to diverge from pip here.

@lesteve
Author

lesteve commented Mar 25, 2025

Thanks for your answers!

Indeed I was kind of expecting the --extra-index-urls behaviour to be able to conveniently install my own wheel from my own index and everything else as before. I have renamed the issue title to be about --extra-index-urls behaviour.

lesteve changed the title from "My attempt at using anaconda.org PyPI index with micropip" to "Ability to have the pip --extra-index-urls behaviour" on Mar 25, 2025

@ryanking13
Member

> I would rather say that this is the expected behaviour,

Hi @lesteve, I don't think it is expected behavior. We actually fall back to the lockfile if the package is not found in the index URL.

else:
    try:
        await self._add_requirement_from_package_index(req)
    except ValueError:
        logger.debug(
            "Transaction: package %r not found in index, will search lock file",
            req,
        )
        # If the requirement is not found in package index,
        # we still have a chance to find it from pyodide lockfile.
        if not await self._add_requirement_from_pyodide_lock(req):
            logger.debug(
                "Transaction: package %r not found in lock file", req
            )
            raise

The problem seems to be that the Anaconda package index does not set CORS headers when it returns a 404 Not Found error. When a CORS error is raised, it is not converted to a ValueError in Python, hence we don't fall back to the lockfile.

So... I guess we need to ask the Anaconda folks again if it would be possible to set CORS headers on 404 (or other 4XX) errors as well. We had the same issue with PyPI, which I fixed earlier (pypi/warehouse#16339).

@ryanking13
Member

Hi, @fpliger. Sorry to ping you again. Would it be possible to set the CORS headers in Anaconda package index for 4XX errors as well?

@ryanking13
Member

Meanwhile, I think we can relax the exception condition to catch the CORS error. It is converted to pyodide.http.AbortError.
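
A minimal sketch of what relaxing the exception condition could look like. It runs outside Pyodide, so `AbortError` here is a hypothetical stand-in for `pyodide.http.AbortError` (assumed to be an `OSError` subclass), and `query_index`, `LOCKFILE`, and `resolve` are made-up names for illustration, not micropip's actual code:

```python
import asyncio


class AbortError(OSError):
    """Hypothetical stand-in for pyodide.http.AbortError (not the real class)."""


LOCKFILE = {"joblib", "scipy", "threadpoolctl"}  # illustrative lock-file contents


async def query_index(name: str) -> str:
    # Simulate an index that answers 404 without CORS headers: the browser
    # blocks the response, so Python sees a network error, not "not found".
    raise AbortError("NetworkError when attempting to fetch resource.")


async def resolve(name: str, fallback_on: tuple[type[BaseException], ...]) -> str:
    try:
        return await query_index(name)
    except fallback_on:
        # Only the exception types listed in fallback_on reach the lock file.
        if name in LOCKFILE:
            return f"{name} (from lock file)"
        raise


async def main() -> tuple[str, str]:
    strict = "no error"
    try:
        # Current behaviour: only ValueError triggers the fallback.
        await resolve("joblib", (ValueError,))
    except AbortError:
        strict = "AbortError propagated"
    # Relaxed behaviour: AbortError also triggers the fallback.
    relaxed = await resolve("joblib", (ValueError, AbortError))
    return strict, relaxed


strict_result, relaxed_result = asyncio.run(main())
print(strict_result)
print(relaxed_result)
```

With only `ValueError` in the handler the network error escapes, which matches the traceback in the report. Adding the abort error to the handler lets the lock-file lookup run.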

@hoodmane
Member

@ryanking13 Do we actually need CORS on the 404? We can see in this case that the response was a 404, so they don't have the wheel? Can we tell the difference between:

  • the response failed because of CORS and
  • the response was going to be a 404 but we're also missing CORS?

@ryanking13
Member

ryanking13 commented Mar 26, 2025

@hoodmane Maybe, yes. We changed that behavior in #129, but I don't remember exactly why I agreed to raise an error when the error is not a 404 (including CORS errors), i.e. distinguishing ValueError from OSError. @Carreau, do you remember the rationale behind it?

@agriyakhetarpal
Member

Ah, just noticed the conversation here. I've just put up a PR to add extra_index_urls, with an index_strategy parameter inspired by uv to avoid dependency confusion attacks (I think that should be fine to support even if it deviates from pip's behaviour). Should I close the PR?

@Carreau
Contributor

Carreau commented Mar 26, 2025

Yes, all the errors were ValueError everywhere, so basically if you had an incorrect URL, a captcha, a 500, a CORS issue, invalid HTML or anything else, micropip would just say "well, I guess this URL does not have a wheel, I'll check the next repo", so it was basically swallowing a bunch of legitimate errors.

Also note that it's index_urls, not index_url (I think accepting str | list[str] is a mistake and it should raise an error, but that's not the point of this discussion). I think you can use "PYPI" as a placeholder for the PyPI URL and choose whether to search it first or last:

import micropip
await micropip.install("scikit-learn", index_urls=("https://pypi.anaconda.org/lesteve/simple", "PYPI"), pre=True, verbose=True)

@Carreau
Contributor

Carreau commented Mar 26, 2025

In addition, I think a special value ('LOCK') should be added to look into the lock file. It would avoid needing a boolean for whether to look into the lock file, and would allow customizing whether to look into the lock file (and in which order) without extra parameters like extra_url.
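
One way to picture the proposal is a small sketch that expands placeholders into an ordered lookup plan. Everything here is an assumption for illustration: `plan_sources` is a made-up helper, `"LOCK"` is the proposed (not existing) value, and `PYPI_URL` is just an example URL rather than whatever micropip uses internally:

```python
PYPI_URL = "https://pypi.org/simple"  # illustrative value for the "PYPI" placeholder


def plan_sources(index_urls):
    """Expand placeholders into an ordered list of (kind, location) lookups."""
    plan = []
    for entry in index_urls:
        if entry == "LOCK":
            # Proposed special value: consult the Pyodide lock file at this position.
            plan.append(("lockfile", None))
        elif entry == "PYPI":
            plan.append(("index", PYPI_URL))
        else:
            plan.append(("index", entry))
    return plan


plan = plan_sources(["https://pypi.anaconda.org/lesteve/simple", "LOCK", "PYPI"])
for kind, location in plan:
    print(kind, location)
```

The order of the list is the search order, so putting "LOCK" before or after "PYPI" would express the preference without a separate boolean.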

@ryanking13
Member

ryanking13 commented Mar 26, 2025

> Ah, just noticed the conversation here. I've just put up a PR to add extra_index_urls, with an index_strategy parameter inspired by uv to avoid dependency confusion attacks (I think that should be fine to support even if it deviates from pip's behaviour). Should I close the PR?

@agriyakhetarpal Yeah, thanks for the PR but I think we can be more careful about adding extra_index_urls. Also, we are accepting multiple index URLs because of some special situation of Pyodide (not all wheels will be in a single index URL), but IIRC PyPA folks do not like using multiple index URLs. It was because of the randomness in wheel resolution when multiple index URLs are provided.

@ryanking13
Member

ryanking13 commented Mar 26, 2025

> Yes, all the errors were ValueError everywhere, so basically if you had an incorrect URL, a captcha, a 500, a CORS issue, invalid HTML or anything else, micropip would just say "well, I guess this URL does not have a wheel, I'll check the next repo", so it was basically swallowing a bunch of legitimate errors.

Okay, then let's update the code to treat CORS errors just like other errors. @agriyakhetarpal Would you like to work on that?

@agriyakhetarpal
Member

> > Ah, just noticed the conversation here. I've just put up a PR to add extra_index_urls, with an index_strategy parameter inspired by uv to avoid dependency confusion attacks (I think that should be fine to support even if it deviates from pip's behaviour). Should I close the PR?

> @agriyakhetarpal Yeah, thanks for the PR but I think we can be more careful about adding extra_index_urls. Also, we are accepting multiple index URLs because of some special situation of Pyodide (not all wheels will be in a single index URL), but IIRC PyPA folks do not like using multiple index URLs. It was because of the randomness in wheel resolution when multiple index URLs are provided.

That makes sense. I get that there are reservations to using multiple index URLs, given that pypa/pip#8606 is still open (and valid). I'm trying to draw upon uv's solution: https://docs.astral.sh/uv/configuration/indexes/#searching-across-multiple-indexes. I'll keep that PR as a draft until we have a consensus on the approach we want to take here.
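
To make the dependency confusion concern concrete, here is a toy sketch of the two strategies described in the uv documentation linked above: "first-index" stops at the first index that has the package, while "unsafe-best-match" merges candidates from every index. The `candidate_versions` helper, the dict-based indexes, and the package names are assumptions for illustration; this is not the code in the draft PR:

```python
def candidate_versions(package, indexes, strategy="first-index"):
    """Collect candidate versions for `package` from an ordered list of indexes.

    Each index is modelled as a dict mapping package name -> list of versions.
    """
    if strategy == "first-index":
        # Stop at the first index that offers the package at all.
        for index in indexes:
            versions = index.get(package, [])
            if versions:
                return list(versions)
        return []
    if strategy == "unsafe-best-match":
        # Merge candidates from every index; the highest version anywhere wins.
        return [v for index in indexes for v in index.get(package, [])]
    raise ValueError(f"unknown strategy: {strategy!r}")


private_index = {"internal-tool": [(1, 0)]}
public_index = {"internal-tool": [(99, 0)], "numpy": [(2, 2)]}
indexes = [private_index, public_index]

safe = max(candidate_versions("internal-tool", indexes))
unsafe = max(candidate_versions("internal-tool", indexes, "unsafe-best-match"))
fallback = max(candidate_versions("numpy", indexes))
print(safe, unsafe, fallback)
```

Under "first-index" the private copy of `internal-tool` is chosen even though a higher version exists on the public index, and `numpy` is still found because the private index does not offer it at all.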

In general, we should make jupyterlite-pyodide-kernel customisable so that one can call micropip.set_index_urls() with a custom URL, PyPI, the lockfile, etc. when instantiating Pyodide, which I think should resolve the issue.

@agriyakhetarpal
Member

> > Yes, all the errors were ValueError everywhere, so basically if you had an incorrect URL, a captcha, a 500, a CORS issue, invalid HTML or anything else, micropip would just say "well, I guess this URL does not have a wheel, I'll check the next repo", so it was basically swallowing a bunch of legitimate errors.

> Okay, then let's update the code to treat CORS errors just like other errors. @agriyakhetarpal Would you like to work on that?

Yes, sure, @ryanking13. Do you have any pointers? I haven't followed the previous discussions on this area. :)

@ryanking13
Member

I think we can fix the code here:

except HttpStatusError as e:
    if e.status_code == 404:
        logger.debug("NotFound (404) for %r, trying next index.", url)
        continue
    logger.debug(
        "Error fetching %r (%s), trying next index.", url, e.status_code
    )
    raise

Instead of checking the HttpStatusError only, we can catch any Exception, and continue to the next index URL without raising the error.
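
A rough, self-contained sketch of that idea follows. The names `fake_fetch` and `query_package`, and the `index-a.example` / `index-b.example` URLs, are hypothetical and only mimic the shape of the loop. Collecting the failures and reporting them at the end is an extra assumption, added so that errors are not silently swallowed:

```python
import asyncio
import logging

logger = logging.getLogger("index_sketch")


async def fake_fetch(url: str, name: str) -> str:
    # Hypothetical fetcher: the first index fails the way a CORS-blocked
    # request does; any other index serves the package.
    if url == "https://index-a.example/simple":
        raise OSError("NetworkError when attempting to fetch resource.")
    return f"metadata for {name}"


async def query_package(name: str, index_urls: list[str], fetch=fake_fetch):
    failures = []
    for url in index_urls:
        try:
            return url, await fetch(url, name)
        except Exception as e:
            # Any failure (404, CORS, 500, ...) moves on to the next index.
            logger.debug("Error fetching %r from %r (%s), trying next index.", name, url, e)
            failures.append(f"{url}: {e}")
    # Every index failed: surface all collected errors at once.
    raise ValueError(f"Can't fetch metadata for {name!r}: " + "; ".join(failures))


found = asyncio.run(
    query_package(
        "joblib",
        ["https://index-a.example/simple", "https://index-b.example/simple"],
    )
)
print(found)
```

If every index fails, the final `ValueError` would still let the existing lock-file fallback run, while its message keeps the underlying errors visible.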
