Skip to content

Use lazy wheel to obtain dep info for new resolver #8532

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -38,18 +38,34 @@ jobs:
env:
- GROUP=1
- NEW_RESOLVER=1
- env:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are any of these tests exercising the new code? It will only impact the network tests, and there aren't many in our unit tests. Personally I'd trade all of these for 1 or 2 integration tests where we know the new code is being used. werkzeug (which we build on for our mock server test helper) has some built-in helpers for handling range requests so we could do all of this locally, without reaching out to PyPI. The PR in pallets/werkzeug#977 might provide some help in using it.

Copy link
Member

@chrahunt chrahunt Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add on to this, because of the code that we're touching, these are the specific integration tests I think would give us the most impact:

  1. pip wheel
  2. pip download
  3. pip install of a wheel that has extras that are also wheels

One approach would be to find existing integration tests, then convert them to use our mock server and then parameterize which mock server to use: plain one or range-request-supporting one.

- GROUP=1
- NEW_RESOLVER=1
- FAST_DEPS=1
- env:
- GROUP=2
- NEW_RESOLVER=1
- env:
- GROUP=2
- NEW_RESOLVER=1
- FAST_DEPS=1
- env:
- GROUP=3
- NEW_RESOLVER=1
- env:
- GROUP=3
- NEW_RESOLVER=1
- FAST_DEPS=1

fast_finish: true
allow_failures:
- env:
- GROUP=3
- NEW_RESOLVER=1
- env:
- GROUP=3
- NEW_RESOLVER=1
- FAST_DEPS=1

before_install: tools/travis/setup.sh
install: travis_retry tools/travis/install.sh
Expand Down
3 changes: 3 additions & 0 deletions news/8532.feature
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Allow the new resolver to obtain dependency information through wheels
lazily downloaded using HTTP range requests. To enable this feature,
invoke ``pip`` with ``--use-feature=fast-deps``.
2 changes: 1 addition & 1 deletion src/pip/_internal/cli/cmdoptions.py
Original file line number Diff line number Diff line change
Expand Up @@ -916,7 +916,7 @@ def check_list_path_option(options):
metavar='feature',
action='append',
default=[],
choices=['2020-resolver'],
choices=['2020-resolver', 'fast-deps'],
help='Enable new functionality, that may be backward incompatible.',
) # type: Callable[..., Option]

Expand Down
1 change: 1 addition & 0 deletions src/pip/_internal/cli/req_command.py
Original file line number Diff line number Diff line change
Expand Up @@ -271,6 +271,7 @@ def make_resolver(
force_reinstall=force_reinstall,
upgrade_strategy=upgrade_strategy,
py_version_info=py_version_info,
lazy_wheel='fast-deps' in options.features_enabled,
Copy link
Member

@chrahunt chrahunt Jul 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RequirementPreparer currently hides all details about the way that we're accessing distributions, so the only thing that the Candidate needs to be aware of is the abstract distribution after preparing. If we put the lazy wheel logic there instead (probably used to create the abstract distribution), then:

  1. it preserves that separation of concerns, with all of the associated benefits
  2. we get more code reuse, since the lazy-wheel-backed abstract distribution would follow the same path as the eager-wheel-backed one, with all of the associated benefits
  3. it gives us control over whether the wheel download is actually lazy - IIUC currently we're relying on the fact that no one happens to access _InstallRequirementBackedCandidate.dist prior to iter_dependencies()
  4. it gives us control over when the real wheel download happens - IIUC currently we're relying on the fact that someone happens to access _InstallRequirementBackedCandidate.dist before actually trying to install the wheel
  5. it works when we compose candidates, like in ExtrasCandidate, which currently directly accesses self.base.dist and bypasses the lazy wheel logic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooo! I missed ExtrasCandidate when I discussed this w/ @McSinyx.

I think we can cover this by changing how self.dist is populated (instead of changing how iter_dependencies works), since what's really happening in _iter_dependencies -- we're creating a separate distribution object and using it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to what was done in #8448.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pradyunsg, I think we did discuss point (5) and what you came up with (1a28d08) will handle it just fine. So now we came to a consensus on how to cover the listed points above, which is different from what this PR is pushing, I'm thinking about filing another one (likely within today) to avoid intensive rebasing which I'm not exactly good at 😄

@chrahunt, thank you for the thoughtful heads up as well as the tips on testing.

)
import pip._internal.resolution.legacy.resolver
return pip._internal.resolution.legacy.resolver.Resolver(
Expand Down
14 changes: 10 additions & 4 deletions src/pip/_internal/network/lazy_wheel.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
"""Lazy ZIP over HTTP"""

__all__ = ['dist_from_wheel_url']
__all__ = ['HTTPRangeRequestUnsupported', 'dist_from_wheel_url']

from bisect import bisect_left, bisect_right
from contextlib import contextmanager
Expand All @@ -23,13 +23,18 @@
from pip._internal.network.session import PipSession


class HTTPRangeRequestUnsupported(RuntimeError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class HTTPRangeRequestUnsupported(RuntimeError):
class HTTPRangeRequestUnsupported(Exception):

pass


def dist_from_wheel_url(name, url, session):
# type: (str, str, PipSession) -> Distribution
"""Return a pkg_resources.Distribution from the given wheel URL.

This uses HTTP range requests to only fetch the potion of the wheel
containing metadata, just enough for the object to be constructed.
If such requests are not supported, RuntimeError is raised.
If such requests are not supported, HTTPRangeRequestUnsupported
is raised.
"""
with LazyZipOverHTTP(url, session) as wheel:
# For read-only ZIP files, ZipFile only needs methods read,
Expand All @@ -45,7 +50,8 @@ class LazyZipOverHTTP(object):

This uses HTTP range requests to lazily fetch the file's content,
which is supposed to be fed to ZipFile. If such requests are not
supported by the server, raise RuntimeError during initialization.
supported by the server, raise HTTPRangeRequestUnsupported
during initialization.
"""

def __init__(self, url, session, chunk_size=CONTENT_CHUNK_SIZE):
Expand All @@ -60,7 +66,7 @@ def __init__(self, url, session, chunk_size=CONTENT_CHUNK_SIZE):
self._left = [] # type: List[int]
self._right = [] # type: List[int]
if 'bytes' not in head.headers.get('Accept-Ranges', 'none'):
raise RuntimeError('range request is not supported')
raise HTTPRangeRequestUnsupported('range request is not supported')
self._check_zip()

@property
Expand Down
78 changes: 57 additions & 21 deletions src/pip/_internal/resolution/resolvelib/candidates.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,11 +6,16 @@
from pip._vendor.packaging.version import Version

from pip._internal.exceptions import HashError, MetadataInconsistent
from pip._internal.network.lazy_wheel import (
HTTPRangeRequestUnsupported,
dist_from_wheel_url,
)
from pip._internal.req.constructors import (
install_req_from_editable,
install_req_from_line,
)
from pip._internal.req.req_install import InstallRequirement
from pip._internal.utils.logging import indent_log
from pip._internal.utils.misc import dist_is_editable, normalize_version_info
from pip._internal.utils.packaging import get_requires_python
from pip._internal.utils.typing import MYPY_CHECK_RUNNING
Expand Down Expand Up @@ -197,6 +202,18 @@ def _prepare_abstract_distribution(self):
# type: () -> AbstractDistribution
raise NotImplementedError("Override in subclass")

def _check_metadata_consistency(self, dist):
# type: (Distribution) -> None
"""Check for consistency of project name and version of dist."""
# TODO: (Longer term) Rather than abort, reject this candidate
# and backtrack. This would need resolvelib support.
name = canonicalize_name(dist.project_name)
if self._name is not None and self._name != name:
raise MetadataInconsistent(self._ireq, "name", dist.project_name)
version = dist.parsed_version
if self._version is not None and self._version != version:
raise MetadataInconsistent(self._ireq, "version", dist.version)

def _prepare(self):
# type: () -> None
if self._dist is not None:
Expand All @@ -210,50 +227,43 @@ def _prepare(self):

self._dist = abstract_dist.get_pkg_resources_distribution()
assert self._dist is not None, "Distribution already installed"

# TODO: (Longer term) Rather than abort, reject this candidate
# and backtrack. This would need resolvelib support.
name = canonicalize_name(self._dist.project_name)
if self._name is not None and self._name != name:
raise MetadataInconsistent(
self._ireq, "name", self._dist.project_name,
)
version = self._dist.parsed_version
if self._version is not None and self._version != version:
raise MetadataInconsistent(
self._ireq, "version", self._dist.version,
)
self._check_metadata_consistency(self._dist)

@property
def dist(self):
# type: () -> Distribution
self._prepare()
return self._dist

def _get_requires_python_specifier(self):
# type: () -> Optional[SpecifierSet]
requires_python = get_requires_python(self.dist)
def _get_requires_python_specifier(self, dist):
# type: (Distribution) -> Optional[SpecifierSet]
requires_python = get_requires_python(dist)
if requires_python is None:
return None
try:
spec = SpecifierSet(requires_python)
except InvalidSpecifier as e:
name = canonicalize_name(dist.project_name)
logger.warning(
"Package %r has an invalid Requires-Python: %s", self.name, e,
"Package %r has an invalid Requires-Python: %s", name, e,
)
return None
return spec

def iter_dependencies(self):
# type: () -> Iterable[Optional[Requirement]]
for r in self.dist.requires():
def _iter_dependencies(self, dist):
# type: (Distribution) -> Iterable[Optional[Requirement]]
for r in dist.requires():
yield self._factory.make_requirement_from_spec(str(r), self._ireq)
python_dep = self._factory.make_requires_python_requirement(
self._get_requires_python_specifier(),
self._get_requires_python_specifier(dist),
)
if python_dep:
yield python_dep

def iter_dependencies(self):
# type: () -> Iterable[Optional[Requirement]]
return self._iter_dependencies(self.dist)

def get_install_requirement(self):
# type: () -> Optional[InstallRequirement]
self._prepare()
Expand Down Expand Up @@ -293,6 +303,32 @@ def __init__(
version=version,
)

def iter_dependencies(self):
# type: () -> Iterable[Optional[Requirement]]
dist = None # type: Optional[Distribution]
preparer, lazy_wheel = self._factory.preparer, self._factory.lazy_wheel
remote_wheel = self._link.is_wheel and not self._link.is_file
if lazy_wheel and remote_wheel and not preparer.require_hashes:
assert self._name is not None
logger.info('Collecting %s', self._ireq.req or self._ireq)
with indent_log():
logger.info(
'Obtaining dependency information from %s %s',
self._name, self._version,
)
url = self._link.url.split('#', 1)[0]
session = preparer.downloader._session
try:
dist = dist_from_wheel_url(self._name, url, session)
except HTTPRangeRequestUnsupported:
logger.debug(
'Failed to get dependency information '
'using HTTP range requests from %s', url,
)
else:
self._check_metadata_consistency(dist)
return self._iter_dependencies(dist or self.dist)

def _prepare_abstract_distribution(self):
# type: () -> AbstractDistribution
return self._factory.preparer.prepare_linked_requirement(
Expand Down
2 changes: 2 additions & 0 deletions src/pip/_internal/resolution/resolvelib/factory.py
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,7 @@ def __init__(
ignore_installed, # type: bool
ignore_requires_python, # type: bool
py_version_info=None, # type: Optional[Tuple[int, ...]]
lazy_wheel=False, # type: bool
):
# type: (...) -> None
self._finder = finder
Expand All @@ -92,6 +93,7 @@ def __init__(
self._use_user_site = use_user_site
self._force_reinstall = force_reinstall
self._ignore_requires_python = ignore_requires_python
self.lazy_wheel = lazy_wheel

self._link_candidate_cache = {} # type: Cache[LinkCandidate]
self._editable_candidate_cache = {} # type: Cache[EditableCandidate]
Expand Down
2 changes: 2 additions & 0 deletions src/pip/_internal/resolution/resolvelib/resolver.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ def __init__(
force_reinstall, # type: bool
upgrade_strategy, # type: str
py_version_info=None, # type: Optional[Tuple[int, ...]]
lazy_wheel=False, # type: bool
):
super(Resolver, self).__init__()

Expand All @@ -64,6 +65,7 @@ def __init__(
ignore_installed=ignore_installed,
ignore_requires_python=ignore_requires_python,
py_version_info=py_version_info,
lazy_wheel=lazy_wheel,
)
self.ignore_dependencies = ignore_dependencies
self.upgrade_strategy = upgrade_strategy
Expand Down
16 changes: 13 additions & 3 deletions tests/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,12 @@ def pytest_addoption(parser):
default=False,
help="run the skipped tests for the new resolver",
)
parser.addoption(
"--fast-deps",
action="store_true",
default=False,
help="use lazy wheels in tests (only affect new resolver)",
)
parser.addoption(
"--use-venv",
action="store_true",
Expand Down Expand Up @@ -102,12 +108,16 @@ def pytest_collection_modifyitems(config, items):
@pytest.fixture(scope="session", autouse=True)
def use_new_resolver(request):
"""Set environment variable to make pip default to the new resolver.

Lazy wheel, an optimization, is also decided here.
"""
new_resolver = request.config.getoption("--new-resolver")
if new_resolver:
os.environ["PIP_USE_FEATURE"] = "2020-resolver"
else:
if not new_resolver:
os.environ.pop("PIP_USE_FEATURE", None)
elif request.config.getoption("--fast-deps"):
os.environ["PIP_USE_FEATURE"] = "2020-resolver fast-deps"
else:
os.environ["PIP_USE_FEATURE"] = "2020-resolver"
yield new_resolver


Expand Down
7 changes: 5 additions & 2 deletions tests/unit/test_network_lazy_wheel.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,10 @@
from pip._vendor.pkg_resources import Requirement
from pytest import fixture, mark, raises

from pip._internal.network.lazy_wheel import dist_from_wheel_url
from pip._internal.network.lazy_wheel import (
HTTPRangeRequestUnsupported,
dist_from_wheel_url,
)
from pip._internal.network.session import PipSession
from tests.lib.requests_mocks import MockResponse

Expand Down Expand Up @@ -39,7 +42,7 @@ def test_dist_from_wheel_url(session):
def test_dist_from_wheel_url_no_range(session, monkeypatch):
"""Test handling when HTTP range requests are not supported."""
monkeypatch.setattr(session, 'head', lambda *a, **kw: MockResponse(b''))
with raises(RuntimeError):
with raises(HTTPRangeRequestUnsupported):
dist_from_wheel_url('mypy', MYPY_0_782_WHL, session)


Expand Down
12 changes: 9 additions & 3 deletions tools/travis/run.sh
Original file line number Diff line number Diff line change
Expand Up @@ -43,22 +43,28 @@ else
RESOLVER_SWITCH='--new-resolver'
fi

if [[ -z "$FAST_DEPS" ]]; then
FAST_DEPS_SWITCH=''
else
FAST_DEPS_SWITCH='--fast-deps'
fi

# Print the commands run for this test.
set -x
if [[ "$GROUP" == "1" ]]; then
# Unit tests
tox -- --use-venv -m unit -n auto
# Integration tests (not the ones for 'pip install')
tox -- -m integration -n auto --duration=5 -k "not test_install" \
--use-venv $RESOLVER_SWITCH
--use-venv $RESOLVER_SWITCH $FAST_DEPS_SWITCH
elif [[ "$GROUP" == "2" ]]; then
# Separate Job for running integration tests for 'pip install'
tox -- -m integration -n auto --duration=5 -k "test_install" \
--use-venv $RESOLVER_SWITCH
--use-venv $RESOLVER_SWITCH $FAST_DEPS_SWITCH
elif [[ "$GROUP" == "3" ]]; then
# Separate Job for tests that fail with the new resolver
tox -- -m fails_on_new_resolver -n auto --duration=5 \
--use-venv $RESOLVER_SWITCH --new-resolver-runtests
--use-venv $RESOLVER_SWITCH --new-resolver-runtests $FAST_DEPS_SWITCH
else
# Non-Testing Jobs should run once
tox
Expand Down