CI: Split PyPy tests to speed up end-to-end running time #5436
Conversation
Is there a better way to choose the 50 slowest tests? Why 50? I'm just concerned that this would go out of date over time and we'll have slow tests that don't get marked as such.
The main aim is to split the functional tests up in some way, any way, to make the jobs more granular and make full use of parallel builds, so no runners sit idle for long periods. I first marked the 5 slowest tests (~5 min), then 25 (~7 min), then 50 (~13 min), trying to get the pypy3 and pypy3-slow jobs to take about the same time. Then I added the network tests (~21 min), which brought them roughly equal.
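As a rough illustration of the approach (not the actual diff in this PR; the test name and body below are made up), a slow test just carries a marker so the job scripts can select or deselect it:

```python
# Hypothetical example: a slow functional test carrying the pypy_slow
# marker used in this PR, so the slow job can select it with
# "-m pypy_slow" and the regular job can deselect it with
# "-m 'not pypy_slow'".
import pytest


@pytest.mark.pypy_slow
def test_install_a_heavy_package(script):
    # "script" stands in for pip's functional-test fixture;
    # the body here is only a placeholder.
    script.pip("install", "some-heavy-package")
```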
A valid concern. In fact, it doesn't really matter whether the marked tests are truly the slowest or not; we just want to split them into two groups. We could instead split them arbitrarily by test filename. Perhaps put all test_install*.py in one group (15 files) and the rest (20 files) in another. This would probably help with the CPython jobs too. The unit tests only take about 30 seconds, but there may be some extra benefit in putting them in their own jobs too.
Overall, I'm in favour of improving the CI runtimes, and the PyPy3 times are particularly annoying, because as you say that's the one that blocks everything. If I read your results right, you're getting around a 5-7 minute improvement (15-20%) which is pretty good. But I agree with @pradyunsg that the process is pretty arbitrary as it stands. Ideally, I'd prefer that we look a bit more closely at the possibilities here. Some thoughts:
Having said all of that, I'd rather we get some improvement from just accepting this PR as it stands than do nothing in the hope that someone will take on a more extensive job; after all, it's not like we can't make further improvements later.
Yes, it's being used by the
This would be useful to find out for #4497, but I think it's out of scope for this one. I have noticed that PyPy tests tend to be slower on other projects too, for example https://travis-ci.org/python-pillow/Pillow/.
Possibly. Unit tests take about 30 seconds on Travis. Splitting them out might get some small wins, but the real slowness is in the functional tests. But I'll give it a go later.
Generally, if the overhead per worker isn't too high, then smaller and more granular runs are better, because all parallel workers are used for longer. We don't want the situation where 4 workers have finished and we're waiting for 1 long one to finish. I've just tried something like this in another branch, by putting the test_install*.py functional tests into their own worker (aka build job), and got similar results to this PR: 27 min 4 sec. https://travis-ci.org/hugovk/pip/builds/383237894
Can that be checked from BigQuery?
I doubt it. There's a certain amount of time it takes to actually get to the point of running tests. I think it's fine to leave them in as-is.
Yes. For the last month, comparing aggregated counts:
Oh, and I can't go on BigQuery and not post a huge table on GitHub immediately after. :P

implementation.(name, version) -> downloads, as of 24th May 23:50 IST:

SELECT
details.implementation.name, details.implementation.version, COUNT(*) as download_count
FROM
TABLE_DATE_RANGE(
[the-psf:pypi.downloads],
DATE_ADD(CURRENT_TIMESTAMP(), -1, "month"),
CURRENT_TIMESTAMP()
)
GROUP BY
details.implementation.name, details.implementation.version
ORDER BY
download_count DESC, details.implementation.name ASC, details.implementation.version DESC
LIMIT
  100

Gives: a table with the first 100 entries.
Let's do this instead then. :)
Can we just have the PyPy tests only run on cron jobs on master (i.e. when
Decent numbers for PyPy, coming up to a million, but of course a long way from CPython's ~500M (and ~100M from whatever's in null).

FT = functional test
Yes. I'd suggest that be done in another PR to keep the changes separate and easier to compare/review.
It makes a huge difference. With parallel turned off, the build was terminated after 50 minutes for taking too long!
"27 min 4 sec" vs "26 min 22 sec" I personally consider this within the noise range of the CI -- it's not a major gain anyway and having not-too-many jobs is not really nice. Though, I won't holler if others don't mind it. |
Many of the slowest tests just need to be broken up into smaller ones that can then be run in parallel.
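For instance (a hypothetical sketch rather than an actual pip test), a single long test covering several cases could be parametrized so pytest-xdist can schedule each case on a different worker:

```python
# Hypothetical sketch: parametrizing turns one long-running test into
# several smaller test items, which pytest-xdist ("-n 4") can then
# distribute across workers. The package names and the "script"
# fixture are placeholders for pip's real test helpers.
import pytest


@pytest.mark.parametrize("package", ["simple", "simple2", "requires_simple"])
def test_install_one_package(script, package):
    result = script.pip("install", package)
    assert "Successfully installed" in result.stdout
```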
Yeah, they are kind of close/within noise range. Some more variation for three groups:
Although there is an argument to be made for having more, smaller jobs: it frees up workers sooner, which is especially useful if there's another build waiting in the queue. Also, if a single job fails, you can pinpoint what failed a bit more easily (i.e. you can see whether it's the UT group or the FT group, and there are fewer logs to dig through). But I don't mind too strongly either way, and can revert from three groups back to two if that's preferred.
tests/conftest.py
Outdated
@@ -48,6 +60,21 @@ def pytest_collection_modifyitems(items):
                "Unknown test type (filename = {})".format(module_path)
            )

    # Skip or run test_install*.py functional tests
    if "integration" in item.keywords:
Make this conditional on os.environ.get("CI", False) as well?
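A minimal sketch of what that suggestion could look like in tests/conftest.py (the marker name and the install-test check here are illustrative, not the PR's exact code):

```python
import os

import pytest


def pytest_collection_modifyitems(items):
    # Only apply the install/non-install split when running on CI,
    # per the review suggestion above; locally, collect everything.
    if not os.environ.get("CI", False):
        return
    for item in items:
        if ("integration" in item.keywords
                and item.module.__name__.startswith("test_install")):
            # Illustrative marker; the real PR went a different route.
            item.add_marker(pytest.mark.install_test)
```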
Done!
tox.ini
Outdated
@@ -1,7 +1,27 @@
[tox]
envlist =
    docs, packaging, lint-py2, lint-py3, mypy,
    py27, py34, py35, py36, py37, pypy
    py27-functional,
Please use factors here.
https://tox.readthedocs.io/en/latest/config.html#complex-factor-conditions
Done!
Wow! I just spotted that pypa/pip has 25 parallel workers available on Travis CI! That's compared to just 5 for a normal open-source account (e.g. hugovk/pip). That means it's definitely beneficial to have three groups of tests, so all but one job will run at the same time right from the word go! That also accounts for the even better improvements seen on pypa/pip (23 min) compared to hugovk/pip (32 min).
That's probably because someone in PyPA asked for us to be given extra workers -- I don't think it'd be nice for pip to "consume" so many workers (there are multiple PyPA projects). Plus, this might be a bit painful when the builds are not running on the PyPA organization. I notice the total build time has increased by about 20-30 minutes, which isn't ideal when you can only run 5 jobs at a time.
@di do you know anything about this?
I don't feel very strongly about this, but to err on the side of caution I'm gonna say: let's keep only 2 groups. That's already a lot of workers and a big enough speedup to warrant them.
I would imagine those 25 are available for all projects under pypa. That's what I've seen on my own account and on other orgs with 5 (or other numbers of) workers.
IIRC, it's a shared pool. Someone else would have to confirm though. :) |
.gitignore
Outdated
@@ -23,6 +23,7 @@ htmlcov/
.coverage
.coverage.*
.cache
.pytest_cache
nit: make the previous line .*cache -- it'll cover everything. :)
@@ -1,3 +1,5 @@
import pytest
This can be removed.
tests/functional/test_check.py
Outdated
@@ -1,3 +1,5 @@
import pytest
This can be removed.
.travis.yml
Outdated
  python: nightly
- env: TOXENV=py37-others
  python: nightly
- env: TOXENV=py37-unit
unit test worker can be removed.
Thanks, review comments done.
Two more things I'd missed in the last review...
.travis/run.sh
Outdated
if [[ $TOXENV == py* ]]; then
if [[ $TOXENV == py*-functional-install ]]; then
    # Only run test_install*.py integration tests
tox -- -m integration -n 4 --duration=5 --only_install_tests |
Would -m integration -k "not test_install" work equally well?
Instead of the additional code in conftest and the custom flag...
Yes, that's much cleaner, and it also doesn't list hundreds of skipped tests, because they're not selected in the first place.
Updated!
@@ -1,7 +1,7 @@
[tox]
envlist =
    docs, packaging, lint-py2, lint-py3, mypy,
    py27, py34, py35, py36, py37, pypy
    py{27,34,35,36,37,py,py3}-{functional-install,others}
Can we still do tox -e py27 and tox -e py36?
If not, that'll be needed IMO.
Good point, that needs updating. Do you know how to do that with tox?
I'd just add py{27,34,35,36,py,py3} right above it.
It's a duplication I won't worry about since they're basically one under the other and a mismatch would be visible.
Actually, I don't use tox much locally, but I think the commands you gave already work.
Cool. :)
tests/conftest.py
Outdated
@@ -14,7 +14,7 @@
from tests.lib.venv import VirtualEnvironment


def pytest_collection_modifyitems(items):
def pytest_collection_modifyitems(config, items):
Can drop this change.
If the -k test_install works out. :P
Merging this since I don't wanna hold up the CI improvements this provides. :) I'll see what can be done for the tox config later. It doesn't seem to be an issue anyways.
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
For #4497.
This doesn't reduce the total build time of all jobs, but rather aims to reduce the start-to-finish waiting time for a whole build by making better use of parallel jobs.
Currently, the CI takes about 35 minutes to run (start-to-finish), because the PyPy3 job takes nearly 35 minutes to run. This is the main bottleneck (in start-to-finish running times).
https://travis-ci.org/pypa/pip/builds/382752906
In this PR:
- The 50 slowest PyPy3 functional tests have been marked with @pytest.mark.pypy_slow.
- New pypy-slow and pypy3-slow build jobs have been added to the matrix. These 50 slow tests are run there, and to share more of the load, so are those marked with @pytest.mark.network.
- The unit tests, and the remaining functional tests, are run in the pypy and pypy3 jobs.
- Based on https://docs.pytest.org/en/latest/example/simple.html#control-skipping-of-tests-according-to-command-line-option (see the sketch below).
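For reference, a minimal sketch of that pytest-docs pattern, using the flag name from the .travis/run.sh diff above (the PR's real conftest.py differs in detail):

```python
import pytest


def pytest_addoption(parser):
    # Custom flag so CI workers can pick either the install tests or
    # everything else; the flag name is taken from the run.sh diff.
    parser.addoption(
        "--only_install_tests", action="store_true", default=False,
        help="run only the test_install*.py functional tests",
    )


def pytest_collection_modifyitems(config, items):
    only_install = config.getoption("--only_install_tests")
    skip = pytest.mark.skip(reason="deselected by --only_install_tests")
    for item in items:
        is_install_test = item.module.__name__.startswith("test_install")
        # Skip whichever half was not requested for this worker.
        if only_install != is_install_test:
            item.add_marker(skip)
```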
Also:
- I noticed the CPython 3.5 job sometimes takes up to 5 minutes longer than the 3.4 job, so the slower one is moved first.
- Finally, #dropthedot: use pytest rather than py.test (https://docs.pytest.org/en/latest/faq.html#why-can-i-use-both-pytest-and-py-test-commands).

This reduces the start-to-finish time from 32/35/38 minutes (last three builds on master) to 27/30/31 minutes (these three commits on my branch) or 22 minutes (this PR build).
Would something along these lines be useful?
(I've not added a news file fragment, let me know if needed.)