
CI: Split PyPy tests to speed up end-to-end running time #5436


Merged
16 commits merged on May 26, 2018

Conversation

@hugovk (Contributor) commented May 24, 2018

For #4497.

This doesn't reduce the total build time of all jobs, but rather aims to reduce the start-to-finish waiting time for a whole build by making better use of parallel jobs.

Also:

This reduces the start-to-finish time from 32/35/38 minutes (last three builds on master) to 27/30/31 minutes (these three commits on my branch) or 22 minutes (this PR build).

Would something along these lines be useful?

(I've not added a news file fragment, let me know if needed.)

@pradyunsg (Member)

These 50 slow ones are run in there, and to share more of the load, those with @pytest.mark.network

Is there a better way to choose the 50 slowest tests? Why 50?

I'm just concerned that this would go out of date over time and we'll have slow tests that don't get marked as such.

I noticed the CPython 3.5 job sometimes takes up to 5 minutes longer than 3.4, so move the slower one first

👍

#dropthedot

👍

@hugovk (Contributor, Author) commented May 24, 2018

Why 50?

The main aim is to split the functional tests up in some way, any way, to make the jobs more granular and make full use of parallel builds, so no runners sit idle for long periods of time.

I marked 5 (~5 min), then 25 (~7 min), then 50 (~13 min) of the slowest tests to try and get the pypy3 and pypy3-slow jobs to take about the same time. Then I added the network tests (~21 min), which brought them about equal.

Is there a better way to choose the 50 slowest tests? ... I'm just concerned that this would go out of date over time and we'll have slow tests that don't get marked as such.

A valid concern. In fact, it doesn't really matter whether these are truly the slowest tests; we just want to split them into two roughly equal groups.

We could instead split them arbitrarily by test filename. Perhaps do all test_install*.py in one (15 files) and the rest (20 files) in another.

This would probably help with the CPython jobs too.

The unit tests only take about 30 seconds, but there may be some extra benefit putting them in their own jobs too.
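
The filename-based split described above could be sketched like this (a minimal illustration; the module names are made up and pip's actual functional test files may differ):

```python
def split_functional_tests(module_names):
    """Partition functional test modules into the test_install*.py group
    and everything else, per the two-group split discussed above."""
    install = sorted(n for n in module_names if n.startswith("test_install"))
    others = sorted(n for n in module_names if not n.startswith("test_install"))
    return install, others

# Illustrative module names, not pip's real file list:
modules = ["test_install.py", "test_freeze.py", "test_install_user.py", "test_wheel.py"]
print(split_functional_tests(modules))
# (['test_install.py', 'test_install_user.py'], ['test_freeze.py', 'test_wheel.py'])
```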

@pfmoore (Member) commented May 24, 2018

Overall, I'm in favour of improving the CI runtimes, and the PyPy3 times are particularly annoying because, as you say, that's the one that blocks everything. If I read your results right, you're getting around a 5-7 minute improvement (15-20%), which is pretty good.

But I agree with @pradyunsg that the process is pretty arbitrary as it stands. Ideally, I'd prefer that we look a bit more closely at the possibilities here. Some thoughts:

  • We install pytest-xdist (according to requirements-dev.txt) but I don't know much about it. Are we actually running the tests in parallel? Would doing so help if not?
  • It might be useful to understand why the slowest tests are so slow. Is it something specific to pypy, or something more general?
  • On Windows we run the functional tests separately from the unit tests. Would there be benefit in doing that on Travis too?
  • Maybe we should split the tests into more groups than just functional and unit, and have separate tox environments for each set. I don't know whether having lots of workers in Travis will be faster than having multiple environments run in parallel in one worker - we could experiment there as well.
  • Do we even need to run the full test suite on PyPy3? I wonder how many users we have using it, anyway? Would just running unit tests be sufficient?

Having said all of that, I'd rather we got some improvement from just accepting this PR as it stands, than doing nothing in the hope that someone would be interested in doing a more extensive job - after all it's not like we can't make further improvements later.

@hugovk (Contributor, Author) commented May 24, 2018

  • We install pytest-xdist (according to requirements-dev.txt) but I don't know much about it. Are we actually running the tests in parallel? Would doing so help if not?

Yes, it's enabled via the -n 4 argument in run.sh for the integration tests, and I expect it's making a difference. I'll try it for the unit tests too; it might help a bit there.

  • It might be useful to understand why the slowest tests are so slow. Is it something specific to pypy, or something more general?

This would be useful to find out for #4497, but I think it's out of scope for this PR. I've noticed that PyPy tests tend to be slower on other projects too, for example https://travis-ci.org/python-pillow/Pillow/.

  • On Windows we run the functional tests separately from the unit tests. Would there be benefit in doing that on Travis too?

Possibly. The unit tests take about 30 seconds on Travis, so splitting them out might get some small wins, but the real slowness is in the functional tests. I'll give it a go later.

  • Maybe we should split the tests into more groups than just functional and unit, and have separate tox environments for each set. I don't know whether having lots of workers in Travis will be faster than having multiple environments run in parallel in one worker - we could experiment there as well.

Generally, if the per-worker overhead isn't too high, smaller and more granular runs are better because all parallel workers stay busy for longer. We don't want a situation where 4 workers have finished and we're waiting on 1 long job to complete.
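
That scheduling intuition can be sketched with a toy model: greedily assign each job, longest first, to the soonest-free worker and report the end-to-end wall time. This is only an illustration of why one long job dominates a build, not how Travis actually schedules jobs, and the durations below are made up:

```python
import heapq

def wall_time(job_minutes, workers):
    """End-to-end build time when up to `workers` jobs run concurrently,
    assigning the longest jobs first to the soonest-free worker."""
    finish_times = [0.0] * workers
    heapq.heapify(finish_times)
    for duration in sorted(job_minutes, reverse=True):
        soonest_free = heapq.heappop(finish_times)
        heapq.heappush(finish_times, soonest_free + duration)
    return max(finish_times)

# One 30-minute job dominates two workers...
print(wall_time([30, 5, 5, 5], 2))          # 30.0
# ...but splitting it into three 10-minute jobs shortens the build.
print(wall_time([10, 10, 10, 5, 5, 5], 2))  # 25.0
```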

I've just tried something like this in another branch, putting the test_install*.py functional tests into their own worker (i.e. build job), and got similar results to this PR: 27 min 4 sec. https://travis-ci.org/hugovk/pip/builds/383237894

  • Do we even need to run the full test suite on PyPy3? I wonder how many users we have using it, anyway? Would just running unit tests be sufficient?

Can that be checked from BigQuery?

@pradyunsg (Member)

The unit tests only take about 30 seconds, but there may be some extra benefit putting them in their own jobs too.

I doubt it. There's a fixed amount of setup time before the tests actually start running. I think it's fine to leave them in as-is.

@pradyunsg (Member)

Can that be checked from BigQuery?

Yes. For the last month, comparing aggregated counts:

Row  details_implementation_name  download_count
  1  CPython                           465770244
  2  null                               73417814
  3  PyPy                                 866085
  4  Jython                                 2950
  5  IronPython                              377
  6  Pyston                                   53

@pradyunsg (Member)

Oh, and I can't go on BigQuery and not post a huge table on GitHub immediately after. :P


implementation.(name, version) -> downloads as on 24th May 23:50 IST.

SELECT
  details.implementation.name, details.implementation.version, COUNT(*) AS download_count
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "month"),
    CURRENT_TIMESTAMP()
  )
GROUP BY
  details.implementation.name, details.implementation.version
ORDER BY
  download_count DESC, details.implementation.name ASC, details.implementation.version DESC
LIMIT
  100

Gives:

A table with first 100 entries
details_implementation_name details_implementation_version download_count
CPython 2.7.12 104240006
null null 73417814
CPython 2.7.13 70108990
CPython 2.7.6 44752943
CPython 2.7.14 40130804
CPython 3.6.5 36142283
CPython 3.6.2 22454465
CPython 3.5.2 18744519
CPython 2.7.9 18717644
CPython 2.7.5 18431643
CPython 3.6.3 13473432
CPython 2.7.10 13386307
CPython 3.6.4 11609881
CPython 3.4.3 7497481
CPython 3.5.3 7003274
CPython 3.5.5 5295515
CPython 2.7.15 4389776
CPython 3.6.1 3288331
CPython 2.7.11 2232454
CPython 3.5.4 2191310
CPython 3.4.2 2009513
CPython 2.7.15rc1 1883371
CPython 2.6.6 1822523
CPython 3.5.1 1798943
CPython 3.6.0 1569082
CPython 2.7.7 1485958
CPython 2.7.3 1388041
CPython 3.4.7 1380183
CPython 2.7.8 1312078
CPython 3.4.8 1096671
CPython 2.6.9 1076065
CPython 3.4.6 1046062
CPython 3.4.5 955454
CPython 3.6.5rc1 368741
CPython 3.5.0 289683
CPython 3.4.4 277371
CPython 3.4.0 260176
PyPy 2.4.0 221411
CPython 3.4.1 213101
CPython 3.7.0a4+ 199229
CPython 2.7.14+ 149185
PyPy 5.8.0 126404
CPython 3.3.6 123794
PyPy 6.0.0 118623
CPython 3.7.0b4 112393
CPython 2.7.11+ 110617
CPython 3.5.1+ 89552
CPython 2.7.12+ 80657
PyPy 5.8.0.beta.0 72558
PyPy 5.1.0 64209
CPython 3.6.5+ 63098
CPython 3.7.0b3 63082
PyPy 5.10.0 58082
PyPy 5.10.1 44433
PyPy 5.7.1 43699
CPython 3.5.5+ 31025
CPython 3.5.2+ 28754
CPython 2.7.13+ 28681
CPython 3.5.4rc1 26770
CPython 3.3.3 26455
CPython 3.5.3+ 25332
CPython 3.7.0b2 23937
PyPy 5.3.1 22915
CPython 2.7.10rc1 22469
PyPy 5.0.1 19436
CPython 3.3.5 17471
CPython 2.7.0 16001
CPython 2.7.1 15753
PyPy 5.9.0 15488
PyPy 5.6.0 15477
CPython 3.6.4+ 12574
CPython 3.3.7 12454
PyPy 5.4.1 12058
CPython 3.7.0a4 10322
CPython 3.6.4rc1 9468
CPython 3.2.6 8943
CPython 3.7.0a2 8874
CPython 3.7.0b1 8656
CPython 3.3.2 8340
CPython 2.7.4 8145
CPython 3.6.0rc2 7749
CPython 3.6.1rc1 7281
CPython 3.6.3rc1 5914
CPython 3.6.2rc2 5780
CPython 3.2.3 5508
PyPy 5.4.0 5444
CPython 3.6.2rc1 5142
PyPy 5.9.0.beta.0 5139
CPython 2.6.8 4344
PyPy 5.7.1.beta.0 4318
CPython 3.6.0b2 3860
CPython 3.7.0a1 3716
CPython 3.8.0a0 3257
CPython 2.7.14rc1 3190
PyPy 4.0.1 3173
CPython 2.7.2 3127
CPython 3.4.3+ 2857
CPython 3.6.0a2 2559
CPython 2.7.13rc1 2344

@pradyunsg (Member)

I've just tried something like this in another branch, by putting the test_install*.py functional tests into their own worker (aka build job), and get similar results to this PR: 27 min 4 sec.

Let's do this instead then. :)

@pradyunsg (Member)

3 PyPy 866085

Can we just have the PyPy tests only run on cron jobs on master (i.e. when TRAVIS_EVENT_TYPE=cron and TRAVIS_BRANCH=master)?

@hugovk (Contributor, Author) commented May 24, 2018

Decent numbers for PyPy, coming up to a million, but of course a long way from CPython's ~466M (and ~73M from whatever's in null).


FT = functional test
UT = unit test


Can we just have the PyPy tests only run on cron jobs on master (i.e. when TRAVIS_EVENT_TYPE=cron and TRAVIS_BRANCH=master)?

Yes. I'd suggest doing that in a separate PR, to keep the changes separate and easier to compare and review.

@hugovk (Contributor, Author) commented May 24, 2018

  • We install pytest-xdist (according to requirements-dev.txt) but I don't know much about it. Are we actually running the tests in parallel? Would doing so help if not?

Yes, it's being used by the -n 4 argument in run.sh for integration tests, and I expect it's making a difference.

It makes a huge difference. With parallel turned off, the build was terminated after 50 minutes for taking too long!

https://travis-ci.org/hugovk/pip/builds/383276628

@pradyunsg (Member)

"27 min 4 sec" vs "26 min 22 sec"

I personally consider this within the noise range of the CI -- it's not a major gain anyway, and keeping the number of jobs small is nicer.

Though, I won't holler if others don't mind it.

@pradyunsg (Member)

something more general?

Many of the slowest tests just need to be broken up into smaller ones that can then be run in parallel.

@hugovk (Contributor, Author) commented May 24, 2018

Yeah, they're kind of close, within the noise range, and there's some more variation with three groups.

Although there is the argument to be made for having more, smaller jobs: it frees up workers sooner, especially useful if there's another build waiting in the queue.

Also, if a single job fails, you can pinpoint the failure a bit more easily (i.e. you can see whether it's the UT group or the FT group, and there are fewer logs to dig through).

But I don't feel too strongly either way, and can revert from three groups back to two if that's preferred.

@@ -48,6 +60,21 @@ def pytest_collection_modifyitems(items):
"Unknown test type (filename = {})".format(module_path)
)

# Skip or run test_install*.py functional tests
if "integration" in item.keywords:
Member:

Make this conditional on os.environ.get("CI", False) as well?

Contributor Author:

Done!
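
For reference, a minimal sketch of the suggested check, assuming the conventional CI environment variable that Travis and most CI services set (the actual conftest wiring in the PR is more involved):

```python
import os

def running_on_ci():
    """True when the conventional CI environment variable is set.
    Any non-empty value counts; Travis exports CI=true on its workers."""
    return bool(os.environ.get("CI", False))
```

Gating the install-test split on this keeps local, non-CI test runs unaffected.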

tox.ini Outdated
@@ -1,7 +1,27 @@
[tox]
envlist =
docs, packaging, lint-py2, lint-py3, mypy,
py27, py34, py35, py36, py37, pypy
py27-functional,
Member:

Contributor Author:

Done!

@hugovk (Contributor, Author) commented May 25, 2018

Wow! I just spotted that pypa/pip has 25 parallel workers available on Travis CI! That's compared to just 5 for a normal open-source account (e.g. hugovk/pip).

That means it's definitely beneficial to have three groups of tests, so all but one will run at the same time right from the word go!

That also accounts for the even better improvements seen on pypa/pip (23 min) compared to hugovk/pip (32 min).

@pradyunsg (Member) commented May 25, 2018

I just spotted pypa/pip has 25 parallel workers available on Travis CI!

That's probably because someone in PyPA asked for us to be given extra workers -- I don't think it'd be nice for pip to "consume" so many workers (there are multiple PyPA projects).

Plus, this might be a bit painful when builds are not running under the PyPA organization. I notice the total build time has increased by about 20-30 minutes, which isn't ideal when you can only run 5 jobs at a time.

@pradyunsg (Member)

That's probably because someone in PyPA would have asked for us to be given extra workers -- I don't think it'd be a nice thing that pip "consumes" so many workers (there are multiple PyPA projects).

@di do you know anything about this?

@pradyunsg (Member)

I don't feel very strongly about this, but to err on the side of caution I'm gonna say let's keep only 2 groups. That's already a lot of workers and a big enough speedup to warrant them.

@hugovk (Contributor, Author) commented May 25, 2018

I would imagine those 25 are available to all projects under pypa; that matches what I've seen on my own account and on other orgs with 5 or other limits.

@pradyunsg (Member)

I would imagine those 25 are available for all projects under pypa.

IIRC, it's a shared pool. Someone else would have to confirm though. :)

@hugovk (Contributor, Author) commented May 25, 2018

Here are some timings of master vs. 2 groups vs. 3 groups on pypa/pip and hugovk/pip:

table showing 3 builds of each type and their average; a summary follows

Summary:

  • pypa/pip (25 parallel) goes from ~36m (master) -> ~20m (2 groups) or ~22m (3 groups)
  • hugovk/pip (5 parallel) goes from ~36m (master) -> ~28m (2 groups) or ~30m (3 groups)

@pradyunsg pradyunsg added C: tests Testing and related things skip news Does not need a NEWS file entry (eg: trivial changes) C: automation Automated checks, CI etc type: maintenance Related to Development and Maintenance Processes labels May 25, 2018
.gitignore Outdated
@@ -23,6 +23,7 @@ htmlcov/
.coverage
.coverage.*
.cache
.pytest_cache
Member:

nit: make the previous line .*cache -- it'll cover everything. :)

@@ -1,3 +1,5 @@
import pytest
Member:

This can be removed.

@@ -1,3 +1,5 @@
import pytest
Member:

This can be removed.

.travis.yml Outdated
python: nightly
- env: TOXENV=py37-others
python: nightly
- env: TOXENV=py37-unit
Member:

unit test worker can be removed.

@hugovk (Contributor, Author) commented May 25, 2018

Thanks, review comments done.

@pradyunsg (Member) left a comment:

Two more things I'd missed in the last review...

.travis/run.sh Outdated
if [[ $TOXENV == py* ]]; then
if [[ $TOXENV == py*-functional-install ]]; then
# Only run test_install*.py integration tests
tox -- -m integration -n 4 --duration=5 --only_install_tests
@pradyunsg (Member) commented May 26, 2018:

Would -m integration -k "not test_install" work equally well?

Instead of the additional code in conftest and the custom flag...

Contributor Author:

Yes, that's much cleaner, and it also doesn't list hundreds of skipped tests, because they're not selected in the first place.

Updated!

@@ -1,7 +1,7 @@
[tox]
envlist =
docs, packaging, lint-py2, lint-py3, mypy,
py27, py34, py35, py36, py37, pypy
py{27,34,35,36,37,py,py3}-{functional-install,others}
Member:

Can we still do tox -e py27 and tox -e py36?

If not, that'll be needed IMO.

Contributor Author:

Good point, that needs updating. Do you know how to do that with tox?

Member:

I'd just add py{27,34,35,36,py,py3} right above it.

It's a duplication I won't worry about since they're basically one under the other and a mismatch would be visible.
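
Putting the two together, the envlist could look something like this (a sketch based on the suggestion above, not necessarily the exact final config):

```ini
[tox]
envlist =
    docs, packaging, lint-py2, lint-py3, mypy,
    py{27,34,35,36,37,py,py3},
    py{27,34,35,36,37,py,py3}-{functional-install,others}
```

Listing the plain py27-style names right above the generative ones keeps `tox -e py27` working alongside `tox -e py27-functional-install`.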

Contributor Author:

Actually, I don't use tox much locally, but I think the commands you gave already work.

Member:

Cool. :)

@@ -14,7 +14,7 @@
from tests.lib.venv import VirtualEnvironment


def pytest_collection_modifyitems(items):
def pytest_collection_modifyitems(config, items):
Member:

Can drop this change.

Member:

If the -k test_install works out. :P

@pradyunsg pradyunsg self-assigned this May 26, 2018
@pradyunsg (Member)

Merging this since I don't wanna hold up the CI improvements this provides. :)

I'll see what can be done for the tox config later. It doesn't seem to be an issue anyway.

@pradyunsg pradyunsg merged commit 61b0112 into pypa:master May 26, 2018
@hugovk hugovk deleted the split-pypy-tests branch May 26, 2018 11:22
@hugovk hugovk mentioned this pull request May 28, 2018
12 tasks
lock bot commented Jun 2, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 2, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 2, 2019