
CI: Split PyPy tests to speed up end-to-end running time #5436


Merged
16 commits merged on May 26, 2018

Conversation

@hugovk (Contributor) commented May 24, 2018

For #4497.

This doesn't reduce the total build time of all jobs, but rather aims to reduce the start-to-finish waiting time for a whole build by making better use of parallel jobs.

Also:

This reduces the start-to-finish time from 32/35/38 minutes (last three builds on master) to 27/30/31 minutes (these three commits on my branch) or 22 minutes (this PR build).

Would something along these lines be useful?

(I've not added a news file fragment, let me know if needed.)

@pradyunsg (Member)

These 50 slow ones are run in there, and to share more of the load, those with @pytest.mark.network

Is there a better way to choose the 50 slowest tests? Why 50?

I'm just concerned that this would go out of date over time and we'll have slow tests that don't get marked as such.

I noticed the CPython 3.5 job sometimes takes up to 5 minutes longer than 3.4, so move the slower one first

👍

#dropthedot

👍

@hugovk (Contributor, Author) commented May 24, 2018

Why 50?

The main aim is to split the functional tests up in some way, any way, to make the jobs more granular and make full use of parallel builds, so no runners sit idle for long periods of time.

I marked 5 (~5 min), then 25 (~7 min), then 50 (~13 min) of the slowest tests to try and get the pypy3 and pypy3-slow jobs to take about the same time. Then I added the network tests (~21 min), which brought them about equal.

Is there a better way to choose the 50 slowest tests? ... I'm just concerned that this would go out of date over time and we'll have slow tests that don't get marked as such.

A valid concern. In fact, it doesn't really matter whether these are truly the slowest tests; we just want to split them into two roughly equal groups.

We could instead split them arbitrarily by test filename. Perhaps do all test_install*.py in one (15 files) and the rest (20 files) in another.

This would probably help with the CPython jobs too.

The unit tests only take about 30 seconds, but there may be some extra benefit putting them in their own jobs too.
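
The filename-based split described above could be sketched like this (a minimal illustration; the module names are made up and pip's actual functional test files may differ):

```python
def split_functional_tests(module_names):
    """Partition functional test modules into the test_install*.py group
    and everything else, per the two-group split discussed above."""
    install = sorted(n for n in module_names if n.startswith("test_install"))
    others = sorted(n for n in module_names if not n.startswith("test_install"))
    return install, others

# Illustrative module names, not pip's real file list:
modules = ["test_install.py", "test_freeze.py", "test_install_user.py", "test_wheel.py"]
print(split_functional_tests(modules))
# (['test_install.py', 'test_install_user.py'], ['test_freeze.py', 'test_wheel.py'])
```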

@pfmoore (Member) commented May 24, 2018

Overall, I'm in favour of improving the CI runtimes, and the PyPy3 times are particularly annoying because, as you say, that's the one that blocks everything. If I read your results right, you're getting around a 5-7 minute improvement (15-20%), which is pretty good.

But I agree with @pradyunsg that the process is pretty arbitrary as it stands. Ideally, I'd prefer that we look a bit more closely at the possibilities here. Some thoughts:

  • We install pytest-xdist (according to requirements-dev.txt) but I don't know much about it. Are we actually running the tests in parallel? Would doing so help if not?
  • It might be useful to understand why the slowest tests are so slow. Is it something specific to pypy, or something more general?
  • On Windows we run the functional tests separately from the unit tests. Would there be benefit in doing that on Travis too?
  • Maybe we should split the tests into more groups than just functional and unit, and have separate tox environments for each set. I don't know whether having lots of workers in Travis will be faster than having multiple environments run in parallel in one worker - we could experiment there as well.
  • Do we even need to run the full test suite on PyPy3? I wonder how many users we have using it, anyway? Would just running unit tests be sufficient?

Having said all of that, I'd rather we got some improvement from just accepting this PR as it stands, than doing nothing in the hope that someone would be interested in doing a more extensive job - after all it's not like we can't make further improvements later.

@hugovk (Contributor, Author) commented May 24, 2018

  • We install pytest-xdist (according to requirements-dev.txt) but I don't know much about it. Are we actually running the tests in parallel? Would doing so help if not?

Yes, it's enabled via the -n 4 argument in run.sh for the integration tests, and I expect it's making a difference. I'll try it for the unit tests too; it might help a bit there.

  • It might be useful to understand why the slowest tests are so slow. Is it something specific to pypy, or something more general?

This would be useful to find out for #4497, but I think it's out of scope for this PR. I've noticed that PyPy tests tend to be slower on other projects too, for example https://travis-ci.org/python-pillow/Pillow/.

  • On Windows we run the functional tests separately from the unit tests. Would there be benefit in doing that on Travis too?

Possibly. The unit tests take about 30 seconds on Travis, so splitting them out might get some small wins, but the real slowness is in the functional tests. I'll give it a go later.

  • Maybe we should split the tests into more groups than just functional and unit, and have separate tox environments for each set. I don't know whether having lots of workers in Travis will be faster than having multiple environments run in parallel in one worker - we could experiment there as well.

Generally, if the per-worker overhead isn't too high, smaller and more granular runs are better because all parallel workers stay busy for longer. We don't want a situation where 4 workers have finished and we're waiting on 1 long job to complete.
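
That scheduling intuition can be sketched with a toy model: greedily assign each job, longest first, to the soonest-free worker and report the end-to-end wall time. This is only an illustration of why one long job dominates a build, not how Travis actually schedules jobs, and the durations below are made up:

```python
import heapq

def wall_time(job_minutes, workers):
    """End-to-end build time when up to `workers` jobs run concurrently,
    assigning the longest jobs first to the soonest-free worker."""
    finish_times = [0.0] * workers
    heapq.heapify(finish_times)
    for duration in sorted(job_minutes, reverse=True):
        soonest_free = heapq.heappop(finish_times)
        heapq.heappush(finish_times, soonest_free + duration)
    return max(finish_times)

# One 30-minute job dominates two workers...
print(wall_time([30, 5, 5, 5], 2))          # 30.0
# ...but splitting it into three 10-minute jobs shortens the build.
print(wall_time([10, 10, 10, 5, 5, 5], 2))  # 25.0
```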

I've just tried something like this in another branch, putting the test_install*.py functional tests into their own worker (i.e. build job), and got similar results to this PR: 27 min 4 sec. https://travis-ci.org/hugovk/pip/builds/383237894

  • Do we even need to run the full test suite on PyPy3? I wonder how many users we have using it, anyway? Would just running unit tests be sufficient?

Can that be checked from BigQuery?

@pradyunsg (Member)

The unit tests only take about 30 seconds, but there may be some extra benefit putting them in their own jobs too.

I doubt it. There's a fixed amount of setup time before the tests actually start running. I think it's fine to leave them in as-is.

@pradyunsg (Member)

Can that be checked from BigQuery?

Yes. For the last month, comparing aggregated counts:

Row  details_implementation_name  download_count
  1  CPython                           465770244
  2  null                               73417814
  3  PyPy                                 866085
  4  Jython                                 2950
  5  IronPython                              377
  6  Pyston                                   53

@pradyunsg (Member)

Oh, and I can't go on BigQuery and not post a huge table on GitHub immediately after. :P


implementation.(name, version) -> downloads as on 24th May 23:50 IST.

SELECT
  details.implementation.name, details.implementation.version, COUNT(*) AS download_count
FROM
  TABLE_DATE_RANGE(
    [the-psf:pypi.downloads],
    DATE_ADD(CURRENT_TIMESTAMP(), -1, "month"),
    CURRENT_TIMESTAMP()
  )
GROUP BY
  details.implementation.name, details.implementation.version
ORDER BY
  download_count DESC, details.implementation.name ASC, details.implementation.version DESC
LIMIT
  100

Gives:

A table with first 100 entries
details_implementation_name details_implementation_version download_count
CPython 2.7.12 104240006
null null 73417814
CPython 2.7.13 70108990
CPython 2.7.6 44752943
CPython 2.7.14 40130804
CPython 3.6.5 36142283
CPython 3.6.2 22454465
CPython 3.5.2 18744519
CPython 2.7.9 18717644
CPython 2.7.5 18431643
CPython 3.6.3 13473432
CPython 2.7.10 13386307
CPython 3.6.4 11609881
CPython 3.4.3 7497481
CPython 3.5.3 7003274
CPython 3.5.5 5295515
CPython 2.7.15 4389776
CPython 3.6.1 3288331
CPython 2.7.11 2232454
CPython 3.5.4 2191310
CPython 3.4.2 2009513
CPython 2.7.15rc1 1883371
CPython 2.6.6 1822523
CPython 3.5.1 1798943
CPython 3.6.0 1569082
CPython 2.7.7 1485958
CPython 2.7.3 1388041
CPython 3.4.7 1380183
CPython 2.7.8 1312078
CPython 3.4.8 1096671
CPython 2.6.9 1076065
CPython 3.4.6 1046062
CPython 3.4.5 955454
CPython 3.6.5rc1 368741
CPython 3.5.0 289683
CPython 3.4.4 277371
CPython 3.4.0 260176
PyPy 2.4.0 221411
CPython 3.4.1 213101
CPython 3.7.0a4+ 199229
CPython 2.7.14+ 149185
PyPy 5.8.0 126404
CPython 3.3.6 123794
PyPy 6.0.0 118623
CPython 3.7.0b4 112393
CPython 2.7.11+ 110617
CPython 3.5.1+ 89552
CPython 2.7.12+ 80657
PyPy 5.8.0.beta.0 72558
PyPy 5.1.0 64209
CPython 3.6.5+ 63098
CPython 3.7.0b3 63082
PyPy 5.10.0 58082
PyPy 5.10.1 44433
PyPy 5.7.1 43699
CPython 3.5.5+ 31025
CPython 3.5.2+ 28754
CPython 2.7.13+ 28681
CPython 3.5.4rc1 26770
CPython 3.3.3 26455
CPython 3.5.3+ 25332
CPython 3.7.0b2 23937
PyPy 5.3.1 22915
CPython 2.7.10rc1 22469
PyPy 5.0.1 19436
CPython 3.3.5 17471
CPython 2.7.0 16001
CPython 2.7.1 15753
PyPy 5.9.0 15488
PyPy 5.6.0 15477
CPython 3.6.4+ 12574
CPython 3.3.7 12454
PyPy 5.4.1 12058
CPython 3.7.0a4 10322
CPython 3.6.4rc1 9468
CPython 3.2.6 8943
CPython 3.7.0a2 8874
CPython 3.7.0b1 8656
CPython 3.3.2 8340
CPython 2.7.4 8145
CPython 3.6.0rc2 7749
CPython 3.6.1rc1 7281
CPython 3.6.3rc1 5914
CPython 3.6.2rc2 5780
CPython 3.2.3 5508
PyPy 5.4.0 5444
CPython 3.6.2rc1 5142
PyPy 5.9.0.beta.0 5139
CPython 2.6.8 4344
PyPy 5.7.1.beta.0 4318
CPython 3.6.0b2 3860
CPython 3.7.0a1 3716
CPython 3.8.0a0 3257
CPython 2.7.14rc1 3190
PyPy 4.0.1 3173
CPython 2.7.2 3127
CPython 3.4.3+ 2857
CPython 3.6.0a2 2559
CPython 2.7.13rc1 2344

@pradyunsg (Member)

I've just tried something like this in another branch, by putting the test_install*.py functional tests into their own worker (aka build job), and get similar results to this PR: 27 min 4 sec.

Let's do this instead then. :)

@pradyunsg (Member)

3 PyPy 866085

Can we just have the PyPy tests only run on cron jobs on master (i.e. when TRAVIS_EVENT_TYPE=cron and TRAVIS_BRANCH=master)?

@hugovk (Contributor, Author) commented May 24, 2018

Decent numbers for PyPy, coming up to a million, but of course a long way from CPython's ~466M (and ~73M from whatever's in null).


FT = functional test
UT = unit test


Can we just have the PyPy tests only run on cron jobs on master (i.e. when TRAVIS_EVENT_TYPE=cron and TRAVIS_BRANCH=master)?

Yes. I'd suggest doing that in a separate PR, to keep the changes separate and easier to compare and review.

@hugovk (Contributor, Author) commented May 24, 2018

  • We install pytest-xdist (according to requirements-dev.txt) but I don't know much about it. Are we actually running the tests in parallel? Would doing so help if not?

Yes, it's being used by the -n 4 argument in run.sh for integration tests, and I expect it's making a difference.

It makes a huge difference. With parallel turned off, the build was terminated after 50 minutes for taking too long!

https://travis-ci.org/hugovk/pip/builds/383276628

@pradyunsg (Member)

"27 min 4 sec" vs "26 min 22 sec"

I personally consider this within the noise range of the CI -- it's not a major gain anyway, and keeping the number of jobs small is nicer.

Though, I won't holler if others don't mind it.

@pradyunsg (Member)

something more general?

Many of the slowest tests just need to be broken up into smaller ones that can then be run in parallel.

@hugovk (Contributor, Author) commented May 24, 2018

Yeah, they're kind of close, within the noise range, and there's some more variation with three groups.

Although there is the argument to be made for having more, smaller jobs: it frees up workers sooner, especially useful if there's another build waiting in the queue.

Also, if a single job fails, you can pinpoint the failure a bit more easily (i.e. you can see whether it's the UT group or the FT group, and there are fewer logs to dig through).

But I don't feel too strongly either way, and can revert from three groups back to two if that's preferred.

@@ -48,6 +60,21 @@ def pytest_collection_modifyitems(items):
"Unknown test type (filename = {})".format(module_path)
)

# Skip or run test_install*.py functional tests
if "integration" in item.keywords:
Member:

Make this conditional on os.environ.get("CI", False) as well?

Contributor Author:

Done!
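
For reference, a minimal sketch of the suggested check, assuming the conventional CI environment variable that Travis and most CI services set (the actual conftest wiring in the PR is more involved):

```python
import os

def running_on_ci():
    """True when the conventional CI environment variable is set.
    Any non-empty value counts; Travis exports CI=true on its workers."""
    return bool(os.environ.get("CI", False))
```

Gating the install-test split on this keeps local, non-CI test runs unaffected.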

tox.ini Outdated
@@ -1,7 +1,27 @@
[tox]
envlist =
docs, packaging, lint-py2, lint-py3, mypy,
py27, py34, py35, py36, py37, pypy
py27-functional,
Member:

Contributor Author:

Done!

@hugovk (Contributor, Author) commented May 25, 2018

Wow! I just spotted that pypa/pip has 25 parallel workers available on Travis CI! That's compared to just 5 for a normal open-source account (e.g. hugovk/pip).

That means it's definitely beneficial to have three groups of tests, so all but one will run at the same time right from the word go!

That also accounts for the even better improvements seen on pypa/pip (23 min) compared to hugovk/pip (32 min).

@pradyunsg (Member) commented May 25, 2018

I just spotted pypa/pip has 25 parallel workers available on Travis CI!

That's probably because someone in PyPA asked for us to be given extra workers -- I don't think it'd be nice for pip to "consume" so many workers (there are multiple PyPA projects).

Plus, this might be a bit painful when builds are not running under the PyPA organization. I notice the total build time has increased by about 20-30 minutes, which isn't ideal when you can only run 5 jobs at a time.

@pradyunsg (Member)

That's probably because someone in PyPA would have asked for us to be given extra workers -- I don't think it'd be a nice thing that pip "consumes" so many workers (there are multiple PyPA projects).

@di do you know anything about this?

@pradyunsg (Member)

I don't feel very strongly about this, but to err on the side of caution I'm gonna say let's keep only 2 groups. That's already a lot of workers and a big enough speedup to warrant them.

@hugovk (Contributor, Author) commented May 25, 2018

I would imagine those 25 are available to all projects under pypa; that matches what I've seen on my own account and on other orgs with 5 or other limits.

@pradyunsg (Member)

I would imagine those 25 are available for all projects under pypa.

IIRC, it's a shared pool. Someone else would have to confirm though. :)

@hugovk (Contributor, Author) commented May 25, 2018

Here are some timings of master vs. 2 groups vs. 3 groups on pypa/pip and hugovk/pip:

table showing 3 builds of each type and their average; a summary follows

Summary:

  • pypa/pip (25 parallel) goes from ~36m (master) -> ~20m (2 groups) or ~22m (3 groups)
  • hugovk/pip (5 parallel) goes from ~36m (master) -> ~28m (2 groups) or ~30m (3 groups)

@pradyunsg pradyunsg added C: tests Testing and related things skip news Does not need a NEWS file entry (eg: trivial changes) C: automation Automated checks, CI etc type: maintenance Related to Development and Maintenance Processes labels May 25, 2018
.gitignore Outdated
@@ -23,6 +23,7 @@ htmlcov/
.coverage
.coverage.*
.cache
.pytest_cache
Member:

nit: make the previous line .*cache -- it'll cover everything. :)

@@ -1,3 +1,5 @@
import pytest
Member:

This can be removed.

@@ -1,3 +1,5 @@
import pytest
Member:

This can be removed.

.travis.yml Outdated
python: nightly
- env: TOXENV=py37-others
python: nightly
- env: TOXENV=py37-unit
Member:

unit test worker can be removed.

@hugovk (Contributor, Author) commented May 25, 2018

Thanks, review comments done.

@pradyunsg (Member) left a comment:

Two more things I'd missed in the last review...

.travis/run.sh Outdated
if [[ $TOXENV == py* ]]; then
if [[ $TOXENV == py*-functional-install ]]; then
# Only run test_install*.py integration tests
tox -- -m integration -n 4 --duration=5 --only_install_tests
@pradyunsg (Member) commented May 26, 2018:

Would -m integration -k "not test_install" work equally well?

Instead of the additional code in conftest and the custom flag...

Contributor Author:

Yes, that's much cleaner, and it also doesn't list hundreds of skipped tests, because they're not selected in the first place.

Updated!

@@ -1,7 +1,7 @@
[tox]
envlist =
docs, packaging, lint-py2, lint-py3, mypy,
py27, py34, py35, py36, py37, pypy
py{27,34,35,36,37,py,py3}-{functional-install,others}
Member:

Can we still do tox -e py27 and tox -e py36?

If not, that'll be needed IMO.

Contributor Author:

Good point, that needs updating. Do you know how to do that with tox?

Member:

I'd just add py{27,34,35,36,py,py3} right above it.

It's a duplication I won't worry about since they're basically one under the other and a mismatch would be visible.
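
Putting the two together, the envlist could look something like this (a sketch based on the suggestion above, not necessarily the exact final config):

```ini
[tox]
envlist =
    docs, packaging, lint-py2, lint-py3, mypy,
    py{27,34,35,36,37,py,py3},
    py{27,34,35,36,37,py,py3}-{functional-install,others}
```

Listing the plain py27-style names right above the generative ones keeps `tox -e py27` working alongside `tox -e py27-functional-install`.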

Contributor Author:

Actually, I don't use tox much locally, but I think the commands you gave already work.

Member:

Cool. :)

@@ -14,7 +14,7 @@
from tests.lib.venv import VirtualEnvironment


def pytest_collection_modifyitems(items):
def pytest_collection_modifyitems(config, items):
Member:

Can drop this change.

Member:

If the -k test_install works out. :P

@pradyunsg pradyunsg self-assigned this May 26, 2018
@pradyunsg (Member)

Merging this since I don't wanna hold up the CI improvements this provides. :)

I'll see what can be done for the tox config later. It doesn't seem to be an issue anyway.

@pradyunsg pradyunsg merged commit 61b0112 into pypa:master May 26, 2018
@hugovk hugovk deleted the split-pypy-tests branch May 26, 2018 11:22
@hugovk hugovk mentioned this pull request May 28, 2018
12 tasks
lock bot commented Jun 2, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 2, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 2, 2019