[FR] Ecosystem regression checks? #4920

hauntsaninja · 2025-03-25T19:45:33Z

In the typing world, we also used to face the problem of causing regressions for users with larger impact than we expected. One of the ways we mostly resolved this was by putting ecosystem impact analyses into the CI of various typing-related projects, for instance typeshed, mypy, pyright, etc.

Given issues like #4910 or from a few months ago #4519 and a few others beyond that, it appears setuptools sometimes releases changes that have a broader impact than expected. Maybe ecosystem checks would be useful here too?

This could look like scripts that attempt to build/install a large number of third party packages.
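A minimal sketch of what such a script might look like, assuming pip is the frontend. The package names, the `build_command` and `run_builds` helpers, and the `wheels` directory are all hypothetical; how to pin the setuptools version used inside the isolated build environment is deliberately left out.

```python
import subprocess
import sys

# Hypothetical sample; a real check would use a much larger, curated list.
PACKAGES = ["example-package-a", "example-package-b"]


def build_command(package: str, wheel_dir: str = "wheels") -> list[str]:
    """Return a pip command that builds `package` from its sdist."""
    return [
        sys.executable, "-m", "pip", "wheel",
        "--no-deps",               # only build the package itself
        "--no-binary", package,    # force a source build so the build backend runs
        "--wheel-dir", wheel_dir,
        package,
    ]


def run_builds(packages, runner=subprocess.run) -> dict[str, bool]:
    """Try to build every package and record whether the build succeeded."""
    results = {}
    for package in packages:
        completed = runner(build_command(package), capture_output=True, text=True)
        results[package] = completed.returncode == 0
    return results
```

The `runner` parameter exists only so the loop can be exercised without network access; calling `run_builds(PACKAGES)` would shell out to pip for real.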

Data-driven estimates help us weigh the benefit of a change against its impact. For example, if we knew a change would break XYZ projects, we might decide enforcing an underscore vs hyphen isn't worth it. Backward compat in build tools is especially useful given the role it plays in reproducibility (Python is used in a lot of science!), so I'm excited to explore the space of better quantifying backward compat concerns.
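To make the "XYZ number of projects" idea concrete, here is a rough sketch of how two sets of build results could be compared. It assumes each run produces a mapping from package name to pass/fail; the function names and the result format are assumptions, not an existing tool.

```python
def find_regressions(baseline: dict[str, bool], candidate: dict[str, bool]) -> list[str]:
    """Packages that built with the baseline release but fail with the candidate."""
    return sorted(
        name
        for name, passed in baseline.items()
        if passed and candidate.get(name) is False
    )


def summarize(baseline: dict[str, bool], candidate: dict[str, bool]) -> str:
    """One-line report suitable for posting on a pull request."""
    regressions = find_regressions(baseline, candidate)
    listing = ", ".join(regressions) if regressions else "none"
    return f"{len(regressions)} of {len(baseline)} packages regress: {listing}"
```

Packages that already fail on the baseline are ignored on purpose, so that pre-existing breakage is not attributed to the change under review.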

Thanks for everything!

eli-schwartz · 2025-03-26T08:45:45Z

Meson does something slightly similar. In our case, we publish release candidates once every few months and get them packaged in the experimental branch of multiple linux distros, then ask distribution integrators to test a mass build of all packaged software and provide us (okay, I really mean "me" 😦) the results of the builds to compare. Generally any widespread issue will be caught by this and fixed in time for the final release.

It has saved us from getting egg on our faces, a number of times...

webknjaz · 2025-03-26T15:15:41Z

Might be worth following the pattern from https://github.com/pypa/setuptools/blob/main/.github/workflows/ci-sage.yml or PyCA's “downstream” concept too?

webknjaz · 2025-03-26T15:16:09Z

Also, did you mean to link https://github.com/hauntsaninja/mypy_primer?

abravalheri · 2025-03-27T23:19:21Z

Thank you very much @hauntsaninja, I think this is welcome, especially if there are volunteers to carry out the implementation.

However, we would need to be very careful that something like this is not held against the maintainers¹. It is good to make informed decisions, but if no maintainer is willing to take care of a specific piece of functionality, that functionality will end up being removed. At the end of the day, informed decisions are only informed decisions, not a commitment that the maintainers will never introduce breaking changes.

So ideally it would be interesting to have a system similar to the one you described, but one that also proactively warns other packages in the ecosystem about upcoming backwards-incompatible changes. That would be fantastic, because right now the capacity that setuptools has for communicating breaking changes (or even useful information like https://github.com/pypa/setuptools/blob/v78.1.0/setuptools/command/editable_wheel.py#L489-L493 and https://github.com/pypa/setuptools/blob/v78.1.0/setuptools/command/editable_wheel.py#L555-L556) is seriously limited by the frontends hiding the warnings¹.

> we might decide enforcing an underscore vs hyphen isn't worth it.

Let's not oversimplify the situation. I invite everyone who wants to know the details to study the history of the deprecation warning: why it was introduced, the subsequent problems with its implementation, the extra cost involved in deciding when to replace - with _, and the status of the codebase. I also believe that the implementation that existed before v78 had some oversights (some were fixed in v78, some were not because I did not want to introduce more complexity; I was also hoping that follow-up changes after the removal would further simplify the codebase). Overall it looks like high maintenance to me.

¹ The recent heated discussion contains examples of this kind of toxic argumentation.

abravalheri · 2025-03-27T23:22:12Z

> Might be worth following the pattern from https://github.com/pypa/setuptools/blob/main/.github/workflows/ci-sage.yml or PyCA's “downstream” concept too?

To be honest, ci-sage.yml is something that does not work, in my opinion. It is too complex (and I don't have the energy to maintain it), too slow, and very often something goes wrong with it, and it is difficult to tell whether the failure was actually caused by setuptools. In practice it is too maintenance-intensive and is often ignored.

eli-schwartz · 2025-03-28T00:04:13Z

With Meson this is "easy" because we upload Release Candidates once a week for usually ~3 weeks before the final release, and this gets picked up by Gentoo. We have a special, long-term arrangement that Gentoo's distribution-wide continuous integration enables Release Candidates of Meson in 25% of all runs.
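As an illustration only (this is not how Gentoo's CI is actually implemented), a fixed share of runs could be selected deterministically by hashing a run identifier. The `use_release_candidate` name and the idea of a string run ID are assumptions.

```python
import hashlib


def use_release_candidate(run_id: str, fraction: float = 0.25) -> bool:
    """Decide whether a given CI run should test the release candidate.

    The decision is derived from a hash of the run ID, so the same run
    always gets the same answer and roughly `fraction` of runs opt in.
    """
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < fraction * 2**64
```

Hashing instead of calling a random number generator keeps reruns of the same job reproducible, which makes a failure easier to attribute to the release candidate.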

Given a week or two of Release Candidate testing with hundreds of packages that depend on meson, issues quickly show up and get reported back upstream. With setuptools there are thousands of packages.

Prereleases are great for this sort of thing, because they are relatively speaking extremely easy to test in a coordinated way (for pypi you can simply have dedicated CI jobs that install all your dependencies with --pre) while still being opt-in and not breaking production or "required CI jobs". But you do have to commit to releasing on a schedule instead of simply whenever you make an exciting new change.
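A small sketch of how a dedicated, non-required pre-release job could sit next to the normal one. `--pre` and `--requirement` are real pip options; the `Job` class, the job names, and the `requirements.txt` default are hypothetical.

```python
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    prereleases: bool  # install dependencies with --pre
    required: bool     # whether a failure should fail the pipeline


def install_command(job: Job, requirements_file: str = "requirements.txt") -> list[str]:
    """Build the pip invocation for a job, adding --pre when requested."""
    cmd = [sys.executable, "-m", "pip", "install", "--requirement", requirements_file]
    if job.prereleases:
        cmd.append("--pre")
    return cmd


def exit_code(job: Job, install_returncode: int) -> int:
    """Advisory jobs report their result but never fail the pipeline."""
    return install_returncode if job.required else 0
```

Keeping the pre-release job advisory is what makes it opt-in: it surfaces upcoming breakage without blocking merges.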

webknjaz · 2025-03-28T22:55:56Z

FWIW, pip used to do pre-releases but dropped that because nobody was testing them and so it seemed pointless. It can be a powerful tool but requires someone to actually use it. I think that a primer-style check might be more useful, running in PRs.

webknjaz · 2025-03-29T17:26:02Z

Interestingly, ci-sage could've caught this if it wasn't left unmaintained: https://github.com/pypa/setuptools/actions/runs/14147225459/job/39636009893?pr=4875#step:11:3914. But if @mkoeppe dropped the ball, it might be a reason for deleting it.

webknjaz · 2025-03-29T17:38:38Z

I mentioned PyCA earlier but didn't link what they do exactly. Here's what they do: https://github.com/pyca/cryptography/blob/30d6698/.github/workflows/ci.yml#L356-L411 / https://github.com/pyca/cryptography/tree/30d6698/.github/downstream.d.

mkoeppe · 2025-03-30T00:39:33Z

> ci-sage could've caught this if it wasn't left unmaintained: https://github.com/pypa/setuptools/actions/runs/14147225459/job/39636009893?pr=4875#step:11:3914

I've opened #4929 to update this workflow

abravalheri · 2025-03-31T12:44:36Z

I believe pre-releases are beneficial and can be quite effective in certain scenarios¹. I've experimented with them in the past, but currently the setuptools CI workflows are not very compatible with this practice.

But can we tell if any "opt-in" kind of solution would be more effective? For example, right now users can opt in to converting warnings into errors in a non-blocking CI pipeline for "monitoring purposes", but the adoption of this practice is not widespread. Would pre-releases face similar challenges?
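For readers unfamiliar with the practice, this is roughly what converting warnings into errors looks like using only the standard library. `ExampleDeprecationWarning` and `legacy_build_step` are stand-ins invented for this sketch, not setuptools names.

```python
import warnings


class ExampleDeprecationWarning(Warning):
    """Stand-in for a build backend's deprecation warning category (hypothetical)."""


def legacy_build_step() -> str:
    """Pretend build step that still relies on a deprecated option."""
    warnings.warn("dash-separated option is deprecated", ExampleDeprecationWarning)
    return "built"


def run_strict(step) -> tuple[bool, str]:
    """Run `step` with the stand-in warning category escalated to an error."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=ExampleDeprecationWarning)
        try:
            return True, step()
        except ExampleDeprecationWarning as exc:
            return False, str(exc)
```

In a CI job, a similar effect can usually be had without code changes through the interpreter's `-W error` option or the `PYTHONWARNINGS` environment variable.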

Additionally, the release cadence of setuptools somehow "makes sense" to me now. I had never worked on a project that uses the same methodology, so when I first joined it felt very unfamiliar. But after having worked with it for a while I can see the benefits: it is precisely the fact that setuptools releases smaller batches of changes that allows us to quickly track and fix major incidents (with same-day patch releases). If we were bundling changes together, this would not be an easy task, as successive changes inside the same batch tend to intertwine and become co-dependent.

We can discuss adding support for pre-releases to the setuptools CI tooling in a separate issue or PR (I think it would be a quality of life improvement), but I don't think they are the ultimate solution to the problem.

I like the idea of having separate regression checks and also the option to automatically open issues in repositories prompting for changes. I think that has the potential to be quite effective.

¹ For example, it is something that I would have liked to do for the PEP 639 implementation.

eli-schwartz · 2025-03-31T13:16:33Z

> But can we tell if any "opt-in" kind of solution would be more effective? For example, right now users can opt-in to convert warnings into errors in a non blocking CI pipeline for "monitoring purposes", but the adoption of this practice is not generalised. Would pre-releases face similar challenges?

@abravalheri I did talk about how Meson handles this exact concern.

webknjaz added enhancement proposal Needs Discussion Long Term labels Mar 26, 2025

eli-schwartz mentioned this issue Mar 28, 2025 in "Governance and Change Management Review After Breaking Change in #4870" (#4919, open)
