
Possible contradiction in rules for naming source distribution files containing -_. #1750

Open
1 task done
priolacci opened this issue Dec 16, 2024 · 4 comments

@priolacci

priolacci commented Dec 16, 2024

Issue Description

The specification page for source distributions states the following:

The file name of a sdist was standardised in PEP 625. The file name must be in the form {name}-{version}.tar.gz, where {name} is normalised according to the same rules as for binary distributions (see Binary distribution format), and {version} is the canonicalized form of the project version (see Version specifiers).

The name and version components of the filename MUST match the values stored in the metadata contained in the file.

The rules for binary distributions state that -_. should be replaced by _ in distribution names.

But the naming convention for core metadata requires the package maintainer to follow this normalization, which states that -_. should be replaced by -.

Hence, for a package named for instance "my-package" v0.1, the metadata specification would require the "Name" attribute in the metadata to be my-package, while the specification for source distribution files requires the file to be named my_package-0.1.tar.gz. This makes it impossible to follow this part of the source distribution filename specification: "The name and version components of the filename MUST match the values stored in the metadata contained in the file."
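To make the two rules concrete, here is a minimal sketch of both normalizations side by side. The function names are hypothetical (they are not taken from the specifications or from any library), and the regular expression assumes that a run of `-`, `_` and `.` is collapsed into a single replacement character and the result is lowercased.

```python
import re


def normalize_for_comparison(name: str) -> str:
    """Hypothetical helper: collapse runs of -_. into '-' and lowercase.

    This is the form tools are expected to use for lookups and comparisons.
    """
    return re.sub(r"[-_.]+", "-", name).lower()


def normalize_for_filename(name: str) -> str:
    """Hypothetical helper: collapse runs of -_. into '_' and lowercase.

    This is the form used for the {name} component of a distribution filename.
    """
    return re.sub(r"[-_.]+", "_", name).lower()


name = "my-package"
print(normalize_for_comparison(name))                # my-package
print(f"{normalize_for_filename(name)}-0.1.tar.gz")  # my_package-0.1.tar.gz
```

The two outputs differ only in the separator, which is exactly the mismatch described above.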

Code of Conduct

  • I am aware that participants in this repository must follow the PSF Code of Conduct.
@webknjaz
Member

I suppose "match" != "be equal exactly"? It might mean "correspond" in this context.

@zahlman
Contributor

zahlman commented Apr 9, 2025

The rules for binary distributions state that -_. should be replaced by _ in distribution names.

This is correct. The entire point is not to use -, so that the -s in the filename can be unambiguously understood as separators (between the name, version string and wheel tags, if applicable).

But the naming convention for core metadata requires the package maintainer to follow this normalization, which states that -_. should be replaced by -.

By my understanding: this doesn't refer specifically to "core metadata" (meaning the PKG-INFO file in an sdist or METADATA file in a wheel) - it comes first in the "Package Distribution Metadata" section, before the "Core metadata specifications". But more importantly, it's only making recommendations ("should" rather than "must"), and also isn't talking about the actual content of the metadata. This section is instead about how tools understand metadata. There are normalization rules so that they can consider different values to be equivalent; this implies that any of the equivalent values is legal in the input. Key quote:

It also describes how to normalize [names for packages and extras], which should be done before lookups and comparisons.

What the core metadata specification says about the name, rather, is that

It must conform to the name format specification.

which refers back to the other part of your first link:

A valid name consists only of ASCII letters and numbers, period, underscore and hyphen. It must start and end with a letter or number.

I don't read "the names should be normalized before comparing" as constraining the metadata file content. If it does, that really needs to be clarified.

By my reading, though, the filename can't always contain the same distribution name as the metadata; it contains a normalized version instead. I agree that "match" is intended here to mean "correspond". It would be better to make this explicit, and explicitly refer to the normalization rules there as well.
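As a rough illustration of "match" read as "correspond", the check below compares the name component of an sdist filename with the metadata `Name` after normalizing both. It is only a sketch: `canonical` and `filename_corresponds` are hypothetical names, the filename is assumed to already be in the normalized `{name}-{version}.tar.gz` form, and the version component is deliberately not compared.

```python
import re


def canonical(name: str) -> str:
    """Collapse runs of -_. into '-' and lowercase, for comparison only."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filename_corresponds(filename: str, metadata_name: str) -> bool:
    """Hypothetical check: does the filename's name part correspond to the metadata Name?"""
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        return False
    stem = filename[: -len(suffix)]
    # Assumes the name part contains no '-', so the first '-' is the separator.
    file_name_part, sep, _version = stem.partition("-")
    if not sep:
        return False
    return canonical(file_name_part) == canonical(metadata_name)


print(filename_corresponds("my_package-0.1.tar.gz", "my-package"))  # True
print(filename_corresponds("my_package-0.1.tar.gz", "My.Package"))  # True
print(filename_corresponds("my_package-0.1.tar.gz", "other"))       # False
```

Under this reading, the metadata may keep whatever spelling the author chose, and the filename still corresponds to it.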

@abravalheri
Contributor

abravalheri commented Apr 9, 2025

Yes, I agree with @zahlman and @webknjaz: there is a difference between the package naming as provided by the (human) user [1] and the file naming as intended to be consumed by (automated) tools.

The wheel and sdist file names need to be optimised so that installers can quickly infer the version. That is why the package name needs to be "normalised" by replacing . and - with _ when creating the archive. This way, the heuristic for inferring the version is simply a variation of filename.split('-')[1].
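A minimal sketch of that heuristic, assuming the filename is already in the normalized form (no `-` inside the name component). The helper `split_sdist_filename` is a hypothetical name used for illustration, not an existing API.

```python
def split_sdist_filename(filename: str) -> tuple[str, str]:
    """Hypothetical helper: split '{name}-{version}.tar.gz' into (name, version)."""
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not an sdist filename: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    # With a normalized name, exactly one '-' separates name and version.
    if len(parts) != 2:
        raise ValueError(f"ambiguous or malformed filename: {filename!r}")
    return parts[0], parts[1]


print(split_sdist_filename("my_package-0.1.tar.gz"))       # ('my_package', '0.1')

# For a wheel, the version is likewise the second '-'-separated field.
print("my_package-0.1-py3-none-any.whl".split("-")[1])     # 0.1
```

If the name were left as `my-package`, the split would yield three parts for the sdist and the version could no longer be read off by position alone.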

When comparing names, (automated) tools will also perform some (other) form of "normalisation" to find matching (but not identical) names.

Footnotes

  1. and it may contain .- characters

@abravalheri
Contributor

abravalheri commented Apr 9, 2025

Hence, for a package named for instance "my-package" v0.1, the metadata specification would require the "Name" attribute in the metadata to be my-package

This normalisation is not necessary when writing core metadata files. There were many discussions about this at the time of the proposal and its adoption; the conclusion was that the metadata should preserve the intent of the user (what they want to call the package), while tools should normalise the name when comparing or processing it in automated processes.

There are many related topics on Discourse, but this short one summarises the conclusions: https://discuss.python.org/t/revisiting-distribution-name-normalization/12348.
