
Add metadata to mark all license and copyright files to be shipped when redistributing packages #12053

Open
JanBeh opened this issue Apr 28, 2023 · 15 comments
Labels
A-license Area: license handling C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` Command-package Command-publish S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@JanBeh

JanBeh commented Apr 28, 2023

Problem

It's idiomatic to split up functionality into many (small) crates. I can easily have hundreds or thousands of dependencies (or even more?) when using Rust. This makes Rust unsuitable for creating and shipping binaries or packages which include binaries, because even the most liberal licenses require including copies of the license text (which is often derived from a template) and/or additional files (e.g. a NOTICE file in case of the Apache 2.0 license).

It's practically very very hard for a single maintainer to do this, especially when a lot of dependencies are involved and when wanting to provide regular updates. There are some automated tools, such as

but these seem insufficient, as I explained in this post on URLO.

Moreover, it doesn't feel right to do this using a heuristic (which can fail). Corresponding metadata is missing as of today's Cargo.toml specification.

Proposed Solution

A new metadata field should be added, which is giving an exhaustive list of files to be shipped/bundled when redistributing a package (or part of a package and/or when making a derived work, e.g. when compiling a binary).

Of course this list could be wrong, but then such a crate could be marked as erroneous in a database (just like a vulnerability, because it does sort-of mislead you in a dangerous way). Ideally such a field would become mandatory at some point in the future. See also my follow-up on URLO.

Notes

Related issue: #8537, see also comment. Note, however, that license-file isn't a good choice for the proposed feature. That field is meant for custom licenses, and it also wouldn't be suitable to refer to a NOTICE file (or any other file which is required for proper attribution) to be shipped in addition to a license file.

Could the authors field be used, or could something like a copyright-lines field be introduced to be able to recreate the licenses from scratch? Likely not. Many license texts are templates where the copyright holder or product name(s) are incorporated directly into the text of the license. See my comment here on URLO on that matter. For example, the MIT-Modern-Variant license template has "THE UNIVERSITY OF CALIFORNIA" as part of the template text, which ought to be replaced if the license is used by someone other than the University of California. Thus each license can be unique and possibly must be bundled as-is (i.e. word-by-word) in order to fulfill the license requirements.

@JanBeh JanBeh added C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-triage Status: This issue is waiting on initial triage. labels Apr 28, 2023
@weihanglo
Member

Thanks for the proposal!

I am not an expert in this topic. Just got some questions.

  • How a new field of exhaustive list could help the situation ecosystem-wide? Assumed people do not actively fill in the field, downstream developers still have no idea what should be bundled no matter the field is either license, license-file, include, or shiny new redistributed-include.
  • There is a mention of a database doing checks for incorrect lists, but how could a database infer what files should be there with the absence of that list?

Perhaps the precise version of the question is: For a downstream developer, how to figure out what to bundle even when an upstream library doesn't provide sufficient info of what must be bundled.

@weihanglo weihanglo added S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request. A-license Area: license handling and removed S-triage Status: This issue is waiting on initial triage. labels May 3, 2023
@JanBeh
Author

JanBeh commented May 4, 2023

  • How a new field of exhaustive list could help the situation ecosystem-wide?

It could aid the situation by encouraging crate authors to properly specify an exhaustive list of all files that need to be shipped (due to license requirements) when distributing a derived work.

In the future, specifying such a list could become mandatory. Currently the Cargo docs say here:

Before publishing, make sure you have filled out the following fields:

Moreover, in the documentation of the license field:

Note: crates.io requires either license or license-file to be set.

So authors are required to specify a license (ideally in a way that can be machine-interpreted). But currently they are not required to specify the location of the license file and copyright notes in a machine-readable way. Demanding the latter would help solve the problem outlined in the OP.

  • Assumed people do not actively fill in the field, […]

Currently, people also fill in the license field. I believe it's possible to establish good practices which involve specifying the locations of the corresponding files as well.

  • There is a mention of a database doing checks for incorrect lists, but how could a database infer what files should be there with the absence of that list?

I don't propose inferring the list automatically. My proposal is to encourage crate authors to provide this information.

Note that it's still possible to provide incorrect or incomplete information. This is also possible as of today. For example, see memalloc-0.1.0 (source). That crate specifies license = "MIT" but doesn't ship a license file. How am I supposed to include "the above copyright notice and [the] permission notice" as demanded by the MIT license if the license including the ("above") copyright notice is missing in the crate?

I don't think there is an automatic way of ensuring that redistributed-include (or maybe derived-include would be a better name?) is specified correctly. But in the same way, license can't be verified to be correctly set, i.e. a crate author might just include a different license in the package (or fail to include relevant files for the license to be usable).

However, it's possible to use heuristics to search for packages where redistributed-include might be set wrongly. It's also possible to check and document reports of people who stumbled upon packages with improper license/copyright information. That could be done in a similar way as vulnerabilities are being reported and made public.

Perhaps the precise version of the question is: For a downstream developer, how to figure out what to bundle even when an upstream library doesn't provide sufficient info of what must be bundled.

My proposal is: We should try to avoid the situation that an upstream library doesn't provide sufficient info/data in the first place.

@epage
Contributor

epage commented May 4, 2023

Personally, when it comes to license compliance like this, there are a lot of complications and nuance that I think this deserves an RFC, starting with a Pre-RFC on Internals. In preparing the Pre-RFC and RFC, I think it would be important to work with the authors of the aforementioned tools on it and see if you can get a spread of people who deal with software legal compliance. For example, I know of two people at prior points in their career who were the liaisons between R&D and legal for legal compliance (a lot of my caution in this area comes from speaking with one of these). It'd be good to find multiple people like that across the community to get a breadth of experience and perspectives. I wonder if we can get the Foundation to help consult lawyers as well.

In driving this, I would recommend stepping back a bit and re-evaluating how you are approaching other people to avoid derailing this effort. While I've not caught up with everything, the parts of your posts I've skimmed come across with a harsh tone that might make this kind of collaboration more difficult.

@bk2204
Contributor

bk2204 commented May 5, 2023

As I've pointed out elsewhere, almost all licenses require that the license text be included with the software. Assuming we're not doing something like Debian's common-licenses directory, that means that every time someone specifies a license of MIT or Apache-2.0, some license text must be included. My proposal was simply not to complain about the joint use of the license and license-file keywords, since encouraging people not to use both license and license-file actually encourages people to not comply with the license. (The license keyword is useful in machine-readable contexts, and license-file is useful for including the text and copyright information.) Those issues have unfortunately been closed, however.

This poses a practical problem for me as a distributor of Rust-based binaries in my corporate role because I have to personally extract this information out of the Git repository when it's not included in the crate, which is tedious with many such crates. I know that when distributors such as Debian ship a crate or other Rust-based software, they must also include this information, so I'm hardly the only person who would benefit from a change.

Providing some sort of metadata where users could specify the license itself, the copyright information, and any other legally required text would be helpful, I think, and encourage license compliance.

@JanBeh
Author

JanBeh commented May 5, 2023

@epage

Personally, when it comes to license compliance like this, there are a lot of complications and nuance that I think this deserves an RFC, starting with a Pre-RFC on Internals.

I agree this is a rather deep issue which deserves thorough consideration instead of taking quick steps. I wanted to open this issue to highlight/track a problem and to propose a potential solution. It doesn't need to be solved quickly (but should be solved eventually, in my opinion), and I'm sorry if opening this issue was the wrong procedure for the development process. I have seen several other issues open, which weren't (in my opinion) addressing the core of the problem properly; hence this issue.

While I've not caught up with everything, the parts of your posts I've skimmed come across with a harsh tone that might make this kind of collaboration more difficult.

As you don't refer to any specific post, I'm not sure what you're talking about. If there's any communication issue, feel free to send me a direct message. Thank you.

@JanBeh
Author

JanBeh commented May 8, 2023

I'd like to note that I currently don't have the time to formalize this proposal in terms of writing up an RFC, contacting the authors of the aforementioned crates, or speaking to lawyers of the Foundation (I also doubt they'd be available for me, as a contributor). So if it's really necessary to make this a formal process, I'd kindly like someone else to push this issue forward. I do think that there are more people who need this feature or a solution that solves the issue in a similar or better way.

@epage
Contributor

epage commented May 9, 2023

I do think that there are more people who need this feature or a solution that solves the issue in a similar or better way.

As a reminder, Rust development is done by volunteers. If someone doesn't step up to lead an effort like this, then it doesn't get done.

@JanBeh
Author

JanBeh commented May 9, 2023

After the previous posts, I don't expect this to move forward. I wrote that notice so other people know I won't do the proposed actions regarding involving lawyers and/or doing community-wide research to get "a breadth of experience and perspectives." Please note that I'm a volunteer too, and I'm not well connected to the Rust developer scene and/or the Foundation.

I merely pointed out some legal issues in my posts and in this issue and wrote up a feature proposal. It's nothing more, nothing less. Feel free to do the proposed actions if you think they are good to do.

It would be nice if my efforts here or elsewhere (as much or as little as they may be) were appreciated. Meta: I don't think writing a feature request or issue report is a bad thing per se, even if you can't write a corresponding pull request and/or start further processes needed to fix an issue.

@weihanglo weihanglo added S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. and removed S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request. labels May 11, 2023
@spyoungtech

spyoungtech commented Feb 20, 2025

I found this issue when trying to research how to get my LICENSE file included in my package files. My first intuition was to specify license-file = "LICENSE" along with license (since this causes LICENSE to show up in the package files) which is met with this warning:

warning: only one of license or license-file is necessary
license should be used if the package license can be expressed with a standard SPDX expression.
license-file should be used if the package uses a non-standard license.

Per previous discussion, if the combination of license and license-file adds semantic difficulty, perhaps an alternative would be to maintain this approach of reserving license-file for custom licenses, but encourage authors to add their LICENSE via include. This is what I'm doing for my packages for now to help users comply with the license terms.
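A minimal sketch of that `include` approach, assuming a standalone package with its license text in `./LICENSE` (the package name and file layout are hypothetical). Note that `include` acts as an explicit allowlist, so once it is set the sources have to be listed too.

```toml
[package]
name = "example-crate"        # hypothetical package name
version = "0.1.0"
edition = "2021"
license = "MIT"               # machine-readable SPDX expression

# Explicit allowlist of packaged files (gitignore-style patterns).
# Assumed layout: sources under src/, license text in ./LICENSE.
include = [
    "/src",
    "/README.md",
    "/LICENSE",
]
```

Running `cargo package --list` should show which files would end up in the published package, which is a quick way to confirm the license file is actually there.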

This is more or less what we do in the Python ecosystem, too (FWIW)... you can add the [SPDX] license classifier, but you're also expected to take steps to ensure the license file ends up in all distribution forms. You'll also find numerous issue trackers flagging down authors who have failed to include the LICENSE in one or more distributions (e.g. sdist, wheels, etc.).

Cargo could consider detecting the presence of a LICENSE file (and potentially common variants/locations thereof) and suggesting it be added to include, or maybe just adjust the above warning message to suggest something to the effect of "if you simply want your license file to be included in your package, use include instead".

spyoungtech added a commit to spyoungtech/json-five-rs that referenced this issue Feb 20, 2025
…d in distribution, per license requirements.

See also: rust-lang/cargo#12053
@hanna-kruppe

Another reason why the include approach is better than license-file (which is currently just a single file) is that many crates have multiple relevant license files, e.g., MIT/Apache-2.0 dual licensing is often done with two files LICENSE-MIT and LICENSE-APACHE.

nekevss added a commit to boa-dev/temporal that referenced this issue Mar 20, 2025
This PR adds the license files to the published package.

Licenses were copied into `temporal_capi` to hopefully work around an
issue with including licenses when publishing workspace packages in
[cargo](rust-lang/cargo#12053).

CC: @Manishearth, in case you have any feedback.
@Manishearth
Member

So in my experience one major issue here is that Cargo actually nudges people away from best practices here, especially in a workspace.

It's best practice to use SPDX license codes. However, it's also best practice (and legally required to ship software) to have most licenses bundled with the software.

This happens more or less automatically for standalone crates. You use license = "spdx code" and have some LICENSE files in the repo, and you're set.

However, in a workspace, you'll end up having LICENSE files at the top level, and nothing in individual crate folders. license-file = "../LICENSE" does work and the file gets copied over before publish; however, Cargo explicitly tells you to not mix license and license-file, so this route is discouraged.

@epage I think it would be good to have some signal from the Cargo team as to what their preferred solution is. This is something volunteers can help with, but having some indicator we are not barking up the wrong tree would be great.

I see three solutions, which have been listed above but are worth relisting for completeness:

  • In a workspace, you may use license-file with license without getting any warnings, with the understanding that the file is just bundled with your code. This may not be the best idea because it doesn't quite distinguish between the cases where it's just used as a bundling mechanism vs when the licenses actually diverge
  • include = ["../foo"] means "copy foo to the root of the package"
  • redistributable-include = ["../foo"] is a new key where you can specify files to be copied into the root. In theory it could be a map as well, but I don't actually think that's too necessary
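To make the three options concrete, here is a rough sketch of how each might look in a workspace member's manifest. The package name and the `../LICENSE` path are assumptions for illustration. Only the first option reflects behavior described in this thread as working today; the second shows proposed semantics, and the third uses a key that does not exist in Cargo.

```toml
[package]
name = "member-crate"         # hypothetical workspace member
version = "0.1.0"
edition = "2021"
license = "MIT OR Apache-2.0"

# Option 1: works today per the discussion above, but Cargo warns that
# only one of `license` or `license-file` is necessary.
license-file = "../LICENSE"

# Option 2 (proposed semantics, not current behavior): a path outside
# the package would be copied into the package root.
# include = ["../LICENSE"]

# Option 3 (hypothetical new key, not part of Cargo today):
# redistributable-include = ["../LICENSE"]
```

The appeal of the third option is that it would keep `license-file` reserved for non-standard licenses while giving bundling its own, unambiguous key.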

@jwodder

jwodder commented Mar 21, 2025

@Manishearth Another solution to the license-in-workspace problem that works today is to create a LICENSE symlink in each package's folder pointing to the workspace license, and then cargo publish replaces the symlink with the actual file contents.

@epage
Contributor

epage commented Mar 22, 2025

We recently talked about cargo new automatically symlinking licenses from the workspace root. Joy of large backlogs is I wasn't aware of this issue at the time. You can see our notes at #13328 (comment)

@epage
Contributor

epage commented Mar 24, 2025

btw some other past discussion on this is at #8537 (comment)

@Manishearth
Member

@epage Thanks!!

8 participants