Skip to content

cargo vendor misses some files #14034

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Samasaur1 opened this issue Jun 8, 2024 · 7 comments · Fixed by #15514
Closed

cargo vendor misses some files #14034

Samasaur1 opened this issue Jun 8, 2024 · 7 comments · Fixed by #15514
Labels
C-bug Category: bug Command-vendor E-medium Experience: Medium S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@Samasaur1
Copy link

Problem

I'm trying to build this Cargo project with vendored deps. I'm able to build it normally (cargo build), and running cargo vendor exits without an error, but building using the vendored deps fails (cargo build after modifying .cargo/config.toml).

It's pretty clear why it fails to build — cargo vendor hasn't copied the src/generate/templates directory of the tree-sitter-cli dependency, so I see a bunch of errors due to include_str! trying to include nonexistent files.

I'm not a Cargo expert, so I might be missing something, but I didn't see any reason why vendoring the dependencies would
miss this subfolder. Hopefully you can point me in the right direction if there's something obvious I missed.

Steps

  1. git clone https://github.com/andrewtbiehl/kramdown-syntax_tree_sitter
  2. cd kramdown-syntax_tree_sitter/ext/tree_sitter_adapter
  3. RUBY=/path/to/bin/ruby cargo build (observe that this succeeds)
  4. cargo vendor (observe that this appears to succeed)
  5. Modify .cargo/config.toml as directed by the output of cargo vendor
  6. cargo build (observe that this fails)
  7. There will also be no folder vendor/tree-sitter-cli/src/generate/templates, although it does exist in the upstream repo

Possible Solution(s)

No response

Notes

No response

Version

cargo 1.76.0 (c84b36747 2024-01-18)
release: 1.76.0
commit-hash: c84b367471a2db61d2c2c6aab605b14130b8a31b
commit-date: 2024-01-18
host: aarch64-apple-darwin
libgit2: 1.7.1 (sys:0.18.1 vendored)
libcurl: 8.6.0 (sys:0.4.70+curl-8.5.0 system ssl:(SecureTransport) LibreSSL/3.3.6)
ssl: OpenSSL 1.1.1w  11 Sep 2023
os: Mac OS 14.5.0 [64-bit]
@Samasaur1 Samasaur1 added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Jun 8, 2024
@Samasaur1 Samasaur1 changed the title cargo vendor misses somr files cargo vendor misses some files Jun 8, 2024
@epage
Copy link
Contributor

epage commented Jun 9, 2024

@epage
Copy link
Contributor

epage commented Jun 9, 2024

I just tried to reproduce this with cargo 1.78.0 (54d8815d0 2024-03-26) and vendor/tree-sitter-cli/src/generate/templates/ exists.

@epage epage added S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request. and removed S-triage Status: This issue is waiting on initial triage. labels Jun 9, 2024
@Samasaur1
Copy link
Author

ack, I tried cargo 1.76 and 1.77. Let me try 1.78

@Samasaur1
Copy link
Author

@epage Still fails for me. Here's my new cargo version --verbose

cargo 1.78.0 (54d8815d0 2024-03-26)
release: 1.78.0
commit-hash: 54d8815d04fa3816edc207bbc4dd36bf18014dbc
commit-date: 2024-03-26
host: aarch64-apple-darwin
libgit2: 1.7.2 (sys:0.18.2 vendored)
libcurl: 8.6.0 (sys:0.4.72+curl-8.6.0 system ssl:(SecureTransport) LibreSSL/3.3.6)
ssl: OpenSSL 1.1.1w  11 Sep 2023
os: Mac OS 14.5.0 [64-bit]

@Samasaur1
Copy link
Author

This is still failing for me on Cargo 1.79:

cargo 1.79.0 (ffa9cf99a 2024-06-03)
release: 1.79.0
commit-hash: ffa9cf99a594e59032757403d4c780b46dc2c43a
commit-date: 2024-06-03
host: aarch64-apple-darwin
libgit2: 1.7.2 (sys:0.18.3 vendored)
libcurl: 8.6.0 (sys:0.4.72+curl-8.6.0 system ssl:(SecureTransport) LibreSSL/3.3.6)
ssl: OpenSSL 1.1.1w  11 Sep 2023
os: Mac OS 14.5.0 [64-bit]

@ehuss
Copy link
Contributor

ehuss commented Jun 29, 2024

The problem is that src/generate/templates/cargo.toml exists (notice the lowercase c). Cargo will usually not include nested packages when packaging or publishing, and cargo vendor uses the same logic as cargo publish to determine which files to include. Cargo uses the presence of Cargo.toml to detect these nested packages. However, that detection depends on the case-sensitivity of the local filesystem. Presumably tree-sitter-cli was published by someone on a case-sensitive filesystem (like Linux) where the nested detection didn't work. However, when you run cargo vendor on a case-insensitive filesystem, it won't copy those files.

There are a few issues here:

cargo vendor doesn't exactly duplicate the .crate contents

I've long since felt that it would be best for cargo vendor to just extract the .crate file instead of trying to recompute what the file listing should be. I recall having discussions with someone about doing this (@weihanglo does that ring a bell)? I recall having a discussion about trying to add some kind of API to the Source trait to make that possible. But I can't find the PR or issue (I swear someone started working on it). There are some notes about this at #9575 (comment), and #9555 is kinda similar.

Cargo's logic depends on filesystem behavior

Generally, things can get messy when switching between filesystems that don't have the same case sensitivity. Cargo is often inconsistent on whether or not it is case-sensitive or not. For example, depending on using API's like Path("Cargo.toml").exists() which depends on the filesystem versus filename == "Cargo.toml" which is always case-sensitive.

I think it is unlikely that we can be 100% consistent in all places. However, it might be good to try to think about how we should approach this problem. Should cargo always try to be case-sensitive to avoid these cross-platform issues? Should this package-listing code be case-sensitive or not? If we change the behavior, how do we do it without breaking lots of projects?

@ehuss ehuss added S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted. and removed S-needs-info Status: Needs more info, such as a reproduction or more background for a feature request. labels Jun 29, 2024
@ehuss
Copy link
Contributor

ehuss commented Jun 29, 2024

Oh, I remember now. It was #12509 (comment). Unfortunately based on that comment, it sounds like it might not be easy.

@weihanglo weihanglo added the E-medium Experience: Medium label Jul 2, 2024
github-merge-queue bot pushed a commit that referenced this issue May 21, 2025
### What does this PR try to resolve?

`PathSource::list_files` has some heurstic rules for listing files.
Those rules are mainly designed for `cargo package`.

Previously, cargo-vendor relies on those rules to understand what files
to vendor. However, it shouldn't use those rules because:

* Package extracted from a `.crate` tarball isn't Git-controlled, some
rules may apply differently.
* The extracted package already went through `PathSource::list_files`
during packaging. It should be clean enough.
* Should keep crate sources from registry sources in a pristine state,
which is exactly what vendoring is meant for.

Instead, we switch to direct extraction into the vendor directory
to ensure source code is the same as in the `.crate` tarball.

### How should we test and review this PR?

There is already a test `vendor::package_exclude` for the fix of #9054,
though now I think it is not a good fix. The test change shows the
correct behavior change.

I am not sure if we want more tests.

Also, there are some caveats with this fix:

* The overwrite protection in `unpack_package` assumes the unpack
  directory is always `<pkg>-<version`>.
  We don't want to remove this,
  but for cargo-vendor supports vendoring without version suffix.
  For that case, we need a temporary staging area,
  and move the unpacked source then.
* The heurstic in `PathSource::list_files` did something "good" in
  general cases, like excluding hidden directories. That means
  common directories like `.github` or `.config` won't be vendored.
  After this, those get included. This is another round of churns.
  We might want to get other `cargo-vendor` changes along with this
  in one single release.

### Additional information

* Fixes #9054
* Fixes #9555
* Fixes #9575
* Fixes #11000
* Fixes #14034
* Fixes #15080
* Fixes #15090

This also opens a door for

* #10310
* #13191

Since we are changing vendored source (again), we might want to remove
the `.rej`/`.orig` special rules in cargo-vendor, as well as look into
the new source-dedup vendor dir layout.

<!-- TRIAGEBOT_START -->

<!-- TRIAGEBOT_SUMMARY_START -->

### Summary Notes

-
[benchmark-result](#15514 (comment))
by [weihanglo](https://github.com/weihanglo)

Generated by triagebot, see
[help](https://forge.rust-lang.org/triagebot/note.html) for how to add
more
<!--
TRIAGEBOT_SUMMARY_DATA_START$${"entries_by_url":{"https://github.com/rust-lang/cargo/pull/15514#issuecomment-2870275766":{"title":"benchmark-result","comment_url":"https://github.com/rust-lang/cargo/pull/15514#issuecomment-2870275766","author":"weihanglo"}}}$$TRIAGEBOT_SUMMARY_DATA_END
-->

<!-- TRIAGEBOT_SUMMARY_END -->
<!-- TRIAGEBOT_END -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug Command-vendor E-medium Experience: Medium S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants