-
Notifications
You must be signed in to change notification settings - Fork 2.6k
[Pre-RFC] Enable .cargo/root
, CARGO_ROOTS
+ --root
to limit config + manifest searches
#15620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
5d75611
to
7ea749b
Compare
Personally, I do not look at PRs until a design is agreed on. Having design discussions here splits the conversation and makes it harder to follow. This is also feature with a lot of potential workflow considerations and I would personally want to see a more deliberate approach, like an RFC,. to make sure we understand what problems we are trying to solve, how this has been solved before, and how this will impact people. FYI there is also discussion at #7871 and this is somewhat related to rust-lang/rfcs#3279 |
Right, I at least like to write some code, where possible, as a proof of concept / point of reference before getting into design discussions to try and avoid getting too academic, but it makes sense to avoid fragmented design discussions here, beyond linking to existing discussions or perhaps giving pointers for framing a new RFC. Thanks for the links - I hadn't come across those discussions from searching myself. |
If a root directory is specified then cargo will only search ancestors looking for `.cargo` config files and workspaces up until the root directory is reached (instead of walking to the root of the filesystem). A root can be specified in three ways: 1. The existence of a `.cargo/root` file (discovered by checking parents up towards the root of the filesystem) 2. By setting the `CARGO_ROOT` environment variable 3. By passing `--root` on the command line If more than one is specified then the effective root is the one that's most-specific / closest to the current working directory. ### What does this PR try to resolve? Fixes: rust-lang#5418 (for my use case then rust-lang#6805 isn't a practical workaround) This goes some way to allow nesting of workspaces, by adding a way to limit the directories that Cargo searches when looking for config files and workspace manifests. This does not enable nesting in the form of workspace inheritance - but does provide the means to technically have a filesystem with nested workspaces (that aren't aware of each other) and be able to hide any outer (unrelated) workspace while building a nested workspace. This gives more flexibility for tools that want to use cargo as an implementation detail. In particular this allows you to sandbox the build of nested third-party workspaces that may be (unknowingly) dynamically unpacked within an outer workspace, in situations where neither the workspace being built and the outer workspace are owned by the tool that is managing the build. For example a tool based on rustdoc-json should be able to fetch and build documentation for third-party crates under any user-specified build/target directory without having to worry about spooky action at a distance due to config files and workspaces in ancestor directories. In my case, I have a runtime for coding with LLMs that is given a repo to work on and is expected to keep its artifacts contained to a `.ai/` directory. This runtime supports building markdown documentation for Rust crates, which involves using cargo to generate rustdoc-json data. That tool is expected to keep its artifacts contained within `.ai/docs/rust/build/`. It's possible that the project itself is Rust based and could define a workspace or `.cargo/config.toml` but from the pov of this toolchain those have nothing to do with the crate whose documentation is being generated (which are packages downloaded from crates.io). ### How to test and review this PR? TODO: write tests
This generalizes `CARGO_ROOTS` so it can accept a list of directories, similar to `GIT_CEILING_DIRECTORIES`. This way a default configuration could come from a user's `.bashrc` and could be extended later by specific build tools. Similarly this also makes it so `--root` can be passed repeatedly.
It's no longer assumed that we can resolve a single stop_at_root path for both finding config files and manifests. GlobalContext now maintains a sorted list of user-configured roots and caches two resolved SearchRoutes; one for loading config files, the other for manifests (both automatically invalidated if roots are added or start paths change). A `SearchRoute` represents a starting directory and the closest root directory, where parent traversal will stop. If searching starts from outside of all configured roots then no ancestor traversal is allowed. - This is a safety measure to reduce the risk of loading unsafe config files that aren't owned by the user (e.g. under `c:/.cargo` on Windows or `/tmp` on Linux) By default the user's home directory is added as a root if CARGO_ROOTS is not set. - Combined with the rule above, this means that Cargo has safe defaults and will not load configs outside of your home directory, unless you explicitly set CARGO_ROOTS. - This is also a measure to avoid triggering home directory automounters TODO: there still needs to be a special case for loading a project manifest, before searching for a workspace. Cargo should be allowed to walk ancestors outside of all configured roots, when loading the first Cargo.toml. In this case the directory of the manifest should immediately be set as a root. This would be a safety / ergonomics trade-off to allow building of crates under `/tmp/package/nested/dir` but still disallow attempts to load `/tmp/Cargo.toml` as a workspace. TODO: Cargo should log a warning when searching for files outside of any root, so it's easy to see why ancestor config files or manifests are not being loaded.
As a special-case; unlike for configs and workspaces, this allows Cargo to search all the way up to the root of the filesystem when loading a package manifest, IFF Cargo is running outside of all configured root directories. This exception doesn't apply if Cargo is running within a configured root directory. This is a trade off between safety and convenience (+ backwards compatibility) that ensures it is possible to unpack a package outside of your home directory (such as `/tmp/my-package`) and then start a build from a subdirectory like `/tmp/my-package/sub/dir` such that Cargo will traverse parent directories looking for `/tmp/my-package/Cargo.toml` but it will not try and load untrusted configs under `/tmp/.cargo` or workspace manifests under `/tmp/Cargo.toml` that could have been created by another user. FIXME: If the first manifest loaded is in fact a workspace manifest we need to make sure we add that as a root directory so that when nested packages within the workspace are built they are able to find their way back to the top of the workspace. Otherwise each nested package build will be limited to the directory of the package.
To allow a package to be unpacked outside of all root directories, such as `/tmp/my-package` and allow a build to start from `/tmp/my-package/sub/dir`, Cargo is allowed to traverse parent directories until it finds its first manifest, and this then becomes a fallback root directory that will stop Cargo from looking any higher for a workspace (which could attempt to load `/tmp/Cargo.toml` that may not be safe). XXX: Instead of having a special case for the first manifest; maybe it could be better to instead _always_ add the directory of a manifest as a root directory if it's not a subdirectory of any existing root. That would have a mostly-equivalent result to the current behaviour but _might_ be simpler to document?
To help make it clear why Cargo may not load all ancestor configs or find workspace manifests in ancestor directories, Cargo emits a warning: ``` warning: Cargo is running outside of any root directory, limiting loading of ancestor configs and manifest ```
7ea749b
to
d3167ed
Compare
.cargo/root
, CARGO_ROOT
+ --root
to limit config + workspace searches.cargo/root
, CARGO_ROOTS
+ --root
to limit config + manifest searches
While I've been looking at some of the other related issues in #7871 (including issues with I'll try and find some time to follow up on the discussions for those separately, but I'm currently optimistic that something close to this draft could be a viable solution for several of these issues. |
This is a draft proposal that I think could address a number of issues in Cargo (e.g. #5418, #7871) and could be an alternative (or complementary) approach to addressing the safety issues discussed in rust-lang/rfcs#3279
The intention here is to prototype a practical solution that can then hopefully inform any follow up discussions.
My current motivation for looking at this comes from using Cargo as part of a rustdoc-markdown toolchain that should be able to build third-party crates without worrying about side effects from unrelated configs or workspace manifests in parent directories.
--
This adds the notion of root directories to Cargo, which act as limits to how far Cargo will search ancestors looking for
.cargo
config files orCargo.toml
manifests (instead of walking to the root of the filesystem).This has some similarity to Git's
GIT_CEILING_DIRECTORIES
mechanism.There are three notable searches that root directories affect:
.cargo/Config.toml
filesCargo.toml
package manifestCargo.toml
workspace manifest.Roots can be specified in three ways:
CARGO_ROOTS
environment variable to a colon (semicolon on Windows) separated list of directories--root
on the command line (can be passed multiple times)i. Suited to tool configurations where appending to environment variables may be limited (e.g. limited environment variable interpolation in vscode settings)
.cargo/root
file (discovered by checking parents up towards the root of the filesystem)i. Suited to sandboxing directories when the absolute path might vary, such in within monorepos that could be cloned to different locations or directories that may get mounted in Docker containers.
By default, if
CARGO_ROOTS
has not been set then the user's home directory is implicitly added as a root directory./tmp/.cargo/
, orC:/.cargo
or searching for workspace manifests.The default home-directory root also mitigates issues with triggering automounters on some systems.
Whenever Cargo needs to search ancestors for files it will resolve the closest root directory that is an ancestor of the starting directory and that root is the last directory Cargo will search.
If the starting directory is not within any root then Cargo will only search the starting directory. Cargo will also print a warning to indicate that it's loading of configs + workspace manifests is limited.
As a safety / ergonomics (+ backwards compatibility) trade-off, there is a special case that allows Cargo to search up to the root of the filesystem when searching for the first
Cargo.toml
package manifest, IFF the starting directory is outside of all roots. This enables a package to be unpacked under a shared directory like/tmp/my-package
and start a build from/tmp/my-package/sub/dir
which will load/tmp/my-package/Cargo.toml
but will not then allow Cargo to attempt loading a workspace at/tmp/Cargo.toml
.What does this PR try to resolve?
Fixes: #5418
Fixes: #7871
/tmp
Addresses: rust-lang/rfcs#3279
This gives more flexibility for tools that want to use cargo as an implementation detail. In particular this allows you to sandbox the build of nested third-party workspaces that may be (unknowingly) dynamically unpacked within an outer workspace, in situations where neither the workspace being built and the outer workspace are owned by the tool that is managing the build.
This does not attempt to support nesting workspaces with inheritence (as described in #5042).
For example a tool based on rustdoc-json should be able to fetch and build documentation for third-party crates under any user-specified build/target directory without having to worry about spooky action at a distance due to config files and workspaces in ancestor directories.
In my case, I have an runtime for coding with LLMs that is given a repo to work on and is expected to keep its artifacts contained to a
.ai/
directory. This runtime supports building markdown documentation for Rust crates, which involves using cargo to generate rustdoc-json data. That tool is expected to keep its artifacts contained within.ai/docs/rust/build/
within the project being worked on. It's possible that the project itself is Rust based and could define a workspace or.cargo/config.toml
but from the pov of this toolchain those have nothing to do with the crate whose documentation is being generated (which are packages downloaded from crates.io).How to test and review this PR?
TODO: write tests
Root directories can be smoke tested by attempting to build any package that doesn't define a workspace, nested within a repo that does, E.g.:
Known Issues
The current implementation makes heavy use of
std::fs::canonicalize
while the original expectation had been that the UNC prefixes for Windows wouldn't matter when just internally checking paths for equivalence.As things have evolved I think the use of
canonicalize
should be replaced with a normalization that just maps root directories to absolute paths and uses Cargo'snormalize_path
utility. This way we would avoid resolving symlinks and rely on lexical path comparisons.Ideally the normalized paths can even be kept as a hidden implementation detail when resolving a
SearchRoute
that would allow the following use ofpath::ancestors
to iterate in accordance with any non-normalized starting directory.