Skip to content

[Pre-RFC] Enable .cargo/root, CARGO_ROOTS + --root to limit config + manifest searches #15620

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

rib
Copy link

@rib rib commented Jun 2, 2025

This is a draft proposal that I think could address a number of issues in Cargo (e.g. #5418, #7871) and could be an alternative (or complementary) approach to addressing the safety issues discussed in rust-lang/rfcs#3279

The intention here is to prototype a practical solution that can then hopefully inform any follow up discussions.

My current motivation for looking at this comes from using Cargo as part of a rustdoc-markdown toolchain that should be able to build third-party crates without worrying about side effects from unrelated configs or workspace manifests in parent directories.

--

This adds the notion of root directories to Cargo, which act as limits to how far Cargo will search ancestors looking for .cargo config files or Cargo.toml manifests (instead of walking to the root of the filesystem).

This has some similarity to Git's GIT_CEILING_DIRECTORIES mechanism.

There are three notable searches that root directories affect:

  1. How far should Cargo search when loading .cargo/Config.toml files
  2. How far should Cargo search from when loading a Cargo.toml package manifest
  3. How far should Cargo search from a package manifest directory looking for a Cargo.toml workspace manifest.

Roots can be specified in three ways:

  1. By setting the CARGO_ROOTS environment variable to a colon (semicolon on Windows) separated list of directories
  2. By passing --root on the command line (can be passed multiple times)
    i. Suited to tool configurations where appending to environment variables may be limited (e.g. limited environment variable interpolation in vscode settings)
  3. The existence of a .cargo/root file (discovered by checking parents up towards the root of the filesystem)
    i. Suited to sandboxing directories when the absolute path might vary, such in within monorepos that could be cloned to different locations or directories that may get mounted in Docker containers.

By default, if CARGO_ROOTS has not been set then the user's home directory is implicitly added as a root directory.

  • This is for safer default behaviour that stops Cargo from loading configs from shared directories like /tmp/.cargo/, or C:/.cargo or searching for workspace manifests.

The default home-directory root also mitigates issues with triggering automounters on some systems.

Whenever Cargo needs to search ancestors for files it will resolve the closest root directory that is an ancestor of the starting directory and that root is the last directory Cargo will search.

If the starting directory is not within any root then Cargo will only search the starting directory. Cargo will also print a warning to indicate that it's loading of configs + workspace manifests is limited.

As a safety / ergonomics (+ backwards compatibility) trade-off, there is a special case that allows Cargo to search up to the root of the filesystem when searching for the first Cargo.toml package manifest, IFF the starting directory is outside of all roots. This enables a package to be unpacked under a shared directory like /tmp/my-package and start a build from /tmp/my-package/sub/dir which will load /tmp/my-package/Cargo.toml but will not then allow Cargo to attempt loading a workspace at /tmp/Cargo.toml.

What does this PR try to resolve?

Fixes: #5418

Fixes: #7871

  • Addresses safety issues with building under shared directories like /tmp
  • Mitigates triggering home directory automounters

Addresses: rust-lang/rfcs#3279

This gives more flexibility for tools that want to use cargo as an implementation detail. In particular this allows you to sandbox the build of nested third-party workspaces that may be (unknowingly) dynamically unpacked within an outer workspace, in situations where neither the workspace being built and the outer workspace are owned by the tool that is managing the build.

This does not attempt to support nesting workspaces with inheritence (as described in #5042).

For example a tool based on rustdoc-json should be able to fetch and build documentation for third-party crates under any user-specified build/target directory without having to worry about spooky action at a distance due to config files and workspaces in ancestor directories.

In my case, I have an runtime for coding with LLMs that is given a repo to work on and is expected to keep its artifacts contained to a .ai/ directory. This runtime supports building markdown documentation for Rust crates, which involves using cargo to generate rustdoc-json data. That tool is expected to keep its artifacts contained within .ai/docs/rust/build/ within the project being worked on. It's possible that the project itself is Rust based and could define a workspace or .cargo/config.toml but from the pov of this toolchain those have nothing to do with the crate whose documentation is being generated (which are packages downloaded from crates.io).

How to test and review this PR?

TODO: write tests

Root directories can be smoke tested by attempting to build any package that doesn't define a workspace, nested within a repo that does, E.g.:

mkdir cargo-root-test
cd cargo-root-test

# Clone anything that defines an outer workspace...
git clone https://github.com/serde-rs/serde.git

# Then clone a nested repo that doesn't have a workspace
cd serde
git clone https://github.com/meta-rust/rust-hello-world

# Attempting to build will lead to an error like "error: current package believes it's in a workspace when it's not"
cargo build

# But if we know that the outer workspaces is unrelated we can specify a root directory and isolate the build...
cargo --root . build
# Or
env CARGO_ROOTS="$PWD" cargo build
# Or
mkdir .cargo
touch .cargo/root
cargo build

Known Issues

The current implementation makes heavy use of std::fs::canonicalize while the original expectation had been that the UNC prefixes for Windows wouldn't matter when just internally checking paths for equivalence.

As things have evolved I think the use of canonicalize should be replaced with a normalization that just maps root directories to absolute paths and uses Cargo's normalize_path utility. This way we would avoid resolving symlinks and rely on lexical path comparisons.

Ideally the normalized paths can even be kept as a hidden implementation detail when resolving a SearchRoute that would allow the following use of path::ancestors to iterate in accordance with any non-normalized starting directory.

@rustbot rustbot added A-cli Area: Command-line interface, option parsing, etc. A-configuration Area: cargo config files and env vars A-documenting-cargo-itself Area: Cargo's documentation A-workspaces Area: workspaces labels Jun 2, 2025
@rib rib force-pushed the rib/pr/root-dir-search-limit branch from 5d75611 to 7ea749b Compare June 2, 2025 11:12
@epage
Copy link
Contributor

epage commented Jun 2, 2025

I guess something like this might need a [Pre-]RFC?, but as I'm #5418 (comment) with nested workspaces while using Cargo as part of a rustdoc-markdown toolchain, I figured I'd go ahead and take a stab at drafting a solution that could at least work for my use case, to get some practical insight into the problem.

Personally, I do not look at PRs until a design is agreed on. Having design discussions here splits the conversation and makes it harder to follow. This is also feature with a lot of potential workflow considerations and I would personally want to see a more deliberate approach, like an RFC,. to make sure we understand what problems we are trying to solve, how this has been solved before, and how this will impact people.

FYI there is also discussion at #7871 and this is somewhat related to rust-lang/rfcs#3279

@rib
Copy link
Author

rib commented Jun 2, 2025

I guess something like this might need a [Pre-]RFC?, but as I'm #5418 (comment) with nested workspaces while using Cargo as part of a rustdoc-markdown toolchain, I figured I'd go ahead and take a stab at drafting a solution that could at least work for my use case, to get some practical insight into the problem.

Personally, I do not look at PRs until a design is agreed on. Having design discussions here splits the conversation and makes it harder to follow. This is also feature with a lot of potential workflow considerations and I would personally want to see a more deliberate approach, like an RFC,. to make sure we understand what problems we are trying to solve, how this has been solved before, and how this will impact people.

FYI there is also discussion at #7871 and this is somewhat related to rust-lang/rfcs#3279

Right, I at least like to write some code, where possible, as a proof of concept / point of reference before getting into design discussions to try and avoid getting too academic, but it makes sense to avoid fragmented design discussions here, beyond linking to existing discussions or perhaps giving pointers for framing a new RFC.

Thanks for the links - I hadn't come across those discussions from searching myself.

rib added 6 commits June 6, 2025 11:54
If a root directory is specified then cargo will only search ancestors
looking for `.cargo` config files and workspaces up until the root
directory is reached (instead of walking to the root of the filesystem).

A root can be specified in three ways:

1. The existence of a `.cargo/root` file (discovered by checking parents
   up towards the root of the filesystem)
2. By setting the `CARGO_ROOT` environment variable
3. By passing `--root` on the command line

If more than one is specified then the effective root is the one that's
most-specific / closest to the current working directory.

### What does this PR try to resolve?

Fixes: rust-lang#5418
(for my use case then rust-lang#6805 isn't a practical workaround)

This goes some way to allow nesting of workspaces, by adding a way to
limit the directories that Cargo searches when looking for config files
and workspace manifests.

This does not enable nesting in the form of workspace inheritance - but
does provide the means to technically have a filesystem with nested
workspaces (that aren't aware of each other) and be able to hide any
outer (unrelated) workspace while building a nested workspace.

This gives more flexibility for tools that want to use cargo as an
implementation detail. In particular this allows you to sandbox the
build of nested third-party workspaces that may be (unknowingly)
dynamically unpacked within an outer workspace, in situations where
neither the workspace being built and the outer workspace are owned by
the tool that is managing the build.

For example a tool based on rustdoc-json should be able to fetch and
build documentation for third-party crates under any user-specified
build/target directory without having to worry about spooky action at a
distance due to config files and workspaces in ancestor directories.

In my case, I have a runtime for coding with LLMs that is given a repo
to work on and is expected to keep its artifacts contained to a `.ai/`
directory. This runtime supports building markdown documentation for
Rust crates, which involves using cargo to generate rustdoc-json data.
That tool is expected to keep its artifacts contained within
`.ai/docs/rust/build/`. It's possible that the project itself is Rust
based and could define a workspace or `.cargo/config.toml` but from the
pov of this toolchain those have nothing to do with the crate whose
documentation is being generated (which are packages downloaded from
crates.io).

### How to test and review this PR?

TODO: write tests
This generalizes `CARGO_ROOTS` so it can accept a list of directories,
similar to `GIT_CEILING_DIRECTORIES`.

This way a default configuration could come from a user's `.bashrc` and
could be extended later by specific build tools.

Similarly this also makes it so `--root` can be passed repeatedly.
It's no longer assumed that we can resolve a single stop_at_root path
for both finding config files and manifests.

GlobalContext now maintains a sorted list of user-configured roots and
caches two resolved SearchRoutes; one for loading config files, the
other for manifests (both automatically invalidated if roots are added
or start paths change).

A `SearchRoute` represents a starting directory and the closest
root directory, where parent traversal will stop.

If searching starts from outside of all configured roots then no
ancestor traversal is allowed.

 - This is a safety measure to reduce the risk of loading unsafe config
   files that aren't owned by the user (e.g. under `c:/.cargo` on
   Windows or `/tmp` on Linux)

By default the user's home directory is added as a root if CARGO_ROOTS
is not set.

 - Combined with the rule above, this means that Cargo has safe defaults
   and will not load configs outside of your home directory, unless you
   explicitly set CARGO_ROOTS.

 - This is also a measure to avoid triggering home directory
   automounters

TODO: there still needs to be a special case for loading a project
manifest, before searching for a workspace. Cargo should be allowed to
walk ancestors outside of all configured roots, when loading the first
Cargo.toml. In this case the directory of the manifest should
immediately be set as a root. This would be a safety / ergonomics
trade-off to allow building of crates under `/tmp/package/nested/dir`
but still disallow attempts to load `/tmp/Cargo.toml` as a workspace.

TODO: Cargo should log a warning when searching for files outside of
any root, so it's easy to see why ancestor config files or manifests
are not being loaded.
As a special-case; unlike for configs and workspaces, this allows Cargo
to search all the way up to the root of the filesystem when loading a
package manifest, IFF Cargo is running outside of all configured root
directories.

This exception doesn't apply if Cargo is running within a configured
root directory.

This is a trade off between safety and convenience (+ backwards
compatibility) that ensures it is possible to unpack a package outside
of your home directory (such as `/tmp/my-package`) and then start a
build from a subdirectory like `/tmp/my-package/sub/dir` such that Cargo
will traverse parent directories looking for
`/tmp/my-package/Cargo.toml` but it will not try and load untrusted
configs under `/tmp/.cargo` or workspace manifests under
`/tmp/Cargo.toml` that could have been created by another user.

FIXME: If the first manifest loaded is in fact a workspace manifest
we need to make sure we add that as a root directory so that when
nested packages within the workspace are built they are able to find
their way back to the top of the workspace. Otherwise each nested
package build will be limited to the directory of the package.
To allow a package to be unpacked outside of all root directories, such
as `/tmp/my-package` and allow a build to start from
`/tmp/my-package/sub/dir`, Cargo is allowed to traverse parent
directories until it finds its first manifest, and this then becomes a
fallback root directory that will stop Cargo from looking any higher for
a workspace (which could attempt to load `/tmp/Cargo.toml` that may not
be safe).

XXX: Instead of having a special case for the first manifest; maybe it could
be better to instead _always_ add the directory of a manifest as a root
directory if it's not a subdirectory of any existing root.

That would have a mostly-equivalent result to the current behaviour but
_might_ be simpler to document?
To help make it clear why Cargo may not load all ancestor configs or
find workspace manifests in ancestor directories, Cargo emits a warning:

```
warning: Cargo is running outside of any root directory, limiting
loading of ancestor configs and manifest
```
@rib rib force-pushed the rib/pr/root-dir-search-limit branch from 7ea749b to d3167ed Compare June 6, 2025 11:31
@rib rib changed the title [Pre-RFC] Enable .cargo/root, CARGO_ROOT + --root to limit config + workspace searches [Pre-RFC] Enable .cargo/root, CARGO_ROOTS + --root to limit config + manifest searches Jun 6, 2025
@rib
Copy link
Author

rib commented Jun 6, 2025

While I've been looking at some of the other related issues in #7871 (including issues with /tmp/Config.toml or /tmp/.cargo/Config.toml files or triggering home directory automounters) and the safety issues discussed in rust-lang/rfcs#3279 I've taken another pass at this prototype and updated the description accordingly.

I'll try and find some time to follow up on the discussions for those separately, but I'm currently optimistic that something close to this draft could be a viable solution for several of these issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cli Area: Command-line interface, option parsing, etc. A-configuration Area: cargo config files and env vars A-documenting-cargo-itself Area: Cargo's documentation A-filesystem Area: issues with filesystems A-interacts-with-crates.io Area: interaction with registries A-testing-cargo-itself Area: cargo's tests A-workspaces Area: workspaces Command-locate-project Command-new Command-owner Command-yank
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Put Searching Parent Directories for Cargo.toml Behind Logical Gate override "current package believes it's in a workspace"
4 participants