-
Notifications
You must be signed in to change notification settings - Fork 1.6k
RFC: Implement a sandbox for environment variables and files #2391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,289 @@ | ||
- Feature Name: env-sandboxing | ||
- Start Date: 2018-04-05 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Allow `rustc` to be invoked with constraints on which environment variables may | ||
be queried, and which files may be included. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
This RFC introduces some simple sandboxing for process environment variables and | ||
for include files (collectively "system environment"). | ||
|
||
This is primarily to allow a build system to more precisely control the inputs | ||
to `rustc` which may affect the generated output. Rust has two mechanisms by | ||
which an input source can access ambient properties of `rustc`'s system | ||
environment: `env!()` and `option_env!()` for reading environment variables, and | ||
`include!()`/`include_str!()`/`include_bytes!()` for reading arbitrary files. | ||
|
||
## Environment Variables | ||
`rustc` allows source code to directly access its process environment variables | ||
via the `env!()` and `option_env!()` (pseudo-)macros. There is currently no way | ||
to control which environment variables are accessible, or what their apparent | ||
contents are. | ||
|
||
This poses a few problems: | ||
|
||
- The build system has no idea what variables the code is using for the purposes | ||
of tracking its inputs | ||
- Code can read arbitrary contents from arbitrary variables, which may pose | ||
unwanted information leakage (from build environment to final deployment | ||
environment, for example). | ||
- It combines the environment `rustc` needs for its own operation (PATH and | ||
LD_LIBRARY_PATH, for example) with environment the compiled code might want, | ||
and doesn't allow them to be set independently. | ||
|
||
## Include Files | ||
Rust allows any file to be directly included to be logically part of the | ||
including source file, either as more Rust source code (include!()), a text | ||
string (include_str!()) or arbitrary binary data (include_bytes!()). These | ||
macros take a raw string which is used as an arbitrary path which may be | ||
absolute - they therefore allow any file that `rustc` has permission to access | ||
to be used in the compiled output. | ||
|
||
(This differs from separating a crate into multiple source files via modules, as | ||
those files are always relative to the top-level lib.rs/main.rs source.) | ||
|
||
This causes a couple of problems: | ||
|
||
- The build system can't know or constrain what files are actually inputs to the | ||
compiler. | ||
- The code can't unintentionally leak state from the build environment to the | ||
deployment environment. | ||
|
||
## Note | ||
This RFC specifically does not intend to address any actual security concerns. | ||
There are many other avenues that it does not attempt to control, such as | ||
compiler plugins/proc macros. However, it does help with unintentional problems | ||
which could result in later security problems if unaddressed. Blocking access to | ||
files or environment variables also makes it easier to audit what environment | ||
accesses a codebase is attempting to make. | ||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
## Environment Variables | ||
|
||
Rust supports the `env!()` and `option_env!()` macros to allow the value of a | ||
process environment variable to be logically pasted into the source as a string | ||
(or `Option` string). | ||
|
||
`Rustc` has the following command-line options to allow the access to | ||
environment variables to be controlled at a fine-grain level: | ||
|
||
- `--env-clear` - completely empty the logical environment visible to | ||
`env!()`/`option_env!()`, causing them to all fail/return `None`. Without any | ||
other options, this will completely disable environment variable access. | ||
- `--env-allow NAME` - allow a specific environment variable to be read from the | ||
process environment. | ||
- `--env-define NAME=VAL` - define a logical environment variable. This does not | ||
need to be present in the actual process environment, or if it is, its value | ||
is overridden. | ||
|
||
These are processed in the order: | ||
1. `--env-clear` | ||
2. `--env-allow NAME` | ||
3. `--env-set NAME=VAL` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what's the benefit of this ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Several reasons:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. my question was more that my understanding is that running your option may also be more sensible for any shells/clients which don't support 'instantiation-local environment variables', but I think the RFC could do with more discussion of these points. thanks! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean that when |
||
|
||
where options processed later override earlier ones. | ||
|
||
By default, the environment is completely open, leaving the existing behaviour | ||
unchanged. Once any of the options above is specified, accesses to environment | ||
variable becomes controlled accordingly. Note that `--env-clear` is only necessary | ||
if you want a completely empty environment. | ||
|
||
## Include Paths | ||
|
||
Rust allows arbitrary files to be included into a source file, with one of | ||
`include!()` (include Rust source), `include_str!()` (include a file as a | ||
literal text string), or `include_bytes!()` (include a file as arbitrary bytes). | ||
|
||
`rustc` maintains a set of "allowed path prefixes". When it tries to open a file | ||
for an include, it checks the path against a set of allowed prefixes, and only | ||
allows the operation if it matches. For example if we set: `--include-prefix | ||
foo/bar`, then `include!("foo/bar/blat.rs")` will be allowed, but | ||
`include!("bar/bar/blacksheep.rs")` will fail, regardless of whether the file | ||
actually exists or not. | ||
|
||
`rustc` has the following command-line options to control the valid prefixes: | ||
- `--clear-include-prefixes` - clear all allowable prefixes, effectively | ||
disabling all `include*!()` macros unless a new valid prefix is added. | ||
- `--include-prefix PATH` - add PATH to the set of valid prefixes. All included | ||
paths must match one of the valid path prefixes before it can be opened. | ||
|
||
By default, all path prefixes are valid, leaving the current behaviour | ||
unchanged. They are only constrained once one of the options above are | ||
specified. | ||
|
||
All paths are canonicalized before matching (turned into an absolute path, all | ||
symlinks followed, etc), so they must exist at the time they're specified. For | ||
example, this means that a path of `foo/../bar` and `bar` are considered | ||
equivalent (assuming that `foo` exists). | ||
|
||
Note that a "path prefix" can be an entire pathname, allowing these options to | ||
explicitly specify which individual files may be included. | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
Conceptually, the sandbox state is held as `Option<Filter>` (where filter is | ||
either an environment or path prefix filter). If it has a `None` value, then it | ||
is all-open, which is equivalent to the current state. | ||
|
||
Once any filter rule is put in place, it becomes `Some(Filter)`, and the filter | ||
rules are applied to the access as appropriate. | ||
|
||
For environment variables, the variable name must exactly match one of the | ||
existing rules, which either gives a specific value to return or indicates | ||
that it should get the value from the process environment variables. | ||
|
||
For paths, if a path access has a prefix which matches any in the allowed | ||
prefixes rule, then the access is allowed. | ||
|
||
## Implementation | ||
|
||
This feature is implemented in a separate `env-sandbox` crate, which manages the | ||
state for both environment variables and path prefixes. It is used in two | ||
places: | ||
|
||
- during command-line option parsing, the sandbox state is built up, and stored | ||
in the global config structure. | ||
- it is added to `ParseSess`, so that it's available to the various `expand_` | ||
functions called from libsyntax_ext. | ||
|
||
Specifically, `libsyntax_ext::env`'s `expand_option_env` and `expand_env` call | ||
`EnvSandbox::env_get()` to get an environment variable rather than directly | ||
calling `std::env::var()` so that the sandbox can mediate access. | ||
|
||
Likewise, `libsyntax::ext::source_util`s `expand_include` uses | ||
`EnvSandbox::path_lookup()` to get a path that either matches one of the | ||
allowable prefixes, or returns an error. Likewise `expand_include_str` and | ||
`expand_bytes` use `EnvSandbox::path_open()` with either successfully returns an | ||
open file, or returns an error. In both cases, the error is `PermissionDenied` | ||
with a description "path does not have a valid prefix". | ||
|
||
In all cases, failing to fetch an environment variable or failing to open a file | ||
should behave the same as if the variable really doesn't exist, or if the file | ||
can't be opened for system reasons (non-existence, permission denied, etc). | ||
|
||
Implementation is in PR https://github.com/rust-lang/rust/pull/49387. | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
It adds numerous new command line options which ultimately be of limited | ||
usefulness, while adding additional complexity and maintenance burden. | ||
|
||
Users may be confused by the errors produced by using these features if they | ||
didn't realize their build system was making use of them. | ||
|
||
# Rationale and alternatives | ||
[alternatives]: #alternatives | ||
|
||
This design allows a build system to control `rustc`'s effective system | ||
environment at a fine grain from invocation to invocation (aka hermetic | ||
builds). | ||
|
||
In Unix systems, its possible to control the environment directly, either via | ||
the parameters passed to the `execve()` call invoking `rustc`, or using commands | ||
such as `env`. These will allow any environment variable to be removed or | ||
changed, but they can't be used to control the effective content of variables | ||
that `rustc` itself (or any tool it invokes, like the linker) uses as part of | ||
its own execution, such as `PATH` or `LD_LIBRARY_PATH`. | ||
|
||
I don't know how one controls the environment under Windows. | ||
|
||
The `--env-*` options give more control in a consistent way, and don't require | ||
build tools to invoke `rustc` differently aside from adding new command-line | ||
options. They allow the environment to be controlled in a cross-platform way. | ||
|
||
Controlling file access is much more complex. The only way to control file | ||
access in a similar way to the `--include-prefix` option is to build a | ||
filesystem sandbox, and run the compiler within a chroot jail. This is a costly | ||
operation which requires root-level permissions; the details of it are also very | ||
system-specific. It also doesn't really solve the problem as it doesn't allow | ||
particularly fine-grained control over what files can be included (any file | ||
that's required to be present to make a runnable `rustc` instance is also | ||
available for including). | ||
|
||
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
I don't know of other compiled languages which allow direct access to | ||
environment variables (C and C++ don't), so there's no analogous requirement to | ||
control access. | ||
|
||
Many build systems have problems sandboxing their compilers. The | ||
[Bazel](https://bazel.build/) and [Buck](https://buckbuild.com/) build tools | ||
allow a limited form of sandbox by symlinking all the required source files (as | ||
nominated by the dependency rules) into a special build directory, so that the | ||
build fails if the compiler accesses files which haven't been symlinked. | ||
However, this does not prevent access to other files outside that build | ||
directory, via absolute path. | ||
|
||
The Bazel developers have proposed a user-level filesystem, | ||
[sandboxfs](https://blog.bazel.build/2017/08/25/introducing-sandboxfs.html), to | ||
maintain a lightweight sandbox environment for an instance of a build. In | ||
addition to being complex, system-specific and perhaps a performance hit (all | ||
filesystem access is mediated by a user-mode filesystem), it has the same | ||
disadvantage as other system-level sandboxing/container approaches, where the | ||
entire compiler environment is still available for including. | ||
|
||
There is a [cargo issue](https://github.com/rust-lang/cargo/issues/5282) | ||
proposing a way to control the environment of a build script. This RFC provides | ||
a mechanism to implement that. | ||
|
||
# Unresolved questions | ||
[unresolved]: #unresolved-questions | ||
|
||
The use of the term "environment" is confusing - it refers to the general | ||
concept of the system environment in which the rust compiler runs, but a | ||
specific part of that are the "environment variables", which is often | ||
abbreviated to "the environment". I had considered calling the over-arching | ||
concept a plain "sandbox", but I was concerned that there might be other | ||
potential uses for that term in the context of `rustc`. | ||
|
||
The design is focused on a specific set of problems I have encountered while | ||
integrating `rustc` into an existing build environment, but I think it's a | ||
general enough problem and implementation that it's worth putting in rustc. But | ||
I'm open to making changes if they make the feature more generally useful. | ||
|
||
I think the use of canonicalized paths may need some more thought. | ||
|
||
## Environment | ||
|
||
- Is matching by literal name powerful enough? Is there a requirement for some | ||
kind of pattern matching? (My view is that this can be added via a new | ||
command-line option, and the implementation of `EnvSandbox` can be extended; | ||
it wouldn't require any other changes.) | ||
- Transformations applied to returned values? Ie, the ability to apply some kind | ||
of transformation/normalization to a variable? Seems too complex and out of | ||
scope for this change. | ||
- Should invocations to `option_env!()` which access blocked variables print a | ||
warning? | ||
- Should invocations of `env!()` which access blocked variables explicitly | ||
indicate its because of a command-line option, rather than because its | ||
actually doesn't exist? | ||
- Does there need to be a way to undo `--env-clear`, to re-allow access to | ||
the environment? | ||
- Does there need to be a mode where `--env-allow` defines/overrides variables | ||
but does not block access to the rest of the environment? | ||
|
||
## File Prefixes | ||
|
||
- Is path prefix good enough? Should it be something like regex? I avoided this | ||
because I didn't know how to handle path canonicalization, and because it is | ||
more system-dependent. | ||
- Should path matching be done on the basis of canonicalized paths? I though | ||
that requiring each file to be accessed via a specific form of path, and | ||
accessing it via other paths would be too brittle and confusing, and that | ||
doing everything on a canonical basis is more understandable. But there are | ||
still issues of how to handle a symlink in the middle of a path. | ||
|
||
Hm, canonicalized paths won't work as expected if the source files themselves | ||
are symlinks to elsewhere. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: capitalised
rustc