Skip to content

RFC: Implement a sandbox for environment variables and files #2391

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
289 changes: 289 additions & 0 deletions text/0000-env-sandboxing.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,289 @@
- Feature Name: env-sandboxing
- Start Date: 2018-04-05
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Allow `rustc` to be invoked with constraints on which environment variables may
be queried, and which files may be included.

# Motivation
[motivation]: #motivation

This RFC introduces some simple sandboxing for process environment variables and
for include files (collectively "system environment").

This is primarily to allow a build system to more precisely control the inputs
to `rustc` which may affect the generated output. Rust has two mechanisms by
which an input source can access ambient properties of `rustc`'s system
environment: `env!()` and `option_env!()` for reading environment variables, and
`include!()`/`include_str!()`/`include_bytes!()` for reading arbitrary files.

## Environment Variables
`rustc` allows source code to directly access its process environment variables
via the `env!()` and `option_env!()` (pseudo-)macros. There is currently no way
to control which environment variables are accessible, or what their apparent
contents are.

This poses a few problems:

- The build system has no idea what variables the code is using for the purposes
of tracking its inputs
- Code can read arbitrary contents from arbitrary variables, which may pose
unwanted information leakage (from build environment to final deployment
environment, for example).
- It combines the environment `rustc` needs for its own operation (PATH and
LD_LIBRARY_PATH, for example) with environment the compiled code might want,
and doesn't allow them to be set independently.

## Include Files
Rust allows any file to be directly included to be logically part of the
including source file, either as more Rust source code (include!()), a text
string (include_str!()) or arbitrary binary data (include_bytes!()). These
macros take a raw string which is used as an arbitrary path which may be
absolute - they therefore allow any file that `rustc` has permission to access
to be used in the compiled output.

(This differs from separating a crate into multiple source files via modules, as
those files are always relative to the top-level lib.rs/main.rs source.)

This causes a couple of problems:

- The build system can't know or constrain what files are actually inputs to the
compiler.
- The code can't unintentionally leak state from the build environment to the
deployment environment.

## Note
This RFC specifically does not intend to address any actual security concerns.
There are many other avenues that it does not attempt to control, such as
compiler plugins/proc macros. However, it does help with unintentional problems
which could result in later security problems if unaddressed. Blocking access to
files or environment variables also makes it easier to audit what environment
accesses a codebase is attempting to make.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## Environment Variables

Rust supports the `env!()` and `option_env!()` macros to allow the value of a
process environment variable to be logically pasted into the source as a string
(or `Option` string).

`Rustc` has the following command-line options to allow the access to
Copy link
Contributor

@fbstj fbstj Apr 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: capitalised rustc

environment variables to be controlled at a fine-grain level:

- `--env-clear` - completely empty the logical environment visible to
`env!()`/`option_env!()`, causing them to all fail/return `None`. Without any
other options, this will completely disable environment variable access.
- `--env-allow NAME` - allow a specific environment variable to be read from the
process environment.
- `--env-define NAME=VAL` - define a logical environment variable. This does not
need to be present in the actual process environment, or if it is, its value
is overridden.

These are processed in the order:
1. `--env-clear`
2. `--env-allow NAME`
3. `--env-set NAME=VAL`
Copy link
Contributor

@fbstj fbstj Apr 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the benefit of this (rustc --env-set NAME=value) over NAME=value rustc --env-allow NAME other than brevity? is it specifically to stop propagation into sub-processes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several reasons:

  • You can use it to override the value of things that rustc (or the linker) itself uses, like PATH or LD_LIBRARY_PATH (among others)
  • You can just change rustc's command-line args, whereas manipulating the environment might need deeper changes to whatever's invoking rustc (possibly system-dependent changes)
  • Non-propagation isn't something that had occurred to me, but it is related to my first point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my question was more that my understanding is that running > NAME=value rustc --env-allow NAME is only overriding the value for that instantiation of rustc, so I don't understand what your first point is specifying? are you meaning that some point before/around your 'sandboxed' area is able to access the environment but then the majority is going to use the --env-allow value?

your option may also be more sensible for any shells/clients which don't support 'instantiation-local environment variables', but I think the RFC could do with more discussion of these points.

thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that when rustc itself is - say - invoking the linker, it may use $PATH to find the executable. If we want env!("PATH") to work at all, but not necessarily return the PATH that's been set up for rustc to be able to find its linker, then we need some way of allowing env!("PATH") and rustc's own runtime call to std::env::var("PATH") to be able to return distinct values.


where options processed later override earlier ones.

By default, the environment is completely open, leaving the existing behaviour
unchanged. Once any of the options above is specified, accesses to environment
variable becomes controlled accordingly. Note that `--env-clear` is only necessary
if you want a completely empty environment.

## Include Paths

Rust allows arbitrary files to be included into a source file, with one of
`include!()` (include Rust source), `include_str!()` (include a file as a
literal text string), or `include_bytes!()` (include a file as arbitrary bytes).

`rustc` maintains a set of "allowed path prefixes". When it tries to open a file
for an include, it checks the path against a set of allowed prefixes, and only
allows the operation if it matches. For example if we set: `--include-prefix
foo/bar`, then `include!("foo/bar/blat.rs")` will be allowed, but
`include!("bar/bar/blacksheep.rs")` will fail, regardless of whether the file
actually exists or not.

`rustc` has the following command-line options to control the valid prefixes:
- `--clear-include-prefixes` - clear all allowable prefixes, effectively
disabling all `include*!()` macros unless a new valid prefix is added.
- `--include-prefix PATH` - add PATH to the set of valid prefixes. All included
paths must match one of the valid path prefixes before it can be opened.

By default, all path prefixes are valid, leaving the current behaviour
unchanged. They are only constrained once one of the options above are
specified.

All paths are canonicalized before matching (turned into an absolute path, all
symlinks followed, etc), so they must exist at the time they're specified. For
example, this means that a path of `foo/../bar` and `bar` are considered
equivalent (assuming that `foo` exists).

Note that a "path prefix" can be an entire pathname, allowing these options to
explicitly specify which individual files may be included.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Conceptually, the sandbox state is held as `Option<Filter>` (where filter is
either an environment or path prefix filter). If it has a `None` value, then it
is all-open, which is equivalent to the current state.

Once any filter rule is put in place, it becomes `Some(Filter)`, and the filter
rules are applied to the access as appropriate.

For environment variables, the variable name must exactly match one of the
existing rules, which either gives a specific value to return or indicates
that it should get the value from the process environment variables.

For paths, if a path access has a prefix which matches any in the allowed
prefixes rule, then the access is allowed.

## Implementation

This feature is implemented in a separate `env-sandbox` crate, which manages the
state for both environment variables and path prefixes. It is used in two
places:

- during command-line option parsing, the sandbox state is built up, and stored
in the global config structure.
- it is added to `ParseSess`, so that it's available to the various `expand_`
functions called from libsyntax_ext.

Specifically, `libsyntax_ext::env`'s `expand_option_env` and `expand_env` call
`EnvSandbox::env_get()` to get an environment variable rather than directly
calling `std::env::var()` so that the sandbox can mediate access.

Likewise, `libsyntax::ext::source_util`s `expand_include` uses
`EnvSandbox::path_lookup()` to get a path that either matches one of the
allowable prefixes, or returns an error. Likewise `expand_include_str` and
`expand_bytes` use `EnvSandbox::path_open()` with either successfully returns an
open file, or returns an error. In both cases, the error is `PermissionDenied`
with a description "path does not have a valid prefix".

In all cases, failing to fetch an environment variable or failing to open a file
should behave the same as if the variable really doesn't exist, or if the file
can't be opened for system reasons (non-existence, permission denied, etc).

Implementation is in PR https://github.com/rust-lang/rust/pull/49387.

# Drawbacks
[drawbacks]: #drawbacks

It adds numerous new command line options which ultimately be of limited
usefulness, while adding additional complexity and maintenance burden.

Users may be confused by the errors produced by using these features if they
didn't realize their build system was making use of them.

# Rationale and alternatives
[alternatives]: #alternatives

This design allows a build system to control `rustc`'s effective system
environment at a fine grain from invocation to invocation (aka hermetic
builds).

In Unix systems, its possible to control the environment directly, either via
the parameters passed to the `execve()` call invoking `rustc`, or using commands
such as `env`. These will allow any environment variable to be removed or
changed, but they can't be used to control the effective content of variables
that `rustc` itself (or any tool it invokes, like the linker) uses as part of
its own execution, such as `PATH` or `LD_LIBRARY_PATH`.

I don't know how one controls the environment under Windows.

The `--env-*` options give more control in a consistent way, and don't require
build tools to invoke `rustc` differently aside from adding new command-line
options. They allow the environment to be controlled in a cross-platform way.

Controlling file access is much more complex. The only way to control file
access in a similar way to the `--include-prefix` option is to build a
filesystem sandbox, and run the compiler within a chroot jail. This is a costly
operation which requires root-level permissions; the details of it are also very
system-specific. It also doesn't really solve the problem as it doesn't allow
particularly fine-grained control over what files can be included (any file
that's required to be present to make a runnable `rustc` instance is also
available for including).

# Prior art
[prior-art]: #prior-art

I don't know of other compiled languages which allow direct access to
environment variables (C and C++ don't), so there's no analogous requirement to
control access.

Many build systems have problems sandboxing their compilers. The
[Bazel](https://bazel.build/) and [Buck](https://buckbuild.com/) build tools
allow a limited form of sandbox by symlinking all the required source files (as
nominated by the dependency rules) into a special build directory, so that the
build fails if the compiler accesses files which haven't been symlinked.
However, this does not prevent access to other files outside that build
directory, via absolute path.

The Bazel developers have proposed a user-level filesystem,
[sandboxfs](https://blog.bazel.build/2017/08/25/introducing-sandboxfs.html), to
maintain a lightweight sandbox environment for an instance of a build. In
addition to being complex, system-specific and perhaps a performance hit (all
filesystem access is mediated by a user-mode filesystem), it has the same
disadvantage as other system-level sandboxing/container approaches, where the
entire compiler environment is still available for including.

There is a [cargo issue](https://github.com/rust-lang/cargo/issues/5282)
proposing a way to control the environment of a build script. This RFC provides
a mechanism to implement that.

# Unresolved questions
[unresolved]: #unresolved-questions

The use of the term "environment" is confusing - it refers to the general
concept of the system environment in which the rust compiler runs, but a
specific part of that are the "environment variables", which is often
abbreviated to "the environment". I had considered calling the over-arching
concept a plain "sandbox", but I was concerned that there might be other
potential uses for that term in the context of `rustc`.

The design is focused on a specific set of problems I have encountered while
integrating `rustc` into an existing build environment, but I think it's a
general enough problem and implementation that it's worth putting in rustc. But
I'm open to making changes if they make the feature more generally useful.

I think the use of canonicalized paths may need some more thought.

## Environment

- Is matching by literal name powerful enough? Is there a requirement for some
kind of pattern matching? (My view is that this can be added via a new
command-line option, and the implementation of `EnvSandbox` can be extended;
it wouldn't require any other changes.)
- Transformations applied to returned values? Ie, the ability to apply some kind
of transformation/normalization to a variable? Seems too complex and out of
scope for this change.
- Should invocations to `option_env!()` which access blocked variables print a
warning?
- Should invocations of `env!()` which access blocked variables explicitly
indicate its because of a command-line option, rather than because its
actually doesn't exist?
- Does there need to be a way to undo `--env-clear`, to re-allow access to
the environment?
- Does there need to be a mode where `--env-allow` defines/overrides variables
but does not block access to the rest of the environment?

## File Prefixes

- Is path prefix good enough? Should it be something like regex? I avoided this
because I didn't know how to handle path canonicalization, and because it is
more system-dependent.
- Should path matching be done on the basis of canonicalized paths? I though
that requiring each file to be accessed via a specific form of path, and
accessing it via other paths would be too brittle and confusing, and that
doing everything on a canonical basis is more understandable. But there are
still issues of how to handle a symlink in the middle of a path.

Hm, canonicalized paths won't work as expected if the source files themselves
are symlinks to elsewhere.