rust-lang · jsgf · Apr 5, 2018 · Apr 6, 2018 · fbstj · Apr 6, 2018
diff --git a/text/0000-env-sandboxing.md b/text/0000-env-sandboxing.md
@@ -0,0 +1,289 @@
+- Feature Name: env-sandboxing
+- Start Date: 2018-04-05
+- RFC PR: (leave this empty)
+- Rust Issue: (leave this empty)
+
+# Summary
+[summary]: #summary
+
+Allow `rustc` to be invoked with constraints on which environment variables may
+be queried, and which files may be included.
+
+# Motivation
+[motivation]: #motivation
+
+This RFC introduces some simple sandboxing for process environment variables and
+for include files (collectively "system environment").
+
+This is primarily to allow a build system to more precisely control the inputs
+to `rustc` which may affect the generated output. Rust has two mechanisms by
+which an input source can access ambient properties of `rustc`'s system
+environment: `env!()` and `option_env!()` for reading environment variables, and
+`include!()`/`include_str!()`/`include_bytes!()` for reading arbitrary files.
+
+## Environment Variables
+`rustc` allows source code to directly access its process environment variables
+via the `env!()` and `option_env!()` (pseudo-)macros. There is currently no way
+to control which environment variables are accessible, or what their apparent
+contents are.
+
+This poses a few problems:
+
+- The build system has no idea what variables the code is using for the purposes
+  of tracking its inputs
+- Code can read arbitrary contents from arbitrary variables, which may pose
+  unwanted information leakage (from build environment to final deployment
+  environment, for example).
+- It combines the environment `rustc` needs for its own operation (PATH and
+  LD_LIBRARY_PATH, for example) with environment the compiled code might want,
+  and doesn't allow them to be set independently.
+
+## Include Files
+Rust allows any file to be directly included to be logically part of the
+including source file, either as more Rust source code (include!()), a text
+string (include_str!()) or arbitrary binary data (include_bytes!()). These
+macros take a raw string which is used as an arbitrary path which may be
+absolute - they therefore allow any file that `rustc` has permission to access
+to be used in the compiled output.
+
+(This differs from separating a crate into multiple source files via modules, as
+those files are always relative to the top-level lib.rs/main.rs source.)
+
+This causes a couple of problems:
+
+- The build system can't know or constrain what files are actually inputs to the
+  compiler.
+- The code can't unintentionally leak state from the build environment to the
+  deployment environment.
+
+## Note
+This RFC specifically does not intend to address any actual security concerns.
+There are many other avenues that it does not attempt to control, such as
+compiler plugins/proc macros. However, it does help with unintentional problems
+which could result in later security problems if unaddressed. Blocking access to
+files or environment variables also makes it easier to audit what environment
+accesses a codebase is attempting to make.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+## Environment Variables
+
+Rust supports the `env!()` and `option_env!()` macros to allow the value of a
+process environment variable to be logically pasted into the source as a string
+(or `Option` string).
+
+`Rustc` has the following command-line options to allow the access to
+environment variables to be controlled at a fine-grain level:
+
+- `--env-clear` - completely empty the logical environment visible to
+   `env!()`/`option_env!()`, causing them to all fail/return `None`. Without any
+   other options, this will completely disable environment variable access.
+- `--env-allow NAME` - allow a specific environment variable to be read from the
+  process environment.
+- `--env-define NAME=VAL` - define a logical environment variable. This does not
+  need to be present in the actual process environment, or if it is, its value
+  is overridden.
+
+These are processed in the order:
+1. `--env-clear`
+2. `--env-allow NAME`
+3. `--env-set NAME=VAL`
+
+where options processed later override earlier ones.
+
+By default, the environment is completely open, leaving the existing behaviour
+unchanged. Once any of the options above is specified, accesses to environment
+variable becomes controlled accordingly. Note that `--env-clear` is only necessary
+if you want a completely empty environment.
+
+## Include Paths
+
+Rust allows arbitrary files to be included into a source file, with one of
+`include!()` (include Rust source), `include_str!()` (include a file as a
+literal text string), or `include_bytes!()` (include a file as arbitrary bytes).
+
+`rustc` maintains a set of "allowed path prefixes". When it tries to open a file
+for an include, it checks the path against a set of allowed prefixes, and only
+allows the operation if it matches. For example if we set: `--include-prefix
+foo/bar`, then `include!("foo/bar/blat.rs")` will be allowed, but
+`include!("bar/bar/blacksheep.rs")` will fail, regardless of whether the file
+actually exists or not.
+
+`rustc` has the following command-line options to control the valid prefixes:
+- `--clear-include-prefixes` - clear all allowable prefixes, effectively
+  disabling all `include*!()` macros unless a new valid prefix is added.
+- `--include-prefix PATH` - add PATH to the set of valid prefixes. All included
+  paths must match one of the valid path prefixes before it can be opened.
+
+By default, all path prefixes are valid, leaving the current behaviour
+unchanged. They are only constrained once one of the options above are
+specified.
+
+All paths are canonicalized before matching (turned into an absolute path, all
+symlinks followed, etc), so they must exist at the time they're specified. For
+example, this means that a path of `foo/../bar` and `bar` are considered
+equivalent (assuming that `foo` exists).
+
+Note that a "path prefix" can be an entire pathname, allowing these options to
+explicitly specify which individual files may be included.
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+Conceptually, the sandbox state is held as `Option<Filter>` (where filter is
+either an environment or path prefix filter). If it has a `None` value, then it
+is all-open, which is equivalent to the current state.
+
+Once any filter rule is put in place, it becomes `Some(Filter)`, and the filter
+rules are applied to the access as appropriate.
+
+For environment variables, the variable name must exactly match one of the
+existing rules, which either gives a specific value to return or indicates
+that it should get the value from the process environment variables.
+
+For paths, if a path access has a prefix which matches any in the allowed
+prefixes rule, then the access is allowed.
+
+## Implementation
+
+This feature is implemented in a separate `env-sandbox` crate, which manages the
+state for both environment variables and path prefixes. It is used in two
+places:
+
+- during command-line option parsing, the sandbox state is built up, and stored
+  in the global config structure.
+- it is added to `ParseSess`, so that it's available to the various `expand_`
+  functions called from libsyntax_ext.
+
+Specifically, `libsyntax_ext::env`'s `expand_option_env` and `expand_env` call
+`EnvSandbox::env_get()` to get an environment variable rather than directly
+calling `std::env::var()` so that the sandbox can mediate access.
+
+Likewise, `libsyntax::ext::source_util`s `expand_include` uses
+`EnvSandbox::path_lookup()` to get a path that either matches one of the
+allowable prefixes, or returns an error. Likewise `expand_include_str` and
+`expand_bytes` use `EnvSandbox::path_open()` with either successfully returns an
+open file, or returns an error. In both cases, the error is `PermissionDenied`
+with a description "path does not have a valid prefix".
+
+In all cases, failing to fetch an environment variable or failing to open a file
+should behave the same as if the variable really doesn't exist, or if the file
+can't be opened for system reasons (non-existence, permission denied, etc).
+
+Implementation is in PR https://github.com/rust-lang/rust/pull/49387.
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+It adds numerous new command line options which ultimately be of limited
+usefulness, while adding additional complexity and maintenance burden.
+
+Users may be confused by the errors produced by using these features if they
+didn't realize their build system was making use of them.
+
+# Rationale and alternatives
+[alternatives]: #alternatives
+
+This design allows a build system to control `rustc`'s effective system
+environment at a fine grain from invocation to invocation (aka hermetic
+builds).
+
+In Unix systems, its possible to control the environment directly, either via
+the parameters passed to the `execve()` call invoking `rustc`, or using commands
+such as `env`. These will allow any environment variable to be removed or
+changed, but they can't be used to control the effective content of variables
+that `rustc` itself (or any tool it invokes, like the linker) uses as part of
+its own execution, such as `PATH` or `LD_LIBRARY_PATH`.
+
+I don't know how one controls the environment under Windows.
+
+The `--env-*` options give more control in a consistent way, and don't require
+build tools to invoke `rustc` differently aside from adding new command-line
+options. They allow the environment to be controlled in a cross-platform way.
+
+Controlling file access is much more complex. The only way to control file
+access in a similar way to the `--include-prefix` option is to build a
+filesystem sandbox, and run the compiler within a chroot jail. This is a costly
+operation which requires root-level permissions; the details of it are also very
+system-specific. It also doesn't really solve the problem as it doesn't allow
+particularly fine-grained control over what files can be included (any file
+that's required to be present to make a runnable `rustc` instance is also
+available for including).
+
+# Prior art
+[prior-art]: #prior-art
+
+I don't know of other compiled languages which allow direct access to
+environment variables (C and C++ don't), so there's no analogous requirement to
+control access.
+
+Many build systems have problems sandboxing their compilers. The
+[Bazel](https://bazel.build/) and [Buck](https://buckbuild.com/) build tools
+allow a limited form of sandbox by symlinking all the required source files (as
+nominated by the dependency rules) into a special build directory, so that the
+build fails if the compiler accesses files which haven't been symlinked.
+However, this does not prevent access to other files outside that build
+directory, via absolute path.
+
+The Bazel developers have proposed a user-level filesystem,
+[sandboxfs](https://blog.bazel.build/2017/08/25/introducing-sandboxfs.html), to
+maintain a lightweight sandbox environment for an instance of a build. In
+addition to being complex, system-specific and perhaps a performance hit (all
+filesystem access is mediated by a user-mode filesystem), it has the same
+disadvantage as other system-level sandboxing/container approaches, where the
+entire compiler environment is still available for including.
+
+There is a [cargo issue](https://github.com/rust-lang/cargo/issues/5282)
+proposing a way to control the environment of a build script. This RFC provides
+a mechanism to implement that.
+
+# Unresolved questions
+[unresolved]: #unresolved-questions
+
+The use of the term "environment" is confusing - it refers to the general
+concept of the system environment in which the rust compiler runs, but a
+specific part of that are the "environment variables", which is often
+abbreviated to "the environment". I had considered calling the over-arching
+concept a plain "sandbox", but I was concerned that there might be other
+potential uses for that term in the context of `rustc`.
+
+The design is focused on a specific set of problems I have encountered while
+integrating `rustc` into an existing build environment, but I think it's a
+general enough problem and implementation that it's worth putting in rustc. But
+I'm open to making changes if they make the feature more generally useful.
+
+I think the use of canonicalized paths may need some more thought.
+
+## Environment
+
+- Is matching by literal name powerful enough? Is there a requirement for some
+  kind of pattern matching? (My view is that this can be added via a new
+  command-line option, and the implementation of `EnvSandbox` can be extended;
+  it wouldn't require any other changes.)
+- Transformations applied to returned values? Ie, the ability to apply some kind
+  of transformation/normalization to a variable? Seems too complex and out of
+  scope for this change.
+- Should invocations to `option_env!()` which access blocked variables print a
+  warning?
+- Should invocations of `env!()` which access blocked variables explicitly
+  indicate its because of a command-line option, rather than because its
+  actually doesn't exist?
+- Does there need to be a way to undo `--env-clear`, to re-allow access to
+  the environment?
+- Does there need to be a mode where `--env-allow` defines/overrides variables
+  but does not block access to the rest of the environment?
+
+## File Prefixes
+
+- Is path prefix good enough? Should it be something like regex? I avoided this
+  because I didn't know how to handle path canonicalization, and because it is
+  more system-dependent.
+- Should path matching be done on the basis of canonicalized paths? I though
+  that requiring each file to be accessed via a specific form of path, and
+  accessing it via other paths would be too brittle and confusing, and that
+  doing everything on a canonical basis is more understandable. But there are
+  still issues of how to handle a symlink in the middle of a path.
+
+  Hm, canonicalized paths won't work as expected if the source files themselves
+  are symlinks to elsewhere.