RFC: proc macro include!
#3200
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,205 @@ | ||
- Feature Name: `proc_macro_include` | ||
- Start Date: 2021-11-24 | ||
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) | ||
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
Proc macros can now effectively `include!` other files and process their contents. | ||
This allows proc macros both to communicate that they read external files | ||
and to maintain spans into the external file for more useful error messages. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
- `include!` and `include_str!` are no longer required to be compiler built-ins, | ||
and could be implemented as proc macros. | ||
- Help incremental builds and build determinism by having proc macros tell rustc which files they read. | ||
- Improve proc macro sandboxability and cacheability, by offering a way to implement this class of | ||
file-reading macros without using OS APIs directly. | ||
|
||
# Guide-level explanation | ||
[guide-level-explanation]: #guide-level-explanation | ||
|
||
## For users of proc macros | ||
|
||
Nothing changes! You'll just see nicer errors and fewer rebuilds | ||
from procedural macros which read external files. | ||
|
||
## For writers of proc macros | ||
|
||
Three new functions are provided in the `proc_macro` interface crate: | ||
|
||
```rust | ||
/// Read the contents of a file as a `TokenStream` and add it to the build dependency graph. | ||
/// | ||
/// The build system executing the compiler will know that the file was accessed during compilation, | ||
/// and will be able to rerun the build when the contents of the file change. | ||
/// | ||
/// May fail for a number of reasons, for example, if the file contains unbalanced delimiters | ||
/// or characters that do not exist in the language. | ||
/// | ||
/// If the file fails to be read, this is not automatically a fatal error. The proc macro may | ||
/// gracefully handle the missing file, or emit a compile error noting the missing dependency. | ||
/// | ||
/// Source spans are constructed for the read file. If you use the spans of this token stream, | ||
/// any resulting errors will correctly point at the tokens in the read file. | ||
/// | ||
/// NOTE: some errors may cause panics instead of returning `io::Error`. | ||
/// We reserve the right to change these errors into `io::Error`s later. | ||
fn include<P: AsRef<str>>(path: P) -> Result<TokenStream, std::io::Error>; | ||
|
||
|
||
/// Read the contents of a file as a string literal and add it to the build dependency graph. | ||
/// | ||
/// The build system executing the compiler will know that the file was accessed during compilation, | ||
/// and will be able to rerun the build when the contents of the file change. | ||
/// | ||
/// If the file fails to be read, this is not automatically a fatal error. The proc macro may | ||
/// gracefully handle the missing file, or emit a compile error noting the missing dependency. | ||
/// | ||
/// NOTE: some errors may cause panics instead of returning `io::Error`. | ||
/// We reserve the right to change these errors into `io::Error`s later. | ||
fn include_str<P: AsRef<str>>(path: P) -> Result<Literal, std::io::Error>; | ||
|
||
/// Read the contents of a file as raw bytes and add it to the build dependency graph. | ||
/// | ||
/// The build system executing the compiler will know that the file was accessed during compilation, | ||
/// and will be able to rerun the build when the contents of the file change. | ||
/// | ||
/// If the file fails to be read, this is not automatically a fatal error. The proc macro may | ||
/// gracefully handle the missing file, or emit a compile error noting the missing dependency. | ||
/// | ||
/// NOTE: some errors may cause panics instead of returning `io::Error`. | ||
/// We reserve the right to change these errors into `io::Error`s later. | ||
fn include_bytes<P: AsRef<str>>(path: P) -> Result<Vec<u8>, std::io::Error>; | ||
``` | ||
|
||
Review comments on `include_bytes`:

- Would it make sense for [...]
- I think it should work because [...]
- Actually, yeah, I overlooked that possibility. The main limitation is that the only current interface for getting the contents out of a [...] It's probably not good to short term require debug escaping a binary file to reparse the byte string literal if a proc macro is going to post process the file... but if it's just including the literal, it can put the [...]
- The one limitation which needs to be solved is how do spans work. Do we just say that the byte string literal contains the raw bytes of the file (even though that would be illegal in a normal byte string, and invalid UTF-8), maybe as a new "kind" of byte string, so span offsets are mapped directly with the source file? Or are there multiple span positions (representing a [...]
- Hmm, what bytes are not allowed in byte string literals? Does the literal itself have to be valid UTF-8?
- A Rust source file must be valid UTF-8. Thus, the contents of a byte string literal in the source must be valid UTF-8. Bytes that are not < 0x80 thus must be escaped to appear in a byte string literal.
- And then another question that's worth making explicit: what does it even mean for rustc to report a span into a binary file? I think binary includes are better served by a different API that lets rustc point into generated code, rather than trying to point into an opaque binary file.
|
||
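The point raised in review, that bytes at or above 0x80 must be escaped to appear in a byte string literal, can be illustrated with a small standalone sketch. The helper name `byte_string_literal` is hypothetical and is not part of this RFC or of `proc_macro`; it only uses `std::ascii::escape_default` to show what re-encoding a binary file as literal source text would involve.

```rust
use std::ascii;

/// Hypothetical helper (not part of the RFC): render raw bytes as the source
/// text of a Rust byte string literal, escaping everything that cannot appear
/// verbatim inside the quotes of a UTF-8 source file.
pub fn byte_string_literal(bytes: &[u8]) -> String {
    let mut out = String::from("b\"");
    for &b in bytes {
        // `escape_default` yields only ASCII bytes, so converting each one
        // to `char` is lossless.
        out.extend(ascii::escape_default(b).map(char::from));
    }
    out.push('"');
    out
}
```

Every non-ASCII byte grows to four characters (`\xNN`) under this encoding, which is one reason the discussion above questions routing large binary files through a literal.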
As an example, consider a potential implementation of [`core::include`](https://doc.rust-lang.org/stable/core/macro.include.html): | ||
|
||
```rust | ||
#[proc_macro] | ||
pub fn include(input: TokenStream) -> TokenStream { | ||
    let mut iter = input.into_iter(); | ||
|
||
    let result = 'main: if let Some(tt) = iter.next() { | ||
        // Borrow `tt` so that its span is still available in the `else` branch. | ||
        let TokenTree::Literal(lit) = &tt && | ||
            let LiteralValue::Str(path) = lit.value() | ||
        else { | ||
            Diagnostic::spanned(tt.span(), Level::Error, "argument must be a string literal").emit(); | ||
            break 'main TokenStream::new(); | ||
        }; | ||
|
||
        match proc_macro::include(&path) { | ||
            Ok(token_stream) => token_stream, | ||
            Err(err) => { | ||
                // `Diagnostic::spanned` takes `Into<String>`, hence `format!`. | ||
                Diagnostic::spanned(Span::call_site(), Level::Error, format!("couldn't read {path}: {err}")).emit(); | ||
                TokenStream::new() | ||
            } | ||
        } | ||
    } else { | ||
        Diagnostic::spanned(Span::call_site(), Level::Error, "include! takes 1 argument").emit(); | ||
        TokenStream::new() | ||
    }; | ||
|
||
    // Reject any trailing tokens after the path. | ||
    if iter.next().is_some() { | ||
        Diagnostic::spanned(Span::call_site(), Level::Error, "include! takes 1 argument").emit(); | ||
    } | ||
|
||
    result | ||
} | ||
``` | ||
|
||
(RFC note: this example uses unstable and even unimplemented features for clarity. | ||
However, this RFC in no way requires these features to be useful on its own.) | ||
|
||
# Reference-level explanation | ||
[reference-level-explanation]: #reference-level-explanation | ||
|
||
If a file read is unsuccessful, an encoding of the responsible `io::Error` is passed over the RPC bridge. | ||
If a file is successfully read but fails to lex, `ErrorKind::Other` is returned. | ||
|
||
None of these three APIs should ever cause compilation to fail. | ||
It is the responsibility of the proc macro to fail compilation if a failed file read is fatal. | ||
|
||
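A minimal sketch of the error behavior described above, using only `std`. Everything here is an assumption made for illustration: `include_model` and `check_delimiters` are hypothetical names, the real API would return a `TokenStream` rather than a `String`, and the delimiter check is a crude stand-in for the lexer (it does not even skip string literals or comments).

```rust
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;

/// Stand-in for the lexer: only checks that `()`, `[]`, and `{}` are balanced.
/// The real lexer rejects far more; this is an assumption for illustration.
fn check_delimiters(src: &str) -> Result<(), String> {
    let mut stack = Vec::new();
    for c in src.chars() {
        match c {
            '(' => stack.push(')'),
            '[' => stack.push(']'),
            '{' => stack.push('}'),
            ')' | ']' | '}' => {
                if stack.pop() != Some(c) {
                    return Err(format!("unexpected closing delimiter `{c}`"));
                }
            }
            _ => {}
        }
    }
    match stack.pop() {
        Some(expected) => Err(format!("unclosed delimiter, expected `{expected}`")),
        None => Ok(()),
    }
}

/// Hypothetical model of the error behavior of `proc_macro::include`.
pub fn include_model(path: &Path) -> io::Result<String> {
    // A failed read surfaces the responsible `io::Error` unchanged.
    let src = fs::read_to_string(path)?;
    // A successful read that fails to "lex" is reported as `ErrorKind::Other`.
    check_delimiters(&src).map_err(|msg| io::Error::new(ErrorKind::Other, msg))?;
    Ok(src)
}
```

In both failure cases the caller gets an `Err` back and decides for itself whether to emit a compile error, which matches the rule that the API itself never fails compilation.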
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
This is more API surface for the `proc_macro` crate, and the `proc_macro` bridge is already complicated. | ||
Additionally, this is likely to lead to more proc macros which read external files. | ||
Moving the handling of `include!`-like macros later in the compiler pipeline | ||
is also likely to be significantly more complicated than the current `include!` implementation. | ||
|
||
# Alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives | ||
|
||
- [`proc_macro::tracked_path`](https://doc.rust-lang.org/stable/proc_macro/tracked_path/fn.path.html) (unstable) | ||
|
||
This just tells the proc_macro driver that the proc macro has a dependency on the given path. | ||
This is sufficient for tracking the file, as the proc macro can then read the file itself, | ||
but it cannot require that the proc macro go through this API, nor can it provide spans for errors. | ||
|
||
Meaningfully, it'd be nice to be able to sandbox proc macros in wasm à la [watt](https://crates.io/crates/watt) | ||
while still having proc macros capable of reading the filesystem (in a proc_macro driver controlled manner). | ||
|
||
- Custom error type | ||
|
||
A custom error wrapper would provide a point to attach more specific error information than just an | ||
`io::Error`, such as the lexer error encountered by `include`. This RFC opts to use `io::Error` | ||
directly to provide a more minimal API surface. | ||
|
||
- Wrapped return types | ||
|
||
Returning a string `Literal` from `include_str` and `Vec<u8>` from `include_bytes` implies that | ||
the entire included file must be read into memory managed by the Rust global allocator. | ||
Alternatively, a more abstract buffer type could be used, allowing more efficient handling of | ||
very large files, which could then be memory-mapped rather than read into a buffer. | ||
|
||
This would likely look like `LiteralString` and `LiteralBytes` types in the `proc_macro` bridge, | ||
but this RFC opts to use the existing `Literal` and `Vec<u8>` to provide a more minimal API surface. | ||
|
||
- Status quo | ||
|
||
Proc macros can continue to read files and use `include_str!` to indicate a build dependency. | ||
This is error prone, easy to forget to do, and all around not a great experience. | ||
|
||
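The status quo workaround can be sketched as follows. This is an illustration only: `dependency_marker` is a hypothetical helper name, and the choice of an anonymous `const` item as the carrier for `include_str!` is an assumption, not something this RFC prescribes.

```rust
/// Hypothetical helper: source text a proc macro could append to its output
/// so that rustc tracks `path` as a build dependency via `include_str!`.
pub fn dependency_marker(path: &str) -> String {
    // `{:?}` quotes and escapes the path as a Rust string literal.
    format!("const _: &str = include_str!({path:?});")
}
```

A proc macro would read the file itself through `std::fs`, then parse this string into a `TokenStream` and append it to its expansion. Forgetting that second step silently loses the dependency, which is the error-prone part noted above. An absolute path is probably the safer choice here, since the emitted `include_str!` resolves relative paths on its own terms rather than those of the proc macro.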
# Prior art | ||
[prior-art]: #prior-art | ||
|
||
No known prior art. | ||
|
||
# Unresolved questions | ||
[unresolved-questions]: #unresolved-questions | ||
|
||
- It would be nice for `include` to allow emitting a useful lexer error directly. | ||
This is not currently provided for by the proposed API. | ||
- `include!` sets the "current module path" for the included code. | ||
It's unclear how this should behave for `proc_macro::include`, | ||
and whether this behavior should be replicated at all. | ||
- Should `include_str` get source code normalization (i.e. `\r\n` to `\n`)? | ||
`include_str!` deliberately includes the string exactly as it appears on disk, | ||
and the purpose of these APIs is to provide post-processing steps, | ||
which could need the file to be reproduced exactly, | ||
so the answer is likely *no*, | ||
and the produced `Literal` should represent the exact contents of the file. | ||
- What base directory should relative paths be resolved from? | ||
The two reasonable answers are | ||
|
||
- That which `include!` is relative to in the source file expanding the macro. | ||
- That which `fs` is relative to in the proc macro execution. | ||
|
||
Both have their merits and drawbacks. | ||

Review comments on the base directory question:

- One way to support both options would be to take a [...]
- What would [...]
- I suppose it should just behave the exact same as a [...]
- `macro_rules! x { () => { include_str!("a") }; }` Somewhat surprisingly, this looks for a file called [...]
|
||
- Unknown unknowns. | ||
|
||
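Two of the questions above, the base directory and newline normalization, can be made concrete with a short `std`-only sketch. The function names and the directories used in the usage note are made up for illustration; nothing here is proposed API.

```rust
use std::path::{Path, PathBuf};

/// Resolve `requested` against `base`. The two candidate answers to the
/// base directory question differ only in which `base` is passed in.
pub fn resolve(base: &Path, requested: &str) -> PathBuf {
    // `Path::join` replaces `base` entirely when `requested` is absolute.
    base.join(requested)
}

/// The normalization that `include_str` would most likely *not* apply,
/// so that post-processing steps can reproduce the file exactly.
pub fn normalize_newlines(contents: &str) -> String {
    contents.replace("\r\n", "\n")
}
```

With a hypothetical invoking source file in `/ws/app/src` and a proc macro working directory of `/ws/app`, the request `"schema.sql"` resolves to `/ws/app/src/schema.sql` under the `include!`-style rule and to `/ws/app/schema.sql` under the `fs`-style rule. Likewise, `normalize_newlines("a\r\nb")` no longer equals the on-disk contents, which is why the likely answer to the normalization question is *no*.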
# Future possibilities | ||
[future-possibilities]: #future-possibilities | ||
|
||
Future expansions of the proc macro APIs are almost entirely orthogonal to this feature. | ||
As such, here is a small list of potential uses for this API: | ||
|
||
- Processing a Rust-lexer-compatible DSL | ||
- Multi-file parser specifications for toolchains like LALRPOP or pest | ||
- Larger scale Rust syntax experimentations | ||
- Pre-processing `include!`ed assets | ||
- Embedding compiled-at-rustc-time shaders | ||
- Escaping text at compile time for embedding in a document format |