-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prepend temp files with per-invocation random string to avoid temp filename conflicts #139453
base: master
Are you sure you want to change the base?
Conversation
Some changes occurred in compiler/rustc_codegen_ssa Some changes occurred in compiler/rustc_codegen_gcc This PR modifies cc @jieyouxu |
cg_clif will also need changes. In any case would it make sense to codegen to a file with a randomly generated name using eg the tempfile crate rather using a fixed filename? The existing fixed filename can still be used within the incr comp cache and as name for the archive members. This would also prevent issues when two rustc instances with the same crate name and |
Yeah, the cc was not really me asking whether We already have a fixed name for the file in the incr comp cache, so yeah, I think(?) we can just modify the way we generate temporary file names1 to append the per-session temporary name? Like Though I'm not sure if anyone relies on the filename of these temporaries today; that seems like a more dramatic change given that people are probably doing... funny things with the outputs of Footnotes |
All this code is in src/driver/aot.rs.
People may depend on the rough format of the filename, but can't depend on the exact filename as it is rustc version dependent. |
Ideally all our cachable outputs would be created read-only to prevent this kind of accidental modification since they're semantically write-once anyway. Similar to #137025 |
Some changes occurred in compiler/rustc_codegen_cranelift cc @bjorn3 |
Anyways, I haven't updated this to a new strategy for encoding temporary files. I'll probably evaluate that in a bit and if it's too hard/annoying then I'd prefer we just land this approach to patch the unsoundness. |
This comment has been minimized.
This comment has been minimized.
These commits modify the If this was unintentional then you should revert the changes before this PR is merged. |
OK, I updated the description to implement a new strategy of just augmenting these temp filenames with a per-invocation temporary string. I think it's a lot more straightforward than unlinking. I opted to just use |
This comment has been minimized.
This comment has been minimized.
CGU reuse 💀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix! The approach and changes look good to me, only left a few remarks.
// FIXME: This is sketchy that we're not appending `.rcgu` when the ext is empty. | ||
// Append `.rcgu.{ext}`. | ||
if !ext.is_empty() { | ||
if !extension.is_empty() { | ||
extension.push('.'); | ||
extension.push_str(RUST_CGU_EXT); | ||
extension.push('.'); | ||
} | ||
|
||
extension.push('.'); | ||
extension.push_str(RUST_CGU_EXT); | ||
extension.push('.'); | ||
extension.push_str(ext); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remark: okay so this bit also seemed a bit sus to me, git archaeology points way back to
and I can't seem to find any specific reason to skip appending .rcgu
when the ext is empty, except for the diagnostic long-type-hash file. So instead, I just looked at the diff:
- self.output_filenames(()).temp_path_ext(&format!("long-type-{hash}.txt"), None)
+ self.output_filenames(()).temp_path_for_diagnostic(&format!("long-type-{hash}.txt"))
All other call sites AFAICT explicitly constructs a CGU name and passes Some(..)
, so I believe it was correct but just hard to follow -- in that it used the CGU name being present or not to encode diagnostics file vs non-diagnostics file...
The cleanup here is indeed very nice because we no longer rely on that Option<CguName>
to splice what temp_path_ext
does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cleanup here is indeed very nice because we no longer rely on that Option to splice what temp_path_ext does.
Yeah, that's why I did all that splitting first, since there seemed to be totally distinct responsibilities for temp_path
-- for cgu files and for "other" files.
Also, damn, that's a nasty problem |
@bors r=jieyouxu rollup=never |
Prepend temp files with per-invocation random string to avoid temp filename conflicts rust-lang#139407 uncovered a very subtle unsoundness with incremental codegen, failing compilation sessions (due to assembler errors), and the "prefer hard linking over copying files" strategy we use in the compiler for file management. Specifically, imagine we're building a single file 3 times, all with `-Csave-temps -Cincremental=...`. Let's call the object file we're building for the codegen unit for `main` "`XXX.o`" just for clarity since it's probably some gigantic hash name: ``` #[inline(never)] #[cfg(any(rpass1, rpass3))] fn a() -> i32 { 0 } #[cfg(any(cfail2))] fn a() -> i32 { 1 } fn main() { evil::evil(); assert_eq!(a(), 0); } mod evil { #[cfg(any(rpass1, rpass3))] pub fn evil() { unsafe { std::arch::asm!("/* */"); } } #[cfg(any(cfail2))] pub fn evil() { unsafe { std::arch::asm!("missing"); } } } ``` Session 1 (`rpass1`): * Type-check, borrow-check, etc. * Serialize the dep graph to the incremental working directory `.../s-...-working/`. * Codegen object file to a temp file `XXX.rcgu.o` which is spit out in the cwd. * Hard-link[^1] `XXX.rcgu.o` to the incremental working directory `.../s-...-working/XXX.o`. * Save-temps option means we don't delete `XXX.rgcu.o`. * Link the binary and stuff. * Finalize[^2] the working incremental session by renaming `.../s-...-working` to ` s-...-asjkdhsjakd` (some other finalized incr comp session dir name). Session 2 (`cfail2`): * Load artifacts from the previous *finalized* incremental session, namely the dep graph. * Type-check, borrow-check, etc. since the file has changed, so most dep graph nodes are red. * Serialize the dep graph to the incremental working directory `.../s-...-working/`. * Codegen object file to a temp file `XXX.rcgu.o`. **HERE IS THE PROBLEM**: The hard-link is still set up to point to the inode from `XXX.o` from the first session, so this also modifies the `XXX.o` in the previous finalized session directory. * Codegen emits an error b/c `missing` is not an instruction, so we abort before finalizing the incremental session. Specifically, this means that the *previous* session is the last finalized session. Session 3 (`rpass3`): * Load artifacts from the previous *finalized* incremental session, namely the dep graph. NOTE that this is from session 1. * All the dep graph nodes are green since we are basically replaying session 1. * codegen object file `XXX.o`, which is detected as *reused* from session 1 since dep nodes were green. That means we **reuse** `XXX.o` which had been dirtied from session 2. * Link the binary and stuff. This results in a binary which reuses some of the build artifacts from session 2, but thinks it's from session 1. At this point, I hope it's clear to see that the incremental results from session 1 were dirtied from session 2, but we reuse them as if session 1 was the previous (finalized) incremental session we ran. This is at best really buggy, and at worst **unsound**. This isn't limited to `-C save-temps`, since there are other combinations of flags that may keep around temporary files (hard linked) in the working directory (like `-C debuginfo=1 -C split-debuginfo=unpacked` on darwin, for example). --- This PR implements a fix which is to prepend temp filenames with a random string that is generated per invocation of rustc. This string is not *deterministic*, but temporary files are transient anyways, so I don't believe this is a problem. That means that temp files are now something like... `{crate-name}.{cgu}.{invocation_temp}.rcgu.o`, where `{invocation_temp}` is the new temporary string we generate per invocation of rustc. Fixes rust-lang#139407 [^1]: https://github.com/rust-lang/rust/blob/175dcc7773d65c1b1542c351392080f48c05799f/compiler/rustc_fs_util/src/lib.rs#L60 [^2]: https://github.com/rust-lang/rust/blob/175dcc7773d65c1b1542c351392080f48c05799f/compiler/rustc_incremental/src/persist/fs.rs#L1-L40
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
@bors retry |
Prepend temp files with per-invocation random string to avoid temp filename conflicts rust-lang#139407 uncovered a very subtle unsoundness with incremental codegen, failing compilation sessions (due to assembler errors), and the "prefer hard linking over copying files" strategy we use in the compiler for file management. Specifically, imagine we're building a single file 3 times, all with `-Csave-temps -Cincremental=...`. Let's call the object file we're building for the codegen unit for `main` "`XXX.o`" just for clarity since it's probably some gigantic hash name: ``` #[inline(never)] #[cfg(any(rpass1, rpass3))] fn a() -> i32 { 0 } #[cfg(any(cfail2))] fn a() -> i32 { 1 } fn main() { evil::evil(); assert_eq!(a(), 0); } mod evil { #[cfg(any(rpass1, rpass3))] pub fn evil() { unsafe { std::arch::asm!("/* */"); } } #[cfg(any(cfail2))] pub fn evil() { unsafe { std::arch::asm!("missing"); } } } ``` Session 1 (`rpass1`): * Type-check, borrow-check, etc. * Serialize the dep graph to the incremental working directory `.../s-...-working/`. * Codegen object file to a temp file `XXX.rcgu.o` which is spit out in the cwd. * Hard-link[^1] `XXX.rcgu.o` to the incremental working directory `.../s-...-working/XXX.o`. * Save-temps option means we don't delete `XXX.rgcu.o`. * Link the binary and stuff. * Finalize[^2] the working incremental session by renaming `.../s-...-working` to ` s-...-asjkdhsjakd` (some other finalized incr comp session dir name). Session 2 (`cfail2`): * Load artifacts from the previous *finalized* incremental session, namely the dep graph. * Type-check, borrow-check, etc. since the file has changed, so most dep graph nodes are red. * Serialize the dep graph to the incremental working directory `.../s-...-working/`. * Codegen object file to a temp file `XXX.rcgu.o`. **HERE IS THE PROBLEM**: The hard-link is still set up to point to the inode from `XXX.o` from the first session, so this also modifies the `XXX.o` in the previous finalized session directory. * Codegen emits an error b/c `missing` is not an instruction, so we abort before finalizing the incremental session. Specifically, this means that the *previous* session is the last finalized session. Session 3 (`rpass3`): * Load artifacts from the previous *finalized* incremental session, namely the dep graph. NOTE that this is from session 1. * All the dep graph nodes are green since we are basically replaying session 1. * codegen object file `XXX.o`, which is detected as *reused* from session 1 since dep nodes were green. That means we **reuse** `XXX.o` which had been dirtied from session 2. * Link the binary and stuff. This results in a binary which reuses some of the build artifacts from session 2, but thinks it's from session 1. At this point, I hope it's clear to see that the incremental results from session 1 were dirtied from session 2, but we reuse them as if session 1 was the previous (finalized) incremental session we ran. This is at best really buggy, and at worst **unsound**. This isn't limited to `-C save-temps`, since there are other combinations of flags that may keep around temporary files (hard linked) in the working directory (like `-C debuginfo=1 -C split-debuginfo=unpacked` on darwin, for example). --- This PR implements a fix which is to prepend temp filenames with a random string that is generated per invocation of rustc. This string is not *deterministic*, but temporary files are transient anyways, so I don't believe this is a problem. That means that temp files are now something like... `{crate-name}.{cgu}.{invocation_temp}.rcgu.o`, where `{invocation_temp}` is the new temporary string we generate per invocation of rustc. Fixes rust-lang#139407 [^1]: https://github.com/rust-lang/rust/blob/175dcc7773d65c1b1542c351392080f48c05799f/compiler/rustc_fs_util/src/lib.rs#L60 [^2]: https://github.com/rust-lang/rust/blob/175dcc7773d65c1b1542c351392080f48c05799f/compiler/rustc_incremental/src/persist/fs.rs#L1-L40
#139407 uncovered a very subtle unsoundness with incremental codegen, failing compilation sessions (due to assembler errors), and the "prefer hard linking over copying files" strategy we use in the compiler for file management.
Specifically, imagine we're building a single file 3 times, all with
-Csave-temps -Cincremental=...
. Let's call the object file we're building for the codegen unit formain
"XXX.o
" just for clarity since it's probably some gigantic hash name:Session 1 (
rpass1
):.../s-...-working/
.XXX.rcgu.o
which is spit out in the cwd.XXX.rcgu.o
to the incremental working directory.../s-...-working/XXX.o
.XXX.rgcu.o
..../s-...-working
tos-...-asjkdhsjakd
(some other finalized incr comp session dir name).Session 2 (
cfail2
):.../s-...-working/
.XXX.rcgu.o
. HERE IS THE PROBLEM: The hard-link is still set up to point to the inode fromXXX.o
from the first session, so this also modifies theXXX.o
in the previous finalized session directory.missing
is not an instruction, so we abort before finalizing the incremental session. Specifically, this means that the previous session is the last finalized session.Session 3 (
rpass3
):XXX.o
, which is detected as reused from session 1 since dep nodes were green. That means we reuseXXX.o
which had been dirtied from session 2.This results in a binary which reuses some of the build artifacts from session 2, but thinks it's from session 1.
At this point, I hope it's clear to see that the incremental results from session 1 were dirtied from session 2, but we reuse them as if session 1 was the previous (finalized) incremental session we ran. This is at best really buggy, and at worst unsound.
This isn't limited to
-C save-temps
, since there are other combinations of flags that may keep around temporary files (hard linked) in the working directory (like-C debuginfo=1 -C split-debuginfo=unpacked
on darwin, for example).This PR implements a fix which is to prepend temp filenames with a random string that is generated per invocation of rustc. This string is not deterministic, but temporary files are transient anyways, so I don't believe this is a problem.
That means that temp files are now something like...
{crate-name}.{cgu}.{invocation_temp}.rcgu.o
, where{invocation_temp}
is the new temporary string we generate per invocation of rustc.Fixes #139407
Footnotes
https://github.com/rust-lang/rust/blob/175dcc7773d65c1b1542c351392080f48c05799f/compiler/rustc_fs_util/src/lib.rs#L60 ↩
https://github.com/rust-lang/rust/blob/175dcc7773d65c1b1542c351392080f48c05799f/compiler/rustc_incremental/src/persist/fs.rs#L1-L40 ↩