-
Notifications
You must be signed in to change notification settings - Fork 481
Always use a param file when calling into rustc #1076
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8f648a7
to
74762fa
Compare
4221c8a
to
6939bda
Compare
% rustc --help -v | grep '@path'
@path Read newline separated options from `path` Would it work to just create a single text file with contents along the lines of:
rather than having to set up a whole directory which will be a lot of overhead (particularly on Windows where symlinks are often actually copies), and allow us to use a consistent command line across platforms? I think we're possibly currently bypassing Bazel's built-in ability to do this which we set up here because that ends up being an argfile to the process-wrapper rather than rustc itself? |
Oh wow I looked around for this but completely missed it. Let me give this a try and see how that works, I'd also much rather avoid having to do all the work setting up symlinks… |
So I'm doubly embarrassed because 1) I added this Anyway, with that out of the way 😆, I took a look at how it gets propagated from the run action to rustc via the process wrapper. Turns out it just works, it's passed to the process wrapper as What I realized however is that we have to set This patch is gonna get a lot smaller :) |
6939bda
to
dc8c60f
Compare
Hmm not sure where all these test failures come from, I'll take a look tomorrow morning. Edit: oh, we have some |
Sounds great - thanks so much for looking into it! |
Quick update: I think I've got something working for the general case but am having trouble with
In this case So I guess we either need a way to move this params file (that is created by Bazel automatically based on the |
I think I'm probably going to opt out the rustdoc tests from generating a param file when constructing the rustc arguments. |
Yeah, |
639d39b
to
3779fc1
Compare
Looks like there are some build failures when building the examples.. I'll check on my macos machine since I can't seem to build the examples on my linux box due to failures to find libclang/libinfo as discussed in #251. |
3779fc1
to
6421501
Compare
Alright, it was missing a call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks!
@UebelAndre @hlopko How's your C++? The C++ could probably do with better eyes than mine giving it a quick pass :)
I can take a look later today. Sorry for the delay 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay, had to do a second pass. I think this is looking good! I had a couple questions and one request for a test but then I think it's ready to go 😄
@@ -48,7 +48,8 @@ def rustdoc_compile_action( | |||
toolchain, | |||
crate_info, | |||
output = None, | |||
rustdoc_flags = []): | |||
rustdoc_flags = [], | |||
use_param_file_for_rustc_args = True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a small unittest ensuring that when this flag is True
prama files prefixed with @
are found in argv of the action?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried doing that but it looks like Action.argv
returns the actual arguments for the action before they are turned into a file, rather than the arguments passed to the executable...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that's kinda crazy. I feel like this is a bug in Bazel... Is there a way we can at least test two different actions with and without this flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose not. That's the point of the "always" in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup this is an unfortunate and a known limitation of unittesting args. You can access the original args
value from the unittest (https://docs.bazel.build/versions/main/skylark/lib/Action.html#args), but you cannot do anything with it other than setting it as args into some other action. So the only way we could test this is by writing all these args into a file and registering an action that reads the file and makes assertions. In other words, you cannot use unittest, you need an integration test.
9764bc3
to
9f79bf1
Compare
Hello everybody, I'm back from a lovely long vacation and I'm also curious about this. Reviewing right now :) |
@meteorcloudy do you know if |
@meteorcloudy So one thing that is not so common in the ecosystem is that we actually pass 3 instances of args to the action, and only one of them has |
System::StrType& param = *par_it; | ||
ReplaceTokens(param, subst_mappings); | ||
|
||
// If we find a path to a param file, move it to the top-level file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this behavior? Could we just fail? I'm imagining a bug in rules_rust that produces nested param files and we wouldn't notice because process wrapper would recover from that. I also worry that moving param file args around can change the order of flags in a surprising way so I try to avoid it in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was also my first intuition but it turns out that we have a test that specifically checks that extra @param_file_path
arguments are supported, by adding such an arg to the rustc_flags
:
"@$(location :flag_generator)", |
I wasn't able to gather the rationale for supporting this in the PR #809 or the associated issue #801, but there's a comment from you on that review that might be worth re-discussing here 😉 https://github.com/bazelbuild/rules_rust/pull/809/files#r662057446
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this explains the scenario: #801 (comment). There are ways to reimplement that use case to not use param files, but I think I'm changing my mind, we should support nested param files. But I think the correct support should keep the relative order of flags. What do you think about flattening the param files, as in, if we find a path to a param file, we read the content of that file and use that instead of the @nested_file
flag?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that makes sense. I like the idea of flattening params files, I've update the PR to do that.
return -1; | ||
} | ||
|
||
arguments.erase(std::next(arg_it).base()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh this looks complicated. Would using remove
or remove_if
(https://en.cppreference.com/w/cpp/algorithm/remove) and https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom after the for loop ends be simpler to reason about?
I know it doesn't matter in our workloads, but even hypothethically repeatedly prepending into a vector makes me nervous. How about collecting new file arguments into a new vec and merging them once after the end of the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's a fair callout. I'll look at simplifying that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, but I'd like to wait for @meteorcloudy to get back to us about the root problem here. However the process wrapper change is necessary in any case - without it we don't handle param files correctly today. So thank you so much!
I'm not quite familiar with this API, but I don't see any reason why it shouldn't work on Windows. You do have test coverage on Windows (and it seems passing), right? |
It might be a bug where on Windows it doesn't decide properly. Maybe just use_always=True for now and you can file a bug on bazelbuild/bazel. |
Welcome back @hlopko, I hope you had a relaxing break 😃. |
3ba5090
to
cad87bb
Compare
Thanks @meteorcloudy and @oquenchil, @ddeville, will you have the capacity to investigate if this is a bazel bug? I guess trying to create a minimal repro case as a standalone repository would be the most useful. There we can try a single args instance with an extremely long command line, if that works correctly then try multiple args instances in a single action, if that works, then I don't know :) I'm happy to help but I'm hopeless on Windows.. |
@hlopko Feel free to file a bug to Bazel, I'm always happy to fix bugs. I minimal repo would be indeed very helpful! @oquenchil is also gaining a lot of Windows knowledge now, he'll have insight on both side. ;) |
cad87bb
to
f0d5417
Compare
Hey everyone, sorry for the delay here, the last few weeks have been particularly busy... I'd be happy to help test this on Windows but since my project is still stuck on 4.1.0 it might not be the best place to start. I could try setting up a minimal repro, I'll just have to prep for some Windows pain, as is always the case 😄. However ideally I'd like to do this independently of the work here since the fix in the process wrapper needs to happen regardless. What do you think? |
f0d5417
to
4d25814
Compare
The argument list seems to constantly overflow on Windows (for reasons that I don't really understand since `use_param_file` should dynamically switch to using a param file in these cases), so always generate a param file.
4d25814
to
d342503
Compare
Yeah I totally understand. Just to get it out of the way, I'm trying to figure out if there is a way we could still use the only-when-needed param files (maybe on Linux only), since those command lines are easier to work with when using a remote execution service. But how I'd start would be to write a test that uses our process wrapper and fails on Linux today (custom Starlark rule that generates a command line that has more than 2097152 (my In the presubmit of that PR we will see what happens on Mac and on Windows. We may have to disable the test if the params file was not created correctly there. Then we can discuss what options we have (either the rest of this PR, or only doing the fix of this PR when running on failing OS, or using ctx.actions.args in a way that works since I find it unlikely that the feature is fully broken). Does that sound reasonable? |
Yep that makes sense. As an update, I've also been doing more testing with this PR on my project and I'm still able to reproduce the bug on Windows :(. It's really not super clear what's going on, the manifestation of the bug is that some imports in code just don't resolve. When I symlink all the dependencies into one directory and pass a single Since the project question is super large and entirely proprietary code I've been trying to put together a self-contained sample workspace that could become the base of the testing required here. I'm gonna try making a bazel workspace using the rules and see where that takes me. It's definitely been a puzzling endeavor so far unfortunately :( |
Ok some update here. I've been playing around with this some more and I don't think there is any issue with the automatic use of a param file on Windows (see hacky https://github.com/ddeville/lots-of-rust-deps for generating an insane amount of Rust dependencies 😄). Bazel correctly turns large argument lists that would overflow into param files as expected, on all platforms. I am still hitting that weird issue where rustc fails to find some extern when a large (I think > 400) number of It'd still be nice to fix the nested param files issue here but now that the process wrapper was rewritten in Rust that'll take some more work. I'll see if I have some time to migrate this PR. |
Transitive crate dependencies are passed to rustc by specifying the folder they live in as a library search path (an
-Ldependency=path
flag). Given that bazel builds all libraries into their own folders, this means passing potentially hundreds of paths to rustc which can lead to overflowing the rather short limit for command arguments on Windows.For this reason, this patch creates symlinks to all the transitive crates into a single directory that it then sets as the
-Ldependency=
flag to rustc. This dramatically shortens the command arguments length when calling rustc. Note that we only do that on Windows since it introduces some prep actions that need to do IO and there is no need to regress other platforms that support much longer command args.I opted for declaring these files as inputs and creating the symlinks via
ctx.actions.symlink
but another option would have been to do that in the process wrapper instead.For context, here is the size returned by
getconf ARG_MAX
on my following machines: