Skip to content

Always use a param file when calling into rustc #1076

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

ddeville
Copy link
Contributor

@ddeville ddeville commented Dec 22, 2021

Transitive crate dependencies are passed to rustc by specifying the folder they live in as a library search path (an -Ldependency=path flag). Given that bazel builds all libraries into their own folders, this means passing potentially hundreds of paths to rustc which can lead to overflowing the rather short limit for command arguments on Windows.

For this reason, this patch creates symlinks to all the transitive crates into a single directory that it then sets as the -Ldependency= flag to rustc. This dramatically shortens the command arguments length when calling rustc. Note that we only do that on Windows since it introduces some prep actions that need to do IO and there is no need to regress other platforms that support much longer command args.

I opted for declaring these files as inputs and creating the symlinks via ctx.actions.symlink but another option would have been to do that in the process wrapper instead.

For context, here is the size returned by getconf ARG_MAX on my following machines:

macos (12 x86_64):      1,048,576
linux (arch x86_64):    2,097,152  // obviously somewhat lower on 32-bit but
I don't have a machine handy
windows (10 x86_64):    32,000

@ddeville ddeville force-pushed the transitive-crates-search-path branch 5 times, most recently from 8f648a7 to 74762fa Compare December 22, 2021 19:30
@ddeville ddeville changed the title Collect and symlink all transitive crates into a single directory Collect and symlink all transitive crates into a single directory on Windows Dec 22, 2021
@ddeville ddeville force-pushed the transitive-crates-search-path branch 2 times, most recently from 4221c8a to 6939bda Compare December 22, 2021 19:37
@UebelAndre UebelAndre self-requested a review January 3, 2022 16:38
@illicitonion
Copy link
Collaborator

rustc accepts passing arguments in a newline-delimited file:

% rustc --help -v | grep '@path'
    @path               Read newline separated options from `path`

Would it work to just create a single text file with contents along the lines of:

-L
foo
-L
bar

rather than having to set up a whole directory which will be a lot of overhead (particularly on Windows where symlinks are often actually copies), and allow us to use a consistent command line across platforms?

I think we're possibly currently bypassing Bazel's built-in ability to do this which we set up here because that ends up being an argfile to the process-wrapper rather than rustc itself?

@ddeville
Copy link
Contributor Author

ddeville commented Jan 4, 2022

Oh wow I looked around for this but completely missed it. Let me give this a try and see how that works, I'd also much rather avoid having to do all the work setting up symlinks…

@ddeville
Copy link
Contributor Author

ddeville commented Jan 5, 2022

So I'm doubly embarrassed because 1) I added this use_param_file for the rustc args a few weeks ago and 2) I managed to forget about the @param_file option to rustc since then...

Anyway, with that out of the way 😆, I took a look at how it gets propagated from the run action to rustc via the process wrapper. Turns out it just works, it's passed to the process wrapper as @libhello_lib-1405984315.rlib-0.params and it forwards it to rustc, as expected.

What I realized however is that we have to set use_always to True. use_param_file is documented to automatically switch to a param file when needed if use_always is False but that's clearly not the case since I've been encountering args list way past the 32k character limit in the process wrapper on Windows.
However I was able to get my pathological large project to build fine on Windows by setting use_always to True so we should probably just change it.

This patch is gonna get a lot smaller :)

@ddeville ddeville force-pushed the transitive-crates-search-path branch from 6939bda to dc8c60f Compare January 5, 2022 06:41
@ddeville ddeville changed the title Collect and symlink all transitive crates into a single directory on Windows Always use a param file when calling into rustc Jan 5, 2022
@ddeville
Copy link
Contributor Author

ddeville commented Jan 5, 2022

Hmm not sure where all these test failures come from, I'll take a look tomorrow morning.

Edit: oh, we have some {pwd} inside the param file that have to be expanded. I guess the process wrapper will have to read the param file and substitute those.

@illicitonion
Copy link
Collaborator

Sounds great - thanks so much for looking into it!

@ddeville
Copy link
Contributor Author

ddeville commented Jan 7, 2022

Quick update: I think I've got something working for the general case but am having trouble with rust_doc_test since it creates a shell script for the test that copies the arguments and in case they are written into a param file, it won't be in the runfiles folder.

#!/usr/bin/env bash

exec env - \
CARGO_CRATE_NAME='lib' \
CARGO_PKG_VERSION='0.0.0' \
CARGO_PKG_VERSION_PATCH='0' \
CARGO_PKG_DESCRIPTION='' \
CARGO_PKG_AUTHORS='' \
CARGO_PKG_VERSION_MAJOR='0' \
CARGO_PKG_HOMEPAGE='' \
CARGO_PKG_VERSION_MINOR='0' \
CARGO_PKG_VERSION_PRE='' \
CARGO_CFG_TARGET_OS='linux' \
CARGO_PKG_NAME='lib_doctest' \
CARGO_MANIFEST_DIR='${pwd}/test/unit/rustdoc' \
CARGO_CFG_TARGET_ARCH='x86_64' \
SYSROOT='../rust_linux_x86_64/lib/rustlib/x86_64-unknown-linux-gnu/lib' \
'util/process_wrapper/process_wrapper' '--subst' 'pwd=${pwd}' '--' 'external/rust_linux_x86_64/bin/rustdoc' '@bazel-out/k8-fastbuild/bin/test/unit/rustdoc/lib_doctest.rustdoc_test.sh-0.params'

In this case @bazel-out/k8-fastbuild/bin/test/unit/rustdoc/lib_doctest.rustdoc_test.sh-0.params points to nothing since the test runs from the runfiles.

So I guess we either need a way to move this params file (that is created by Bazel automatically based on the Args) to the test runfiles or find a way to make the path itself absolute before writing it down in the script, which I'm not sure we can do. 🤔

@ddeville
Copy link
Contributor Author

ddeville commented Jan 7, 2022

I think I'm probably going to opt out the rustdoc tests from generating a param file when constructing the rustc arguments.

@UebelAndre
Copy link
Collaborator

I think I'm probably going to opt out the rustdoc tests from generating a param file when constructing the rustc arguments.

Yeah, rust_doc_test is currently in a weird state and I think there are a few contexts where it doesn't work. I think it'd be ok to defer updating it as long as the changes don't reduce the amount of places it currently works.

@ddeville ddeville force-pushed the transitive-crates-search-path branch 3 times, most recently from 639d39b to 3779fc1 Compare January 7, 2022 05:18
@ddeville
Copy link
Contributor Author

ddeville commented Jan 7, 2022

Looks like there are some build failures when building the examples.. I'll check on my macos machine since I can't seem to build the examples on my linux box due to failures to find libclang/libinfo as discussed in #251.

@ddeville ddeville force-pushed the transitive-crates-search-path branch from 3779fc1 to 6421501 Compare January 7, 2022 06:03
@ddeville
Copy link
Contributor Author

ddeville commented Jan 7, 2022

Alright, it was missing a call to ReplaceTokens, I think everything should now build fine.

Copy link
Collaborator

@illicitonion illicitonion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@UebelAndre @hlopko How's your C++? The C++ could probably do with better eyes than mine giving it a quick pass :)

@UebelAndre
Copy link
Collaborator

I can take a look later today. Sorry for the delay 😅

Copy link
Collaborator

@UebelAndre UebelAndre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, had to do a second pass. I think this is looking good! I had a couple questions and one request for a test but then I think it's ready to go 😄

@@ -48,7 +48,8 @@ def rustdoc_compile_action(
toolchain,
crate_info,
output = None,
rustdoc_flags = []):
rustdoc_flags = [],
use_param_file_for_rustc_args = True):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a small unittest ensuring that when this flag is True prama files prefixed with @ are found in argv of the action?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried doing that but it looks like Action.argv returns the actual arguments for the action before they are turned into a file, rather than the arguments passed to the executable...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's kinda crazy. I feel like this is a bug in Bazel... Is there a way we can at least test two different actions with and without this flag?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose not. That's the point of the "always" in this PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup this is an unfortunate and a known limitation of unittesting args. You can access the original args value from the unittest (https://docs.bazel.build/versions/main/skylark/lib/Action.html#args), but you cannot do anything with it other than setting it as args into some other action. So the only way we could test this is by writing all these args into a file and registering an action that reads the file and makes assertions. In other words, you cannot use unittest, you need an integration test.

@ddeville ddeville force-pushed the transitive-crates-search-path branch 2 times, most recently from 9764bc3 to 9f79bf1 Compare January 12, 2022 01:55
@hlopko
Copy link
Member

hlopko commented Jan 12, 2022

Hello everybody, I'm back from a lovely long vacation and I'm also curious about this. Reviewing right now :)

@hlopko
Copy link
Member

hlopko commented Jan 12, 2022

What I realized however is that we have to set use_always to True. use_param_file is documented to automatically switch to a param file when needed if use_always is False but that's clearly not the case since I've been encountering args list way past the 32k character limit in the process wrapper on Windows.
However I was able to get my pathological large project to build fine on Windows by setting use_always to True so we should probably just change it.

@meteorcloudy do you know if use_param_file works correctly on Windows? This is where we use it: https://cs.opensource.google/bazel/rules_rust/+/main:rust/private/rustc.bzl;l=588?q=rustc.bzl. Is this something we should file a bug over at bazelbuild/bazel?

@hlopko
Copy link
Member

hlopko commented Jan 12, 2022

@meteorcloudy So one thing that is not so common in the ecosystem is that we actually pass 3 instances of args to the action, and only one of them has use_param_file set (https://cs.opensource.google/bazel/rules_rust/+/main:rust/private/rustc.bzl;l=690?q=rustc.bzl). Maybe rules_rust are the first such user?

System::StrType& param = *par_it;
ReplaceTokens(param, subst_mappings);

// If we find a path to a param file, move it to the top-level file
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this behavior? Could we just fail? I'm imagining a bug in rules_rust that produces nested param files and we wouldn't notice because process wrapper would recover from that. I also worry that moving param file args around can change the order of flags in a surprising way so I try to avoid it in general.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was also my first intuition but it turns out that we have a test that specifically checks that extra @param_file_path arguments are supported, by adding such an arg to the rustc_flags:

I wasn't able to gather the rationale for supporting this in the PR #809 or the associated issue #801, but there's a comment from you on that review that might be worth re-discussing here 😉 https://github.com/bazelbuild/rules_rust/pull/809/files#r662057446

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this explains the scenario: #801 (comment). There are ways to reimplement that use case to not use param files, but I think I'm changing my mind, we should support nested param files. But I think the correct support should keep the relative order of flags. What do you think about flattening the param files, as in, if we find a path to a param file, we read the content of that file and use that instead of the @nested_file flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that makes sense. I like the idea of flattening params files, I've update the PR to do that.

return -1;
}

arguments.erase(std::next(arg_it).base());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh this looks complicated. Would using remove or remove_if (https://en.cppreference.com/w/cpp/algorithm/remove) and https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom after the for loop ends be simpler to reason about?

I know it doesn't matter in our workloads, but even hypothethically repeatedly prepending into a vector makes me nervous. How about collecting new file arguments into a new vec and merging them once after the end of the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that's a fair callout. I'll look at simplifying that.

Copy link
Member

@hlopko hlopko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, but I'd like to wait for @meteorcloudy to get back to us about the root problem here. However the process wrapper change is necessary in any case - without it we don't handle param files correctly today. So thank you so much!

@meteorcloudy
Copy link
Member

@meteorcloudy do you know if use_param_file works correctly on Windows?

I'm not quite familiar with this API, but I don't see any reason why it shouldn't work on Windows. You do have test coverage on Windows (and it seems passing), right?

@oquenchil
Copy link

What I realized however is that we have to set use_always to True. use_param_file is documented to automatically switch to a param file when needed if use_always is False but that's clearly not the case since I've been encountering args list way past the 32k character limit in the process wrapper on Windows.
However I was able to get my pathological large project to build fine on Windows by setting use_always to True so we should probably just change it.

@meteorcloudy do you know if use_param_file works correctly on Windows? This is where we use it: https://cs.opensource.google/bazel/rules_rust/+/main:rust/private/rustc.bzl;l=588?q=rustc.bzl. Is this something we should file a bug over at bazelbuild/bazel?

It might be a bug where on Windows it doesn't decide properly. Maybe just use_always=True for now and you can file a bug on bazelbuild/bazel.

@ddeville
Copy link
Contributor Author

Welcome back @hlopko, I hope you had a relaxing break 😃.

@ddeville ddeville force-pushed the transitive-crates-search-path branch 3 times, most recently from 3ba5090 to cad87bb Compare January 12, 2022 23:59
@hlopko
Copy link
Member

hlopko commented Jan 21, 2022

Thanks @meteorcloudy and @oquenchil, @ddeville, will you have the capacity to investigate if this is a bazel bug? I guess trying to create a minimal repro case as a standalone repository would be the most useful. There we can try a single args instance with an extremely long command line, if that works correctly then try multiple args instances in a single action, if that works, then I don't know :) I'm happy to help but I'm hopeless on Windows..

@meteorcloudy
Copy link
Member

@hlopko Feel free to file a bug to Bazel, I'm always happy to fix bugs. I minimal repo would be indeed very helpful! @oquenchil is also gaining a lot of Windows knowledge now, he'll have insight on both side. ;)

@ddeville ddeville force-pushed the transitive-crates-search-path branch from cad87bb to f0d5417 Compare February 7, 2022 23:31
@ddeville
Copy link
Contributor Author

ddeville commented Feb 7, 2022

Hey everyone, sorry for the delay here, the last few weeks have been particularly busy...

I'd be happy to help test this on Windows but since my project is still stuck on 4.1.0 it might not be the best place to start. I could try setting up a minimal repro, I'll just have to prep for some Windows pain, as is always the case 😄.

However ideally I'd like to do this independently of the work here since the fix in the process wrapper needs to happen regardless. What do you think?

@ddeville ddeville force-pushed the transitive-crates-search-path branch from f0d5417 to 4d25814 Compare February 7, 2022 23:57
The argument list seems to constantly overflow on Windows (for reasons
that I don't really understand since `use_param_file` should dynamically
switch to using a param file in these cases), so always generate a param
file.
@ddeville ddeville force-pushed the transitive-crates-search-path branch from 4d25814 to d342503 Compare February 8, 2022 21:18
@hlopko
Copy link
Member

hlopko commented Feb 9, 2022

Yeah I totally understand. Just to get it out of the way, I'm trying to figure out if there is a way we could still use the only-when-needed param files (maybe on Linux only), since those command lines are easier to work with when using a remote execution service.

But how I'd start would be to write a test that uses our process wrapper and fails on Linux today (custom Starlark rule that generates a command line that has more than 2097152 (my getconf ARG_MAX :) elements in ctx.actions.args instance. It should fail. If it fails because process wrapper doesn't handle param files correctly, I'd upload a PR with the test and with the fix in the process wrapper from this PR (and I'm happy to merge that in quickly).

In the presubmit of that PR we will see what happens on Mac and on Windows. We may have to disable the test if the params file was not created correctly there.

Then we can discuss what options we have (either the rest of this PR, or only doing the fix of this PR when running on failing OS, or using ctx.actions.args in a way that works since I find it unlikely that the feature is fully broken).

Does that sound reasonable?

@ddeville
Copy link
Contributor Author

Yep that makes sense.

As an update, I've also been doing more testing with this PR on my project and I'm still able to reproduce the bug on Windows :(. It's really not super clear what's going on, the manifestation of the bug is that some imports in code just don't resolve. When I symlink all the dependencies into one directory and pass a single -Ldependency that points to the directory, things work fine.

Since the project question is super large and entirely proprietary code I've been trying to put together a self-contained sample workspace that could become the base of the testing required here.
I started by checking whether this might be an issue with rustc itself but wasn't able to repro by making the compiler invocations myself (and rust-lang/rust#44094 seems to imply that it hasn't been a problem over there for a while).

I'm gonna try making a bazel workspace using the rules and see where that takes me. It's definitely been a puzzling endeavor so far unfortunately :(

@ddeville
Copy link
Contributor Author

Ok some update here. I've been playing around with this some more and I don't think there is any issue with the automatic use of a param file on Windows (see hacky https://github.com/ddeville/lots-of-rust-deps for generating an insane amount of Rust dependencies 😄). Bazel correctly turns large argument lists that would overflow into param files as expected, on all platforms.

I am still hitting that weird issue where rustc fails to find some extern when a large (I think > 400) number of -Ldependency= paths are passed, but only on Windows (it fails with an odd error message, something like error: LoadLibraryExW failed). If I tweak things to symlink all these deps into a single folder and pass that to rustc (same thing that Cargo does) then it works fine. I'll have to debug this some more but I don't think it has anything to do with the argument list/param file but potentially something in rustc itself.

It'd still be nice to fix the nested param files issue here but now that the process wrapper was rewritten in Rust that'll take some more work. I'll see if I have some time to migrate this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants