Skip to content

(c2rust-analyze) Handle inline const refs, including string literals #886

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Apr 24, 2023

Conversation

kkysen
Copy link
Contributor

@kkysen kkysen commented Apr 10, 2023

Fixes #837.

This only handles const refs that are inline and thus local to a function, as determined by being ConstantKind::Val. Thus, this handles string literals like "" (inline_str) and b"" (inline_bstr), which is the main thing we're trying to support.

This PR also adds tests for other similar string usages, though they're not handled and turned off for now. The outline cases, where the const ref/string literal is used through a named constant, aren't handled, as they are globally accessible and that makes them more complex. Since supporting them isn't necessary for string literals, which is the main usage in lighttpd-rust.

@kkysen kkysen requested a review from aneksteind April 10, 2023 18:39
@kkysen kkysen force-pushed the kkysen/analyze-string-literals branch from f0dacf6 to 4f1d32f Compare April 10, 2023 18:40
@kkysen kkysen requested a review from spernsteiner April 10, 2023 18:40
@kkysen kkysen changed the title (c2rust-analyze) Handle local const refs, including string literals (c2rust-analyze) Handle inline const refs, including string literals Apr 10, 2023
@kkysen
Copy link
Contributor Author

kkysen commented Apr 10, 2023

I'm not sure if/how this PR should add exact READ | OFFSET_ADD permissions for string literals (and READ permissions for scalar literal refs).

@kkysen
Copy link
Contributor Author

kkysen commented Apr 10, 2023

@spernsteiner, does this look like it's on the right track?

@spernsteiner
Copy link
Collaborator

I'm not sure if/how this PR should add exact READ | OFFSET_ADD permissions for string literals

I think the best approach for this is to assign PointerIds to these literals (as you've implemented here), set up the permissions/flags for those PointerIds in the initial Assignment prior to running dataflow and borrowck, and assert at the end that the permissions/flags weren't modified. This will catch cases where the program tries to mutate a string literal (or other constant) and will stop us from trying to generate a bogus rewrite, like trying to turn "foo" into &mut str.

@kkysen kkysen force-pushed the kkysen/analyze-string-literals branch from 4f1d32f to 009c65b Compare April 13, 2023 08:32
@kkysen
Copy link
Contributor Author

kkysen commented Apr 13, 2023

assert at the end that the permissions/flags weren't modified

If I do it this way, would it make more sense to keep const_tys separate from rvalue_tys?

@kkysen kkysen force-pushed the kkysen/analyze-string-literals branch from b6de80d to 01708f4 Compare April 13, 2023 09:18
@kkysen kkysen requested a review from spernsteiner April 13, 2023 17:04
Copy link
Collaborator

@spernsteiner spernsteiner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me, aside from that final actual/expected permission check discussed above.

@aneksteind
Copy link
Contributor

If I do it this way, would it make more sense to keep const_tys separate from rvalue_tys?

@kkysen why not go about it this way, i.e. why is const_tys used instead?

@kkysen
Copy link
Contributor Author

kkysen commented Apr 14, 2023

I was going to change it back to using rvalue_tys, but if I need to iterate through the constant ones at the end to check the permissions weren't modified, then it's a lot simpler if there's a separate map.

@aneksteind
Copy link
Contributor

aneksteind commented Apr 14, 2023

I think it would be cleaner to keep the rvalue_tys dict alone and instead have logic that checks permissions and matches against the RValue ty and checks there. If we want to check similar things in the future for different rvalue types we would end up adding more maps to the context to maintain the pattern and I don't think that'd be the best design

@kkysen
Copy link
Contributor Author

kkysen commented Apr 14, 2023

There's not really a way to tell an LTy came from a constant, though, right? I'm not sure how to retrieve that information back.

@kkysen
Copy link
Contributor Author

kkysen commented Apr 17, 2023

@spernsteiner, the post-typeck check is erroring, saying the permissions have expanded from READ | OFFSET_ADD to READ | UNIQUE | OFFSET_ADD. Is this correct?

@kkysen
Copy link
Contributor Author

kkysen commented Apr 17, 2023

@spernsteiner, also, do you think any FileCheck tests are needed for this, or is the post-typeck check that the permissions are as expected more than enough?

@kkysen kkysen force-pushed the kkysen/analyze-string-literals branch from ec14e80 to f19806e Compare April 17, 2023 07:15
@spernsteiner
Copy link
Collaborator

I agree it would be a little cleaner to use rvalue_tys for controlling type_of results instead of adding a second map to do lookups in. I think the strategy there would be to insert into rvalue_tys and also add an entry to a list a PointerId permission adjustments/assertions.

the post-typeck check is erroring, saying the permissions have expanded from READ | OFFSET_ADD to READ | UNIQUE | OFFSET_ADD. Is this correct?

It's harmless, since UNIQUE is only relevant when WRITE is also present, but I don't think it should be picking up the UNIQUE permission here. The inference for UNIQUE works by setting UNIQUE on all PointerIds and then removing it in places where it causes a borrowck error - possibly the code that sets UNIQUE is running after the new code for setting the fixed permissions on these constants?

do you think any FileCheck tests are needed for this, or is the post-typeck check that the permissions are as expected more than enough?

The new test that's currently on this branch and the assertions seem good enough to me.

@kkysen kkysen force-pushed the kkysen/analyze-string-literals branch 2 times, most recently from 05409f6 to 2ce2303 Compare April 21, 2023 09:00
kkysen added 4 commits April 21, 2023 02:42
… `AnalysisCtxt::const_tys: HashMap<ConstantKind, LTy>`.
…:Ref`, as non-refs work fine on their own.
@kkysen kkysen force-pushed the kkysen/analyze-string-literals branch from ada0fc1 to 89fde1b Compare April 21, 2023 09:42
…ed to add `PhantomLifetime` to get the lifetimes to work).
@kkysen kkysen force-pushed the kkysen/analyze-string-literals branch from 89fde1b to f14050b Compare April 21, 2023 09:45
@kkysen
Copy link
Contributor Author

kkysen commented Apr 21, 2023

I agree it would be a little cleaner to use rvalue_tys for controlling type_of results instead of adding a second map to do lookups in. I think the strategy there would be to insert into rvalue_tys and also add an entry to a list a PointerId permission adjustments/assertions.

I moved the LTys to rvalue_tys now (in 8665bc2), and changed the const_ref_tys map to the const_ref_locs Vec<Location>, which just indexes into the rvalue_tys map.

@spernsteiner, @aneksteind, does this look good now?

the post-typeck check is erroring, saying the permissions have expanded from READ | OFFSET_ADD to READ | UNIQUE | OFFSET_ADD. Is this correct?

It's harmless, since UNIQUE is only relevant when WRITE is also present, but I don't think it should be picking up the UNIQUE permission here. The inference for UNIQUE works by setting UNIQUE on all PointerIds and then removing it in places where it causes a borrowck error - possibly the code that sets UNIQUE is running after the new code for setting the fixed permissions on these constants?

I now unset UNIQUE before doing the check (in 91bfa85).

@@ -465,6 +465,10 @@ pub fn visit<'tcx>(
equiv_constraints: Vec::new(),
};

for (ptr, perms) in acx.const_ref_perms() {
tc.constraints.add_all_perms(ptr, perms);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not add the constraints as you see consts in visit_rvalue?

Copy link
Contributor

@aneksteind aneksteind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why PhantomLifetime is necessary. Without knowing for sure it seems like an unnecessary complication. I also think the vector of locations for const refs is similarly unnecessary in that the permissions can be checked in another visit of the body after the permissions have been propagated (maybe in the borrowck phase or somewhere in between) in combination with adding constraints during the dataflow visitation.

Copy link
Contributor

@aneksteind aneksteind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite my comments, however, I think this looks good enough. Insofar as the points made above are valid, consider reducing the complexity slightly.

@kkysen
Copy link
Contributor Author

kkysen commented Apr 21, 2023

I'm not sure why PhantomLifetime is necessary. Without knowing for sure it seems like an unnecessary complication.

It was needed due to how lifetimes and impl Trait work where I can't do multiple impl Trait + 'a + 'bs and have it work, but I can do impl Trait + PhantomLifetime<'a> + 'b and that works. It's similar to PhantomData and type parameters, but with lifetimes.

@kkysen
Copy link
Contributor Author

kkysen commented Apr 21, 2023

I also think the vector of locations for const refs is similarly unnecessary in that the permissions can be checked in another visit of the body after the permissions have been propagated (maybe in the borrowck phase or somewhere in between) in combination with adding constraints during the dataflow visitation.

Do you mean re-iterating through BasicBlocks, Statements, Rvalues, etc. to find the Constants again? Why is that better than doing it once and storing the result?

@aneksteind
Copy link
Contributor

Do you mean re-iterating through BasicBlocks, Statements, Rvalues, etc. to find the Constants again? Why is that better than doing it once and storing the result?

I believe re-iterating whether through another visitor or a for-loop is better than adding an extra attribute to the context because it doesn't have an invariant of location being present in the index operations and it wouldn't be keeping extra state. I think the check could happen in borrowck and in that case wouldn't be going through another pass, but that doesn't feel ideal because the pass isn't meant to perform propagation checks per se

@kkysen
Copy link
Contributor Author

kkysen commented Apr 21, 2023

an invariant of location being present in the index operations

What do you mean by this? That const_ref_locs's Locations are present in rvalue_tys? That seems like a pretty simple invariant. Or I might be misunderstanding.

@kkysen kkysen force-pushed the kkysen/analyze-string-literals branch from 5c1f112 to 012c46f Compare April 22, 2023 22:06
@kkysen kkysen force-pushed the kkysen/analyze-string-literals branch from bfde22e to 6a06083 Compare April 24, 2023 22:50
@kkysen kkysen merged commit b18d028 into master Apr 24, 2023
@kkysen kkysen deleted the kkysen/analyze-string-literals branch April 24, 2023 23:25
kkysen added a commit that referenced this pull request May 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(c2rust-analyze) Handle string literals
3 participants