
Difficulties with internally using refined types #7403

Open
tlively opened this issue Mar 26, 2025 · 15 comments

Comments

@tlively
Member

tlively commented Mar 26, 2025

We currently use, or plan to use, more refined types in the IR than we will eventually emit in the binary in a couple of situations:

  • Function references always use concrete signature types rather than funcref even when GC is not enabled.
  • Strings use stringref rather than externref even when stringref is not enabled.
  • We plan to use exact heap types where possible even when custom descriptors are not enabled.

In each of these cases, we wish to use the more refined types internally because the extra type information can in principle help us optimize better. The binary writer generalizes the internal refined types to their most precise allowed supertype when writing the binary to ensure that our output only uses the allowed features.
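The generalization step described above can be sketched as a single function over reference types. This is a minimal Python model, not Binaryen's (C++) binary writer: the `Ref` class, the `generalize` name, the `SIGNATURES` set, and the feature labels `"strings"`, `"gc"`, and `"custom-descriptors"` are all hypothetical stand-ins, and nullability is ignored for brevity.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ref:
    heap: str            # "string", "extern", "func", or a defined type such as "$A"
    exact: bool = False


# Hypothetical: the defined heap types that are function signatures.
SIGNATURES = {"$sig"}


def generalize(t: Ref, features: frozenset) -> Ref:
    """Return the most precise type that `features` allows us to emit for `t`."""
    heap, exact = t.heap, t.exact
    if heap == "string" and "strings" not in features:
        heap = "extern"          # stringref is written as externref
    if heap in SIGNATURES and "gc" not in features:
        heap = "func"            # a concrete signature type is written as funcref
    # Exactness is kept only for defined types, and only with custom descriptors.
    if "custom-descriptors" not in features or not heap.startswith("$"):
        exact = False
    return replace(t, heap=heap, exact=exact)


print(generalize(Ref("string"), frozenset()))                # Ref(heap='extern', exact=False)
print(generalize(Ref("$A", exact=True), frozenset({"gc"})))  # Ref(heap='$A', exact=False)
```

Every bug discussed below comes from the fact that this mapping is not injective: distinct internal types can map to the same emitted type.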

However, type generalization in the binary writer is the source of many latent bugs that have not been found by the fuzzer or encountered by users.

  • Casts distinguishing values with refined types from their unrefined supertypes will no longer be able to distinguish these values after their target types have been generalized. Optimizations in OptimizeInstructions, RemoveUnusedBrs, Precompute, and GUFA based on the assumption that these casts fail are incorrect.
  • MinimizeRecGroups and TypeUpdater more generally depend on being able to distinguish rec group structures to ensure that separate types remain separate after optimizations. If the rec groups only differ because one uses a more refined type that will be generalized during binary writing, the types will no longer be separate in the final binary.
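The cast problem in the first bullet can be illustrated with a small Python model. It assumes a hypothetical hierarchy where `$B` is declared as a subtype of `$A` and custom descriptors are disabled, so the binary writer drops exactness from the cast target. The names `cast_succeeds` and `drop_exactness` are illustrative only and do not correspond to Binaryen's actual cast-evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    heap: str
    exact: bool = False


# Hypothetical type hierarchy: $B is declared as a subtype of $A.
SUPERTYPE = {"$B": "$A", "$A": None}


def is_subtype(sub: str, sup: str) -> bool:
    """Walk the declared supertype chain from `sub` looking for `sup`."""
    t = sub
    while t is not None:
        if t == sup:
            return True
        t = SUPERTYPE[t]
    return False


def cast_succeeds(runtime_heap: str, target: Ref) -> bool:
    """Would a cast of a value whose runtime type is `runtime_heap` to `target` succeed?"""
    if target.exact:
        return runtime_heap == target.heap
    return is_subtype(runtime_heap, target.heap)


def drop_exactness(t: Ref) -> Ref:
    """What gets emitted when custom descriptors are disabled."""
    return Ref(t.heap, exact=False)


target = Ref("$A", exact=True)          # internal IR: a cast to (ref exact $A)
internal = cast_succeeds("$B", target)  # False, so an optimizer may fold the cast to "fails"
emitted = cast_succeeds("$B", drop_exactness(target))  # True: the emitted (ref $A) cast succeeds
```

An optimization that folded the cast based on `internal` would change behavior, because the binary actually contains the cast that produces `emitted`.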

These bugs have not been found because they either depend on string lowering, which is not yet fuzzed, or depend on GC being enabled so that casts exist to be optimized, etc. Using exact types with GC modules and optimizations without enabling custom descriptors will surface these bugs.

The plan to fix these bugs is to update the utilities these optimizations use to evaluate cast results and compare rec group structures, so that they take the enabled features into account and apply the same generalization logic that the binary writer will eventually apply.
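For the rec group half of that plan, the idea is to compare groups by the shape they will have in the emitted binary rather than by their internal shape. Below is a minimal Python sketch that models only the stringref-to-externref generalization; the tuple representation and the names `emitted_shape` and `groups_distinct` are hypothetical. The same "generalize first, then compare" pattern would presumably apply to cast evaluation and to the other generalizations.

```python
# Hypothetical model: a rec group is a tuple of struct types, and a struct
# type is a tuple of field type names.
STRING_LOWERING = {"stringref": "externref"}


def emitted_field(field: str, features: frozenset) -> str:
    """The field type that will actually be emitted given `features`."""
    if "strings" not in features:
        return STRING_LOWERING.get(field, field)
    return field


def emitted_shape(group: tuple, features: frozenset) -> tuple:
    """The structure of `group` as it will appear in the final binary."""
    return tuple(tuple(emitted_field(f, features) for f in struct) for struct in group)


def groups_distinct(a: tuple, b: tuple, features: frozenset) -> bool:
    """Feature-aware check: are `a` and `b` still distinct after generalization?"""
    return emitted_shape(a, features) != emitted_shape(b, features)


private_group = (("stringref",),)
public_group = (("externref",),)

print(private_group != public_group)                                         # True: naive check
print(groups_distinct(private_group, public_group, frozenset()))             # False
print(groups_distinct(private_group, public_group, frozenset({"strings"})))  # True
```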

An alternative plan would be to not use refined types internally when that could lead to bugs. The relative merit of this approach will depend on how much extra optimizing power using the refined types ends up unlocking.

@kripken
Member

kripken commented Mar 26, 2025

Thinking about this, isn't it safe for strings, at least given how we do the lifting?

For strings, if you start with imported JS strings and then lift to stringref, we do not actually generate any casts to stringref - we do not convert externref to stringref anywhere (we only convert constants, import returns, etc.). So casts seem safe. And for rec groups, the lifting does not modify rec groups at all - only later optimizations would do so, but such optimizations should be safe anyhow and should not modify rec groups that must not be modified. Then the later lowering seems safe too.

Of course, if we lifted in a way that added new casts, or that modified rec groups, that would be dangerous.

Or am I missing something?

@tlively
Member Author

tlively commented Mar 26, 2025

I'm not aware of StringLifting having any bugs, if that's what you're wondering. But if we do StringLifting + TypeRefining, for instance, we might run into the problems around differentiating rec group structures. Or if we do StringLifting + GUFA, we might introduce new casts to string that would be misoptimized by a subsequent OptimizeInstructions.

@kripken
Copy link
Member

kripken commented Mar 26, 2025

> if we do StringLifting + TypeRefining, for instance, we might run into the problems around differentiating rec group structures.

But TypeRefining will only modify/refine rec groups that are ok to modify, i.e., private ones. If those later get merged that's fine?

> Or if we do StringLifting + GUFA, we might introduce new casts to string that would be misoptimized by a subsequent OptimizeInstructions.

--gufa-casts does add new casts, but only safe ones that always succeed. At most, OptimizeInstructions might remove them if it sees they are redundant.

@tlively
Member Author

tlively commented Mar 26, 2025

> > if we do StringLifting + TypeRefining, for instance, we might run into the problems around differentiating rec group structures.

> But TypeRefining will only modify/refine rec groups that are ok to modify, i.e., private ones. If those later get merged that's fine?

But if the resulting private rec group ends up with the same structure as a public rec group, that's not good. This is very unlikely to happen in practice, but it's still a problem in principle.

@kripken
Copy link
Member

kripken commented Mar 26, 2025

But that can happen with any refinement of any type - don't we need to handle that by constantly making sure that private rec groups never turn into public ones, not just during lowering? (And in practice we emit a single big rec group for private ones, which has a 0.000001% chance of overlap?)

@tlively
Member Author

tlively commented Mar 26, 2025

Yes, exactly, and the problem is if the private and public rec groups differ only because one uses a string where another uses an extern, that difference will be erased during binary writing, so they never should have been considered different to begin with.

@kripken
Copy link
Member

kripken commented Mar 26, 2025

But how can you get to that situation? That is what I am saying is not possible.

Maybe an example can help. Say we start with

(rec $private
  (type $A (struct externref))
)
(rec $public
  (type $B (struct externref i32))
)

And say TypeRefining turns it into

(rec $private
  (type $A (struct stringref)) ;; this changed
)
(rec $public
  (type $B (struct externref i32)) ;; this is public so nothing can change
)

Now we lower it, and end up emitting the same as in the first code fragment, since stringref => externref. It is therefore equally at risk of colliding with other modules' rec groups as it was before.

And it was never at risk of colliding with the other internal rec group, since it was already different in the first code fragment.

What am I missing? Or can you give a concrete example of a bug?

@tlively
Member Author

tlively commented Mar 26, 2025

The problem would occur for that example if GTO ran and removed the i32 field. You could construct similar examples that would use Unsubtyping, TypeMerging, or pretty much any other type optimization after StringLifting + TypeRefining to cause the collision.
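To make the collision concrete, here is a small Python simulation. It assumes a variant of the example above in which the i32 field lives on the private type (GTO should only modify private types), models each rec group as a single struct written as a tuple of field types, and uses hypothetical helpers `refine_field`, `remove_field`, and `lower_strings` as stand-ins for TypeRefining, GTO, and string lowering.

```python
# Hypothetical model: each rec group holds one struct type, written as a
# tuple of field type names.


def refine_field(struct: tuple, index: int, new_type: str) -> tuple:
    """TypeRefining-like step: give field `index` a more refined type."""
    return struct[:index] + (new_type,) + struct[index + 1:]


def remove_field(struct: tuple, index: int) -> tuple:
    """GTO-like step: drop a field that is never read."""
    return struct[:index] + struct[index + 1:]


def lower_strings(struct: tuple) -> tuple:
    """What gets emitted when strings are disabled: stringref -> externref."""
    return tuple("externref" if f == "stringref" else f for f in struct)


public_b = ("externref",)         # public: must never change
private_a = ("externref", "i32")  # private: free to optimize

private_a = refine_field(private_a, 0, "stringref")  # ("stringref", "i32")
private_a = remove_field(private_a, 1)                # ("stringref",)

distinct_in_ir = private_a != public_b                                    # True
distinct_in_binary = lower_strings(private_a) != lower_strings(public_b)  # False
```

Each step looks safe when judged on the IR alone; only the final lowering makes the private and public groups identical.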

@kripken
Member

kripken commented Mar 26, 2025

But that would be a GTO bug, and one that can happen without string lifting and lowering? GTO doing that on the first code fragment (the input) would be the same bug.

I don't see the connection to using refined types internally, the topic of this issue.

@kripken
Member

kripken commented Mar 26, 2025

Oh, wait, is your concern that GTO wouldn't be aware of the later lowering, so it has no way to avoid this bug? That makes sense. But that isn't GTO's problem - the lowering needs to keep distinct rec groups distinct.

@tlively
Member Author

tlively commented Mar 26, 2025

Yes, this is a GTO bug, and a TypeMerging bug, and a bug with basically every type optimization because fundamentally it's a TypeUpdating bug. It's TypeUpdating that does not know that string and extern will be the same after binary writing. The fix will be in TypeUpdating.

(And similarly for the cast issues, the fix will be in just a couple places that evaluate casts.)

@tlively
Member Author

tlively commented Mar 26, 2025

Ah, for strings we have a distinct lowering pass rather than doing the generalization in the binary writer, so you're right that you can view this as a bug in the lowering pass and the fix could be localized to the lowering pass. So yes, maybe this doesn't apply to strings the way I've been describing it. It will still apply to exact types, though. It would also be possible to get this kind of bug today with just reference types enabled if we allowed SignatureRefining to run without GC, for example.

@kripken
Member

kripken commented Mar 26, 2025

> Yes, this is a GTO bug, and a TypeMerging bug, and a bug with basically every type optimization

> Ah, for strings we have a distinct lowering pass rather than doing the generalization in the binary writer, so you're right that you can view this as a bug in the lowering pass and the fix could be localized to the lowering pass.

And I think we can look at the others in the same way? That is, I suggest that we see this not as a bug in GTO, TypeMerging, and everything else - I think all those are perfect as they are right now - but that we just need more careful lowering.

Specifically, it is the responsibility of the lowering to keep distinct rec groups distinct. The lowering must be careful and add brands as needed to avoid conflicts. Yes, this means we can't do super-simple lowering in the binary writer as peephole operations - the lowering needs a more holistic view.

I feel that it is nicer/cleaner to put this responsibility on the lowering operation, rather than on general optimization infrastructure.
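A lowering that keeps distinct rec groups distinct, as proposed above, might look roughly like the following Python sketch. It assumes strings are disabled, models rec groups as tuples of struct types, and invents a hypothetical brand: an extra, otherwise unused struct type appended to a colliding group, grown until the group's emitted shape is unique. None of these names come from Binaryen.

```python
def lower_field(field: str) -> str:
    """Generalize stringref to externref (strings assumed disabled)."""
    return "externref" if field == "stringref" else field


def lower_group(group: tuple) -> tuple:
    return tuple(tuple(lower_field(f) for f in struct) for struct in group)


def lower_with_brands(private_groups: list, public_groups: list) -> list:
    """Lower private rec groups, adding brand types so that none collide."""
    taken = {lower_group(g) for g in public_groups}
    result = []
    for group in private_groups:
        lowered = lower_group(group)
        candidate, brand_size = lowered, 0
        while candidate in taken:
            brand_size += 1
            # Hypothetical brand: an extra struct type whose field count
            # makes this group's shape unique.
            candidate = lowered + (("i8",) * brand_size,)
        taken.add(candidate)
        result.append(candidate)
    return result


public = [(("externref",),)]
private = [(("stringref",),), (("stringref", "i32"),)]
for g in lower_with_brands(private, public):
    print(g)
# (('externref',), ('i8',))
# (('externref', 'i32'),)
```

The design choice here is that only the lowering needs to know about collisions; the optimization passes run unchanged, which matches the division of responsibility suggested in this comment.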

@tlively
Member Author

tlively commented Mar 27, 2025

But we want to continue doing simple type generalization in the binary writer rather than introducing new lowering passes for those use cases. Besides the runtime cost, having separate passes wouldn't work because their output wouldn't be valid IR. For example it is not valid IR for a RefFunc to have type funcref rather than a specific signature type.

@kripken
Member

kripken commented Mar 27, 2025

I agree there is a runtime cost, but I think the benefit would be that we keep all this complexity out of the main optimization infrastructure. GUFA etc. will not need to think about what later lowerings happen. I think that is a simpler model, which will make it easier to debug issues, and it might also be more efficient overall.

Good point that the IR would not validate under the stricter rules. We could relax validation, or we could just not validate at that point: when binary writing starts, it would run the lowerings and then write, with no validation in between. Really, that lowering would be part of the binary writer, but it would still be the binary writer's responsibility to lower in a way that does not merge rec groups badly.
