Difficulties with internally using refined types #7403

tlively · 2025-03-26T03:33:58Z

We currently use or plan to use more refined types in the IR than we will eventually emit in the binary in a couple situations:

Function references always use concrete signature types rather than funcref even when GC is not enabled.
Strings use stringref rather than externref even when stringref is not enabled.
We plan to use exact heap types where possible even when custom descriptors are not enabled.

In each of these cases, we wish to use the more refined types internally because the extra type information can in principle help us optimize better. The binary writer generalizes the internal refined types to their most precise allowed supertype when writing the binary to ensure that our output only uses the allowed features.

However, type generalization in the binary writer is the source of many latent bugs that have not been found by the fuzzer or encountered by users.

Casts distinguishing values with refined types from their unrefined supertypes will no longer be able to distinguish these values after their target types have been generalized. Optimizations in OptimizeInstructions, RemoveUnusedBrs, Precompute, and GUFA based on the assumption that these casts fail are incorrect.
MinimizeRecGroups and TypeUpdater more generally depend on being able to distinguish rec group structures to ensure that separate types remain separate after optimizations. If the rec groups only differ because one uses a more refined type that will be generalized during binary writing, the types will no longer be separate in the final binary.

These bugs have not been found because they either depend on string lowering, which is not yet fuzzed, or depend on GC being enabled so that casts exist to be optimized, etc. Using exact types with GC modules and optimizations without enabling custom descriptors will surface these bugs.

The plan to fix these bugs is to update the utilities used by these optimizations to evaluate cast results and compare rec group structures to take the enabled features into account and apply the same generalization logic that will eventually be applied in the binary writer.

An alternative plan would be to not use refined types internally when that could lead to bugs. The relative merit of this approach will depend on how much extra optimizing power using the refined types ends up unlocking.

The text was updated successfully, but these errors were encountered:

kripken · 2025-03-26T17:59:57Z

Thinking about this, is it not safe for strings, at least how we do the lifting?

For strings, if you start with imported JS strings, then lift to stringref, then we do not actually generate any casts to stringref - we do not convert externref to stringref anywhere (we only convert constants and import returns etc.). So casts seem safe. And, for rec groups, the lifting does not modify rec groups at all - only later optimizations would do so, but such optimizations should be safe anyhow, and not modify rec groups that should not be. Then the lowering later seems safe too.

Of course, if we lifted in a way that added new casts, or that modified rec groups, that would be dangerous.

Or am I missing something?

tlively · 2025-03-26T18:11:07Z

I'm not aware of StringLifting having any bugs, if that's what you're wondering. But if we do StringLifting + TypeRefining, for instance, we might run into the problems around differentiating rec group structures. Or if we do StringLifting + GUFA, we might introduce new casts to string that would be misoptimized by a subsequent OptimizeInstructions.

kripken · 2025-03-26T18:53:35Z

if we do StringLifting + TypeRefining, for instance, we might run into the problems around differentiating rec group structures.

But TypeRefining will only modify/refine rec groups that are ok to modify, i.e., private ones. If those later get merged that's fine?

Or if we do StringLifting + GUFA, we might introduce new casts to string that would be misoptimized by a subsequent OptimizeInstructions.

--gufa-casts does add new casts, but only safe ones, that always succeed. OptimizeInstructions might only remove them, if it sees they are redundant.

tlively · 2025-03-26T19:03:39Z

if we do StringLifting + TypeRefining, for instance, we might run into the problems around differentiating rec group structures.

But TypeRefining will only modify/refine rec groups that are ok to modify, i.e., private ones. If those later get merged that's fine?

But if the resulting private rec group ends up with the same structure as a public rec group, that's not good. This is very unlikely to happen in practice, but it's still a problem in principle.

kripken · 2025-03-26T19:07:22Z

But that can happen in any refinement of any type - we need to handle that by constantly making sure that private rec groups never turn into public ones, not just during lowering? (And in practice we emit a single big rec group for private ones, which has %0.000001 chance of overlap?)

tlively · 2025-03-26T20:18:32Z

Yes, exactly, and the problem is if the private and public rec groups differ only because one uses a string where another uses an extern, that difference will be erased during binary writing, so they never should have been considered different to begin with.

kripken · 2025-03-26T20:30:05Z

But how can you get to that situation? That is what I am saying is not possible.

Maybe an example can help. Say we start with

(rec $private
  (type $A (struct externref))
)
(rec $public
  (type $B (struct externref i32))
)

And say TypeRefining turns it into

(rec $private
  (type $A (struct stringref)) ;; this changed
)
(rec $public
  (type $B (struct externref i32)) ;; this is public so nothing can change
)

Now we lower it, and end up emitting the same as in the first code fragment, since stringref => externref. It is therefore equally at danger of running into collisions with other modules' rec groups as it was before.

And it was never at risk of colliding with the other internal rec group, since it was already different in the first code fragment.

What am I missing? Or can you give a concrete example of a bug?

tlively · 2025-03-26T20:45:11Z

The problem would occur for that example if GTO ran and removed the i32 field. You could construct similar examples that would use Unsubtyping, TypeMerging, or pretty much any other type optimization after StringLifting + TypeRefining to cause the collision.

kripken · 2025-03-26T21:14:13Z

But that would be a GTO bug, and one that can happen without string lifting and lowering? GTO doing that on the first code fragment (the input) would be the same bug.

I don't see the connection to using refined types internally, the topic of this issue.

kripken · 2025-03-26T21:15:28Z

Oh, wait, is your concern that GTO wouldn't be aware of the later lowering, so it has no way to avoid this bug? That makes sense. But that isn't GTO's problem - the lowering needs to keep distinct rec groups distinct.

tlively · 2025-03-26T22:40:44Z

Yes, this is a GTO bug, and a TypeMerging bug, and a bug with basically every type optimization because fundamentally it's a TypeUpdating bug. It's TypeUpdating that does not know that string and extern will be the same after binary writing. The fix will be in TypeUpdating.

(And similarly for the cast issues, the fix will be in just a couple places that evaluate casts.)

tlively · 2025-03-26T22:46:20Z

Ah, for strings we have a distinct lowering pass rather than doing the generalization in the binary writer, so you're right that you can view this as a bug in the lowering pass and the fix could be localized to the lowering pass. So yes, maybe this doesn't apply to strings the way I've been describing it. It will still apply to exact types, though. It would also be possible to get this kind of bug today with just reference types enabled if we allowed SignatureRefining to run without GC, for example.

kripken · 2025-03-26T23:56:19Z

Yes, this is a GTO bug, and a TypeMerging bug, and a bug with basically every type optimization

Ah, for strings we have a distinct lowering pass rather than doing the generalization in the binary writer, so you're right that you can view this as a bug in the lowering pass and the fix could be localized to the lowering pass.

And I think we can look at the others in the same way? That is, I suggest that we see this not as a bug in GTO, TypeMerging, and everything else - I think all those are perfect as they are right now - but that we just need more careful lowering.

Specifically, it is the responsibility of the lowering to keep distinct rec groups distinct. The lowering must be careful and add brands as needed to avoid conflicts. Yes, this means we can't do super-simple lowering in the binary writer as peephole operations - the lowering needs a more holistic view.

I feel that it is nicer/cleaner to put this responsibility on the lowering operation, rather than on general optimization infrastructure.

tlively · 2025-03-27T00:04:08Z

But we want to continue doing simple type generalization in the binary writer rather than introducing new lowering passes for those use cases. Besides the runtime cost, having separate passes wouldn't work because their output wouldn't be valid IR. For example it is not valid IR for a RefFunc to have type funcref rather than a specific signature type.

kripken · 2025-03-27T16:24:51Z

I agree there is a runtime cost, but I think the benefit would be that we keep all this complexity out of the main optimization infrastructure. GUFA etc. will not need to think about what later lowerings happen. I think that is a simpler model, which will make it easier to debug issues, and it might also be more efficient overall.

Good point that the IR would not validate under the stricter rules. We could relax validation, or we could just not validate at that point, that is, when binary writing starts it would run lowerings and then write, with no validation in the middle. Really, that lowering would be part of the binary writer, but it would still be the binary writer's responsibility to lower in a way that does not merge rec groups badly.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Difficulties with internally using refined types #7403

Difficulties with internally using refined types #7403

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 26, 2025

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 27, 2025

kripken commented Mar 27, 2025

Difficulties with internally using refined types #7403

Difficulties with internally using refined types #7403

Comments

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 26, 2025

tlively commented Mar 26, 2025

kripken commented Mar 26, 2025

tlively commented Mar 27, 2025

kripken commented Mar 27, 2025