Skip to content

[CIR][CodeGen] Introduce bytes offset attribute #1584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

gitoleg
Copy link
Collaborator

@gitoleg gitoleg commented Apr 23, 2025

This PR introduces a new attribute for the explicit byte offset for GlobalViewAttr. It's a proposal, so feel free to reject it once it's not good from your point of view.

The problem is (as usually) with globals and unions: looks like we can not really use indexes in the GlobalView to address an array of unions. For example, the next program prints 4 now, but it should be 42:

typedef struct {
    long s0;
    int  s1;
} S;

typedef union {
   int  f0;
   S f1;
} U;

static U g1[3] = {{42},{42},{42}};
int* g2 = &g1[1].f1.s1;

int main() {       
   (*g2) = 4;
   printf("%d\n", g1[1].f0);    
   return0;
}

The problem is that we compute wrong indices in CIRGenBuilder::computeGlobalViewIndicesFromFlatOffset. Maybe it can be even fixed in this case, but I have a feeling that the fix would be a bit fragile.

So instead of trying to support indexes for the array of unions I suggest to use the offset explicitly.

From the implementation point of view there are some changes in CIRGenBuilder - but nothing really new is in there - I just did not want to introduce copy-pasta for the isOffsetInUnion function that is pretty the same as former computeGlobalViewIndicesFromFlatOffset.


static U2 g1[3] = {{0x42},{0x42},{0x42}};
int* g2 = &g1[1].f1.s1;
// CIR: cir.global external @g2 = #cir.global_view<@g1, offset 24> : !cir.ptr<!s32i>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this test plain LLVM as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just wanted to show how the original LLVM looks like

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OG is fine, just asking for the extra one too

// ByteOffsetAttr
//===----------------------------------------------------------------------===//

def ByteOffsetAttr : CIR_Attr<"ByteOffset", "byte_offset"> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding ByteOffsetAttr, shouldn't we just use mlir::IntegerAttr directly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can! I just thought it would be more explicit

An integer attribute is a literal attribute that represents an integral
value of the specified integer type.
}];
let parameters = (ins "uint64_t":$value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the offset somehow be negative?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can not be negative

@@ -772,21 +789,44 @@ def GlobalViewAttr : CIR_Attr<"GlobalView", "global_view", [TypedAttrInterface]>

let parameters = (ins AttributeSelfTypeParameter<"">:$type,
"mlir::FlatSymbolRefAttr":$symbol,
OptionalParameter<"mlir::ArrayAttr">:$indices);
OptionalParameter<"mlir::ArrayAttr">:$indices,
OptionalParameter<"ByteOffsetAttr">:$offset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can both indices and offset be used at the same time? Having two options is confusing, how would they relate to each other or how a user would pick up the right one to use?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now that in the verifier you blocked this case, can you add some doc here to explain when is one supposed to be used versus the other? That might help me think more about the design.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll also add that it might be better to (a) have only one optional attribute of the Attribute type, (b) make the verifier distinguish if they come from valid attributes with isa<>, (c) explain and give examples in the doc, and (d) add builder helpers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, that's why I added a verifier for the GlobalViewAttr and check that they can't be used together.
The question about a user is a good one. I would say that indices should be used by default, and byte offset in the very rare cases like this one. But in the same time it implies that careful user should check both every time. So may be it's not a good idea after all, what do you think?

@gitoleg
Copy link
Collaborator Author

gitoleg commented Apr 24, 2025

I think I have a better solution and just update the index computation - so once you are good with it - I can rename the PR.
The only downside of this approach and it could be a chance that indices will be computed wrong:

typedef struct {
   double x;
   int y;
} S1;

typedef struct {
     long x;
     int y;
} S2; 

typedef union {
   S1 s1;
   S2 s2;
} U;

static U g1[2] = {{42},{42}};
int* g2 = &g1[1].s1.x;
int* g3 = &g1[1].s2.x;

Both of g2 and g3 will have the same indexes. But the point is - both of them will have the same offset on LLVM IR side, which is more important since it's correct and even with the original LLVM with CIR involved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants