[CIR][CUDA] Emit address space casts on lowering #1518
base: main
Conversation
…n to match OG" Seems like windows bots are now broken! This reverts commit 9a63c50.
llvm#1180) This removes some NYIs in CIRGenModule::tryEmitBaseDestructorAsAlias and, similar to llvm#1179, uses `assert(false)` to tell devs to add tests. It is slightly verbose due to the difference between LLVM's and CIR's type systems: LLVM's pointers are opaque, while CIR's pointers are typed, so we need to handle these pointers when transforming the generated CIR.
…lvm#1186) This PR adds a new command line option `--target` to our tool `cir-translate`. Its concrete behaviour also depends on the triple and data layout in the CIR module; see the table in the code comments for details. The default triple is currently `x86_64-unknown-linux-gnu`. Some tests are updated with the triple and DLTI attribute removed (replaced by an option in the RUN line), but some tests remain unchanged, primarily because they use `cir-opt` instead.
…nAttr (llvm#1199) CIR PoisonOp is needed in this context, as the alternative would be to use VecCreateOp to prepare an arg for VecInsertElement; but VecCreate serves a different purpose and [it would insert all elements](https://github.com/llvm/clangir/blob/eacaabba76ebdbf87217fefaa77f92c45cf4509c/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp#L1679), which is unnecessary in this context. Here is the [intrinsic def](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vqmovns_).
This would facilitate the implementation of the neon intrinsic `neon_vmax_v` and `__builtin_elementwise_max`, and potentially future optimizations. CIR BinOp supports vector types. Floating point is already supported by FMaxOp.
This option creates and links all tools against a single libLLVM shared library (and the corresponding CLANG_LINK_CLANG_DYLIB option also gets turned on by default). In order for this to work, we need to use LINK_COMPONENTS instead of LINK_LIBS for all LLVM dependencies, and clang_target_link_libraries for all Clang dependencies, so that they get rewritten to use the dylib. Remove llvm_update_compile_flags while I'm here, since the build macros handle that internally.

Before this change, we'd link against certain LLVM libraries both statically and dynamically, leading to test failures from duplicate singletons.

The way this works for MLIR is fragile right now: MLIR can create its own dylib as well but doesn't have build system support for linking against that dylib. We end up folding the MLIR libraries into libclang-cpp.so (because all Clang dependencies get pulled into it), but MLIR tools still link the MLIR libraries statically. It'll still work, but BUILD_SHARED_LIBS is possibly a better alternative for development. Distributions like Fedora build their LLVM packages with LLVM_LINK_LLVM_DYLIB, so we'll eventually want good MLIR support for that setup too.
…lvm#1203) C/C++ functions returning void had an explicit !cir.void return type while not having any returned value, which was breaking a lot of MLIR invariants when the CIR dialect is used in a broader context, for example with the inliner. Now a C/C++ function returning void has no return type and no return values, which respects the MLIR invariant that the number of return types matches the number of returned values. This change keeps the same parsing/pretty-printed syntax as before for compatibility.
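For illustration, a minimal sketch of the invariant at play, written against MLIR's builtin `FunctionType` (assumption: CIR's own `FuncType` behaves analogously; this is not the PR's code):

```
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/MLIRContext.h"
#include <cassert>

void illustrateInvariant() {
  mlir::MLIRContext ctx;
  // Old modelling: one (!cir.void-like) result type, yet `cir.return` in a
  // void function carries zero operands, so the result-type count and the
  // returned-value count disagree and generic passes (e.g. the inliner) balk.
  auto voidLike = mlir::NoneType::get(&ctx);
  auto oldTy = mlir::FunctionType::get(&ctx, /*inputs=*/{}, /*results=*/{voidLike});
  assert(oldTy.getNumResults() == 1); // but the terminator returns 0 values

  // New modelling: zero result types and zero returned values -- consistent.
  auto newTy = mlir::FunctionType::get(&ctx, /*inputs=*/{}, /*results=*/{});
  assert(newTy.getNumResults() == 0);
}
```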
…llvm#1230) References: [OG's implementation](https://github.com/llvm/clangir/blob/2b1a638ea07ca10c5727ea835bfbe17b881175cc/clang/lib/CodeGen/CGBuiltin.cpp#L13509) and the [builtin's definition](https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=vrsra_n_).
Combined implementation with `neon_vaddlvq_u16`. OG somehow implemented them separately, but they are no different except for signedness and intrinsic name. [OG's neon_vaddlvq_s16](https://github.com/llvm/clangir/blob/2b1a638ea07ca10c5727ea835bfbe17b881175cc/clang/lib/CodeGen/CGBuiltin.cpp#L13483) [OG's neon_vaddlvq_u16](https://github.com/llvm/clangir/blob/2b1a638ea07ca10c5727ea835bfbe17b881175cc/clang/lib/CodeGen/CGBuiltin.cpp#L13449)
The module-level uwtable attribute controls the unwind tables for any synthesized functions, and the function-level attribute controls them for those functions. I'll add support for this attribute to the LLVM dialect as well, but translate it from CIR directly for now to avoid waiting on the MLIR addition and a subsequent rebase.
llvm#1232) Match `CodeGenModule::SetLLVMFunctionAttributesForDefinition` so that we can see what's missing and have a good base to build upon.
I am working on a clangir-based solution to improve C++'s safety (https://discourse.llvm.org/t/rfc-a-clangir-based-safe-c/83245). This is similar to the previous analysis-only approach I proposed, where we may not care about the lowered code, and it is what I described as layering problems in https://discourse.llvm.org/t/rfc-a-clangir-based-safe-c/83245. It is also similar to the other issue proposed, llvm#1128: we'd be better off emitting the higher-level operations and lowering/optimizing them later. This is also inspired by our handling of VAArg, where we use ABI information to lower things during the passes. It gives me more confidence that I am doing things right.
The PR should help us get rid of the NYI `NYI UNREACHABLE executed at clang/lib/CIR/CodeGen/CIRGenExprAgg.cpp:899`. [Relevant OG code here](https://github.com/llvm/clangir/blob/7fb608d4d1b72c25a1739a1bd66c9024208819cb/clang/lib/CodeGen/CGExpr.cpp#L4767). I marked `HasExplicitObjectParameter` support as a missing feature; it is a new C++23 feature.
…ge (llvm#1214)

#### The Problem

Let's take a look at the following code:

```
struct A {
  ~A() {}
};

int foo() { return 42; }

void bar() {
  A a;
  int b = foo();
}
```

The call to `foo` guarded by the synthetic `tryOp` looks approximately like the following:

```
cir.try synthetic cleanup {
  %2 = cir.call exception @_Z3foov() : () -> !s32i cleanup {
    cir.call @_ZN1AD1Ev(%0) : (!cir.ptr<!ty_A>) -> () extra(#fn_attr1) // call to destructor of 'A'
    cir.yield
  }
  cir.yield
} catch [#cir.unwind {
  cir.resume
}]
cir.store %2, %1 : !s32i, !cir.ptr<!s32i> // CIR verification error
```

The result of the `foo` call is in the `try` region and is not accessible from the outside, so code generation fails with `operand #0 does not dominate its use`.

#### Solution

We have several options for handling this properly:
1. Introduce a new operation here, like `TryCall`, but probably a more high-level one, e.g. an `InvokeOp`.
2. Add a result to `TryOp`.
3. The quick fix implemented in this PR: a temporary `alloca` where we store the call result right inside the try region; the result of the whole `emitCall` is then a `load` from that temp `alloca` (a sketch follows below).

So this PR is both a request for changes and an open discussion of how to handle this properly. So far I chose the third approach. If that's OK, I will need to create one more PR with a similar fix for aggregate results, or update this one.
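For concreteness, a rough sketch of what the third option produces (value names and exact syntax illustrative):

```
%tmp = cir.alloca !s32i, !cir.ptr<!s32i>, ["tmp"] // temp slot created outside the try
cir.try synthetic cleanup {
  %2 = cir.call exception @_Z3foov() : () -> !s32i
  cir.store %2, %tmp : !s32i, !cir.ptr<!s32i>     // store while %2 still dominates
  cir.yield
} catch [#cir.unwind { cir.resume }]
%res = cir.load %tmp : !cir.ptr<!s32i>, !s32i     // emitCall's result: a load from the slot
cir.store %res, %1 : !s32i, !cir.ptr<!s32i>       // dominance now holds
```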
This PR puts the for-loop body, while-loop body, and do-while-loop body in nested scopes. Allocas in the loop body are now pushed down to the nested scope; see the sketch below. Resolves llvm#1218.
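Roughly, the CIR for a loop body now looks like this (sketch; exact syntax may differ):

```
cir.for : cond {
  ...
} body {
  cir.scope {                                     // new nested scope for the body
    %x = cir.alloca !s32i, !cir.ptr<!s32i>, ["x"] // body-local allocas land here
    ...
  }
  cir.yield
} step {
  ...
}
```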
There are two sets of intrinsics for Min and Max operations on floating point: [Maximum](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrmaximum-llvmmaximumop) vs [Maxnum](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrmaxnum-llvmmaxnumop), and [Minimum](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrminimum-llvmminimumop) vs [Minnum](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrminnum-llvmminnumop). [The difference is whether NaN is propagated when one of the inputs is NaN](https://llvm.org/docs/LangRef.html#llvm-maximumnum-intrinsic): Maxnum and Minnum return the number when one input is NaN and the other is a number, while Maximum and Minimum return NaN (NaN propagation). They also resolve to different ASM, e.g. [FMAX](https://developer.arm.com/documentation/ddi0596/2021-03/SIMD-FP-Instructions/FMAX--vector---Floating-point-Maximum--vector--?lang=en) vs [FMAXNM](https://developer.arm.com/documentation/ddi0596/2021-03/SIMD-FP-Instructions/FMAXNM--vector---Floating-point-Maximum-Number--vector--?lang=en). Both have use cases; we already implemented Maxnum and Minnum, while Maximum and Minimum have use cases in the [neon intrinsics](https://developer.arm.com/architectures/instruction-sets/intrinsics/vmax_f32) and [__builtin_elementwise_maximum](https://github.com/llvm/clangir/blob/a989ecb2c55da1fe28e4072c31af025cba6c4f0f/clang/test/CodeGen/strictfp-elementwise-bulitins.cpp#L53). See the example below.
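A small standalone example of the semantic difference, assuming `fmaxf` for the maxnum-style behaviour and recent Clang's `__builtin_elementwise_maximum` for the NaN-propagating variant:

```
#include <math.h>
#include <stdio.h>

int main(void) {
  float x = NAN, y = 1.0f;
  // maxnum semantics (llvm.maxnum): a NaN input is dropped in favour of the number.
  printf("%f\n", fmaxf(x, y)); // prints 1.000000
  // maximum semantics (llvm.maximum): a NaN input propagates to the result.
  printf("%f\n", __builtin_elementwise_maximum(x, y)); // prints nan
  return 0;
}
```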
…lvm#1235) Using an iterator to visit the std::initializer_list field reduces readability.
…d neon_vaddvq_f64 (llvm#1238) [Neon intrinsic definition](https://developer.arm.com/architectures/instruction-sets/intrinsics/vaddv_f32). These are across-vector operations, for which LLVM doesn't currently have a generic intrinsic. As a side note for brainstorming, it might be worthwhile in the future for CIR to introduce across-vector operations even though the LLVM dialect doesn't have them yet; this would help expose optimization opportunities. E.g. a very trivial constant fold can happen if we are adding across a constant vector (say, an add-across of a constant `<1, 2, 3, 4>` folding to `10`).
…#1239) This implementation differs from OG in that we chose to use a CIR op which eventually lowers to generic LLVM intrinsics, instead of llvm.aarch64.neon intrinsics. But down at the ASM level, [they are identical](https://godbolt.org/z/Gbbos9z6Y).
…vm#1242) This patch follows llvm#1220 (comment) by augmenting `CIR_Type` with a new field, `tbaaName`. Specifically, it enables TableGen support for the `-gen-cir-tbaa-name-lowering` option, allowing for the generation of `getTBAAName` functions based on the `tbaaName`. This enhancement enables us to replace the hardcoded TBAA names in the `getTypeName` function with the newly generated `getTBAAName`.
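A hedged sketch of the shape of the generated helper (the actual TableGen output and fallback behaviour may differ):

```
#include "clang/CIR/Dialect/IR/CIRTypes.h"
#include "llvm/ADT/StringRef.h"

// Hypothetical output of `-gen-cir-tbaa-name-lowering`, driven by each
// type's `tbaaName` field in the .td files (sketch only):
llvm::StringRef getTBAAName(mlir::Type ty) {
  if (mlir::isa<cir::BoolType>(ty))
    return "bool"; // from CIR_BoolType's tbaaName
  if (mlir::isa<cir::IntType>(ty))
    return "int"; // from CIR_IntType's tbaaName
  return "omnipotent char"; // fallback; the real default may differ
}
```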
This PR adds a bitcast when we rewrite a global's type. Previously we just set a new type and it worked. But recently I started testing ClangIR with CSmith in order to find run-time bugs, and ran into the following problem:

```
typedef struct {
  int x : 15;
  uint8_t y;
} S;

S g = { -12, 254 };

int main() {
  printf("%d\n", g.y);
  return 0;
}
```

The output of this program is ... 127, not 254! The reason: the global var is first created with the type of struct `S`, then a `get_member` operation is generated with index `1`, and afterwards the type of the global is rewritten - I assume because of the anon struct created on the right-hand side of the initialization. But the `get_member` operation still accesses the field at index `1` and reads the wrong byte (plausibly byte 1 of the bit-field storage: -12 occupies the low 15 bits as 0x7FF4, and its high byte 0x7F is exactly 127). If we change the type of `y` to `int`, we fail at the verification stage, but in the example above it's a run-time error! This is why I suggest adding a bitcast once we have to rewrite the global's type; a CIR-level sketch follows below.
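At the CIR level, the proposed fix looks roughly like this (types and syntax illustrative, not the PR's literal output):

```
// Before: the global was rewritten to the anon-struct initializer type, but
// the existing use still indexes the original `S` layout:
%0 = cir.get_global @g : !cir.ptr<!ty_anon>
%1 = cir.get_member %0[1] ...                     // reads the wrong byte at run time

// After: a bitcast back to the type the use was generated against:
%0 = cir.get_global @g : !cir.ptr<!ty_anon>
%1 = cir.cast(bitcast, %0 : !cir.ptr<!ty_anon>), !cir.ptr<!ty_S>
%2 = cir.get_member %1[1] ...                     // index 1 means `y` again
```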
…2/u32, (llvm#1240) Co-authored-by: Bruno Cardoso Lopes <[email protected]>
We figured it would be nice to have a common place for all our known crashes that is tracked by git and actively verified by lit, so we can tell when a previously crashing case becomes supported. It can act as our source of truth for known failures, and the entries are potential good first tasks for new developers. Add a simple test case of a known crash that involves copying a struct in a catch. Reviewers: smeenai, bcardosolopes Reviewed By: bcardosolopes Pull Request: llvm#1243
…#1247) Basically that is it - the return value of the `=` operator for a bitfield assignment is currently wrong. For example, the following function returns `7` for a 3-bit bit-field, though it should return `-1`:

```
int get_a(T *t) { return (t->a = 7); }
```

This PR fixes it. The bug was actually in the lowering - the integer cast was applied in the wrong place (in comparison with the original codegen). A worked example follows below.
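A worked example of the expected semantics (plain C, not the PR's code; assumes `a` is a signed bit-field as in the report):

```
#include <stdio.h>

typedef struct {
  int a : 3; // signed 3-bit field: representable range is -4..3
} T;

int get_a(T *t) { return (t->a = 7); }

int main(void) {
  T t;
  // Storing 7 into a 3-bit signed field truncates to 0b111, which reads back
  // as -1; C defines the value of the assignment expression as the value of
  // the field after the store, so get_a must return -1, not 7.
  printf("%d\n", get_a(&t)); // prints -1
  return 0;
}
```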
Lower vcalts_f32
Sub-issue of llvm#1192. Adds CIR_ATanOp and support for __builtin_elementwise_atan.
Part of llvm#258. 1. Added `AddressPointAttr` 2. Changed all occurrences of `VTableAddrPointOp` to use the attribute 3. Updated tests --------- Co-authored-by: Sirui Mu <[email protected]> Co-authored-by: Morris Hafner <[email protected]> Co-authored-by: Morris Hafner <[email protected]> Co-authored-by: Sharp-Edged <[email protected]> Co-authored-by: Amr Hesham <[email protected]> Co-authored-by: Bruno Cardoso Lopes <[email protected]> Co-authored-by: Letu Ren <[email protected]>
Clang relies on `llvm::Intrinsic::getOrInsertDeclaration` to handle functions marked as `ClangBuiltin` in TableGen. That function receives a `CodeGenModule*` so CIR can't use that. We need to re-implement parts of it.
Closes llvm#1367 --------- Co-authored-by: Sirui Mu <[email protected]> Co-authored-by: Amr Hesham <[email protected]> Co-authored-by: Chibuoyim (Wilson) Ogbonna <[email protected]> Co-authored-by: Yue Huang <[email protected]> Co-authored-by: Morris Hafner <[email protected]> Co-authored-by: Morris Hafner <[email protected]> Co-authored-by: Sharp-Edged <[email protected]> Co-authored-by: Bruno Cardoso Lopes <[email protected]> Co-authored-by: Letu Ren <[email protected]>
…ity and maintainability (llvm#1525) As noted in [this comment](llvm#1442 (comment)), the nested if-arms in `GlobalOpLowering` are somewhat confusing and error-prone. This PR simplifies the logic into more straightforward components. Since LLVM's GlobalOp accepts two types of initializers (either an initializer value or an initializer region), we've extracted the decision logic into a separate function called `lowerInitializer`. This function takes two inout arguments: `mlir::Attribute &init` (for the attribute value) and `bool useInitializerRegion` (as the decision indicator). All code paths then converge at a common epilogue that handles the operation rewriting. The previous implementation for lowering `DataMemberAttr` initializers relied on recursion between MLIR rewrite calls, which made the control flow somewhat opaque. The new version makes this explicit by using a clear self-recursive pattern within `lowerInitializer`.
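A minimal sketch of that control flow, under the caveat that the real attribute kinds, signature, and epilogue live in the PR itself (`lowerDataMemberAttr` is a hypothetical helper here):

```
#include "clang/CIR/Dialect/IR/CIRAttrs.h"
#include "mlir/IR/Attributes.h"

mlir::Attribute lowerDataMemberAttr(mlir::Attribute); // hypothetical helper

void lowerInitializer(mlir::Attribute &init, bool &useInitializerRegion) {
  if (mlir::isa<cir::IntAttr, cir::FPAttr>(init)) {
    // Simple constants become a plain initializer value.
    useInitializerRegion = false;
  } else if (mlir::isa<cir::DataMemberAttr>(init)) {
    // Rewrite into the underlying attribute, then decide again via the
    // explicit self-recursion described above.
    init = lowerDataMemberAttr(init);
    lowerInitializer(init, useInitializerRegion);
  } else {
    // Everything else is emitted through an initializer region.
    useInitializerRegion = true;
  }
  // All paths converge on a common epilogue that rewrites the GlobalOp.
}
```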
Fixes llvm#1371. I'm not sure whether we could remove `convertTypeForMem` completely; let's fix the doc first.
This change moves all declarations of emit* functions in CIRGenFunction.h into a common location and sorts them alphabetically. The goal of this change is to make it easier to keep upstream and incubator code in a consistent location, making functions easier to find for upstreaming and minimizing conflicts in the incubator when rebasing. I did most of this sort manually, and I've probably been inconsistent in how I treat sorting of uppercase versus lowercase. I made no attempt to provide a rule for ordering different declarations of functions with the same name. We can improve on that over time if anyone feels the need. I tried very hard not to drop comments (one of the reasons I had to do this manually), but I may have lost a few. This change loses the grouping of some declarations that were co-located by common purpose, but most of the declarations lacked a coherent ordering, so I think this is a step forward overall.
This is a rebased version of the inactive PR llvm#1380. --------- Co-authored-by: koparasy <[email protected]>
Adds implementation for ATanOp's lowering ThroughMLIR.
Lower `vabds_f32` and `vabdd_f64`
Commits compared: d6f21e5 to 4888f55
Adds implementation for ACosOp's lowering ThroughMLIR.
Lower vmaxv_s32
```
@@ -470,7 +470,7 @@ CIRGenModule::getOrCreateStaticVarDecl(const VarDecl &D,
   Name = getStaticDeclName(*this, D);

   mlir::Type LTy = getTypes().convertTypeForMem(Ty);
-  cir::AddressSpaceAttr AS =
+  cir::AddressSpaceAttr actualAS =
```
Add a comment on what you mean by "actual"
```
auto definingOp = addr.getDefiningOp();
bool hasCast = isa<cir::CastOp>(definingOp);
auto getAddrOp = mlir::cast<cir::GetGlobalOp>(
    hasCast ? definingOp->getOperand(0).getDefiningOp() : definingOp);
```
I'm missing the point here: why do you need to unwrap a cast that was wrapped by `createGetGlobal`? If you need to unwrap it, why do you need to generate it in the first place?
We need both the cast and the `get_global` op for examination below. We will store `addr` in the `LocalDeclMap`, with potential bitcasts. Previously a bitcast was emitted whenever the global variable type and the initializer type don't match (for example `int*` and `int[]`), but they can also mismatch because of different address spaces with the same underlying type (say `int* addrspace(1)` and `int*`). So I think we must emit a cast to check which case we're in.
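To illustrate the two kinds of mismatch (CIR type spellings approximate):

```
// 1. Same AS, different underlying type -> a plain bitcast suffices:
//      !cir.ptr<!cir.array<!s32i x 4>>  vs  !cir.ptr<!s32i>
// 2. Same underlying type, different AS -> needs an address-space cast:
//      !cir.ptr<!s32i, addrspace(offload_global)>  vs  !cir.ptr<!s32i>
```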
```
auto oldTy = mlir::cast<cir::PointerType>(getGlobal.getType());
auto newTy =
    cir::PointerType::get(oldTy.getPointee(), /*addrspace=*/gpuAS);
auto cast = createAddrSpaceCast(loc, getGlobal, newTy);
```
Instead of emitting all these casts, have you considered changing the address space of the global for good, instead of playing with two different ones? Are there uses that actually try to retrieve two different ASs out of the same global?
If we change the address space of the global, then it will be placed in the wrong section. For `__device__` (AS 1) as an example, OG does:

```
; Global
@var: ptr addrspace(1)
; Use a global
... (addrspacecast ptr addrspace(1) @var to ptr)
```
Adds implementation for ASinOp's lowering ThroughMLIR.
Lower vmaxv_u32
Lower vmaxvq_f64
Lower vmaxnmvq f32 and f64
Sorry for the delay here. I still have a few things I want to double-check before I consider this ready; I should be able to put more time into it at the beginning of next week. Thanks for your patience!
This is different from the approach in OG, because we use `cir.get_global` to access global variables, while in OG we just directly write the name of the variable. My approach is to add another attribute to `GlobalOp`s and emit additional casts in LoweringPrepare. I hope the comments are clear enough about the changes; please tell me if you find a better approach.
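For readers skimming the thread, a hedged sketch of the resulting CIR under this approach (op and attribute spellings may differ from the actual patch):

```
cir.global external addrspace(offload_global) @var = #cir.int<0> : !s32i

cir.func @use() {
  %0 = cir.get_global @var : !cir.ptr<!s32i, addrspace(offload_global)>
  // Inserted by LoweringPrepare; lowers to an llvm.addrspacecast:
  %1 = cir.cast(address_space, %0 : !cir.ptr<!s32i, addrspace(offload_global)>), !cir.ptr<!s32i>
  cir.return
}
```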