update llir/llvm to support 11.0 #147

Closed
6 of 7 tasks
dannypsnl opened this issue Jul 31, 2020 · 14 comments

@dannypsnl
Member

dannypsnl commented Jul 31, 2020

Current Status

LLVM 11.0 has been released; reference: https://releases.llvm.org/download.html#11.0.0

Changes

The items below are taken from https://releases.llvm.org/11.0.0/docs/ReleaseNotes.html#id4

  • The callsite attribute vector-function-abi-variant has been added to describe the mapping between scalar functions and vector functions, to enable vectorization of call sites. The information provided by the attribute is interfaced via the API provided by the VFDatabase class. When scanning through the set of vector functions associated with a scalar call, the loop vectorizer now relies on VFDatabase, instead of TargetLibraryInfo.

  • dereferenceable attributes and metadata on pointers no longer imply anything about the alignment of the pointer in question. Previously, some optimizations would make assumptions based on the type of the pointer. This behavior was undocumented. To preserve optimizations, frontends may need to be updated to generate appropriate align attributes and metadata.

  • The DIModule metadata is extended to contain file and line number information. This information is used to represent Fortran modules debug info at IR level.

  • LLVM IR now supports two distinct llvm::FixedVectorType and llvm::ScalableVectorType vector types, both derived from the base class llvm::VectorType. A number of algorithms dealing with IR vector types have been updated to make sure they work for both scalable and fixed vector types. Where possible, the code has been made generic to cover both cases using the base class. Specifically, places that were using the type unsigned to count the number of lanes of a vector are now using llvm::ElementCount. In places where uint64_t was used to denote the size in bits of an IR type we have partially migrated the codebase to using llvm::TypeSize.

  • Branching on undef/poison is undefined behavior. It is needed for correctly analyzing value ranges based on branch conditions. This is consistent with MSan’s behavior as well.

  • memset/memcpy/memmove can take undef/poison pointer(s) if the size to fill is zero.

  • Passing undef/poison to a standard I/O library function call (printf/fputc/…) is undefined behavior. The new noundef attribute is attached to the functions’ arguments. The full list is available at llvm::inferLibFuncAttributes.
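
As a rough illustration of the fixed versus scalable vector split described in the release notes above, the Go sketch below models both kinds with one struct and prints them in LLVM IR assembly syntax. `VectorType` and its fields are hypothetical names made up for this example; they are not the llir/llvm data model.

```go
package main

import "fmt"

// VectorType is a hypothetical stand-in for an IR vector type.
// It is NOT the llir/llvm definition; it only illustrates the idea.
type VectorType struct {
	Scalable bool   // true for <vscale x N x T>, false for <N x T>
	Len      uint64 // lane count (a minimum when Scalable is true)
	Elem     string // element type, kept as a string for brevity
}

// String renders the type in LLVM IR assembly syntax.
func (t VectorType) String() string {
	if t.Scalable {
		return fmt.Sprintf("<vscale x %d x %s>", t.Len, t.Elem)
	}
	return fmt.Sprintf("<%d x %s>", t.Len, t.Elem)
}

func main() {
	fixed := VectorType{Len: 4, Elem: "i32"}
	scalable := VectorType{Scalable: true, Len: 4, Elem: "i32"}
	fmt.Println(fixed)    // <4 x i32>
	fmt.Println(scalable) // <vscale x 4 x i32>
}
```

A single struct with a flag is only one possible design; separate types for the two cases, mirroring the C++ class split, would work as well.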

@mewmew
Member

mewmew commented Jul 31, 2020

11.0 almost there. Just take a note, waiting for release.

That's great! Thanks for making the issue to track the 11.0 release :)

@dannypsnl
Member Author

dannypsnl commented Oct 13, 2020

Compare ASM parser changes:

$ wget https://github.com/llvm/llvm-project/archive/llvmorg-10.0.0.tar.gz
$ wget https://github.com/llvm/llvm-project/archive/llvmorg-11.0.0.tar.gz
$ tar zxf llvmorg-10.0.0.tar.gz
$ tar zxf llvmorg-11.0.0.tar.gz
$ git diff llvm-project-llvmorg-10.0.0/llvm/lib/AsmParser llvm-project-llvmorg-11.0.0/llvm/lib/AsmParser
diff --git a/llvm-project-llvmorg-10.0.0/llvm/lib/AsmParser/LLParser.cpp b/llvm-project-llvmorg-11.0.0/llvm/lib/AsmParser/LLParser.cpp
index 1a17f63..c9f21ee 100644
--- a/llvm-project-llvmorg-10.0.0/llvm/lib/AsmParser/LLParser.cpp
+++ b/llvm-project-llvmorg-11.0.0/llvm/lib/AsmParser/LLParser.cpp
@@ -6937,7 +7055,12 @@ int LLParser::ParseAlloc(Instruction *&Inst, PerFunctionState &PFS) {
   if (Size && !Size->getType()->isIntegerTy())
     return Error(SizeLoc, "element count must have integer type");
 
-  AllocaInst *AI = new AllocaInst(Ty, AddrSpace, Size, Alignment);
+  SmallPtrSet<Type *, 4> Visited;
+  if (!Alignment && !Ty->isSized(&Visited))
+    return Error(TyLoc, "Cannot allocate unsized type");
+  if (!Alignment)
+    Alignment = M->getDataLayout().getPrefTypeAlign(Ty);
+  AllocaInst *AI = new AllocaInst(Ty, AddrSpace, Size, *Alignment);
   AI->setUsedWithInAlloca(IsInAlloca);
   AI->setSwiftError(IsSwiftError);
   Inst = AI;
@@ -6987,8 +7110,12 @@ int LLParser::ParseLoad(Instruction *&Inst, PerFunctionState &PFS) {
   if (Ty != cast<PointerType>(Val->getType())->getElementType())
     return Error(ExplicitTypeLoc,
                  "explicit pointee type doesn't match operand's pointee type");
-
-  Inst = new LoadInst(Ty, Val, "", isVolatile, Alignment, Ordering, SSID);
+  SmallPtrSet<Type *, 4> Visited;
+  if (!Alignment && !Ty->isSized(&Visited))
+    return Error(ExplicitTypeLoc, "loading unsized types is not allowed");
+  if (!Alignment)
+    Alignment = M->getDataLayout().getABITypeAlign(Ty);
+  Inst = new LoadInst(Ty, Val, "", isVolatile, *Alignment, Ordering, SSID);
   return AteExtraComma ? InstExtraComma : InstNormal;
 }
 
@@ -7034,8 +7161,13 @@ int LLParser::ParseStore(Instruction *&Inst, PerFunctionState &PFS) {
   if (Ordering == AtomicOrdering::Acquire ||
       Ordering == AtomicOrdering::AcquireRelease)
     return Error(Loc, "atomic store cannot use Acquire ordering");
+  SmallPtrSet<Type *, 4> Visited;
+  if (!Alignment && !Val->getType()->isSized(&Visited))
+    return Error(Loc, "storing unsized types is not allowed");
+  if (!Alignment)
+    Alignment = M->getDataLayout().getABITypeAlign(Val->getType());
 
-  Inst = new StoreInst(Val, Ptr, isVolatile, Alignment, Ordering, SSID);
+  Inst = new StoreInst(Val, Ptr, isVolatile, *Alignment, Ordering, SSID);
   return AteExtraComma ? InstExtraComma : InstNormal;
 }
 
@@ -7084,8 +7216,13 @@ int LLParser::ParseCmpXchg(Instruction *&Inst, PerFunctionState &PFS) {
     return Error(NewLoc, "new value and pointer type do not match");
   if (!New->getType()->isFirstClassType())
     return Error(NewLoc, "cmpxchg operand must be a first class value");
+
+  Align Alignment(
+      PFS.getFunction().getParent()->getDataLayout().getTypeStoreSize(
+          Cmp->getType()));
+
   AtomicCmpXchgInst *CXI = new AtomicCmpXchgInst(
-      Ptr, Cmp, New, SuccessOrdering, FailureOrdering, SSID);
+      Ptr, Cmp, New, Alignment, SuccessOrdering, FailureOrdering, SSID);
   CXI->setVolatile(isVolatile);
   CXI->setWeak(isWeak);
   Inst = CXI;
@@ -7169,9 +7306,11 @@ int LLParser::ParseAtomicRMW(Instruction *&Inst, PerFunctionState &PFS) {
   if (Size < 8 || (Size & (Size - 1)))
     return Error(ValLoc, "atomicrmw operand must be power-of-two byte-sized"
                          " integer");
-
+  Align Alignment(
+      PFS.getFunction().getParent()->getDataLayout().getTypeStoreSize(
+          Val->getType()));
   AtomicRMWInst *RMWI =
-    new AtomicRMWInst(Operation, Ptr, Val, Ordering, SSID);
+      new AtomicRMWInst(Operation, Ptr, Val, Alignment, Ordering, SSID);
   RMWI->setVolatile(isVolatile);
   Inst = RMWI;
   return AteExtraComma ? InstExtraComma : InstNormal;
@@ -8479,13 +8658,133 @@ bool LLParser::ParseOptionalVTableFuncs(VTableFuncList &VTableFuncs) {
   return false;
 }
 
+/// ParamNo := 'param' ':' UInt64
+bool LLParser::ParseParamNo(uint64_t &ParamNo) {
+  if (ParseToken(lltok::kw_param, "expected 'param' here") ||
+      ParseToken(lltok::colon, "expected ':' here") || ParseUInt64(ParamNo))
+    return true;
+  return false;
+}
+
+/// ParamAccessOffset := 'offset' ':' '[' APSINTVAL ',' APSINTVAL ']'
+bool LLParser::ParseParamAccessOffset(ConstantRange &Range) {
+  APSInt Lower;
+  APSInt Upper;
+  auto ParseAPSInt = [&](APSInt &Val) {
+    if (Lex.getKind() != lltok::APSInt)
+      return TokError("expected integer");
+    Val = Lex.getAPSIntVal();
+    Val = Val.extOrTrunc(FunctionSummary::ParamAccess::RangeWidth);
+    Val.setIsSigned(true);
+    Lex.Lex();
+    return false;
+  };
+  if (ParseToken(lltok::kw_offset, "expected 'offset' here") ||
+      ParseToken(lltok::colon, "expected ':' here") ||
+      ParseToken(lltok::lsquare, "expected '[' here") || ParseAPSInt(Lower) ||
+      ParseToken(lltok::comma, "expected ',' here") || ParseAPSInt(Upper) ||
+      ParseToken(lltok::rsquare, "expected ']' here"))
+    return true;
+
+  ++Upper;
+  Range =
+      (Lower == Upper && !Lower.isMaxValue())
+          ? ConstantRange::getEmpty(FunctionSummary::ParamAccess::RangeWidth)
+          : ConstantRange(Lower, Upper);
+
+  return false;
+}
+
+/// ParamAccessCall
+///   := '(' 'callee' ':' GVReference ',' ParamNo ',' ParamAccessOffset ')'
+bool LLParser::ParseParamAccessCall(FunctionSummary::ParamAccess::Call &Call) {
+  if (ParseToken(lltok::lparen, "expected '(' here") ||
+      ParseToken(lltok::kw_callee, "expected 'callee' here") ||
+      ParseToken(lltok::colon, "expected ':' here"))
+    return true;
+
+  unsigned GVId;
+  ValueInfo VI;
+  if (ParseGVReference(VI, GVId))
+    return true;
+
+  Call.Callee = VI.getGUID();
+
+  if (ParseToken(lltok::comma, "expected ',' here") ||
+      ParseParamNo(Call.ParamNo) ||
+      ParseToken(lltok::comma, "expected ',' here") ||
+      ParseParamAccessOffset(Call.Offsets))
+    return true;
+
+  if (ParseToken(lltok::rparen, "expected ')' here"))
+    return true;
+
+  return false;
+}
+
+/// ParamAccess
+///   := '(' ParamNo ',' ParamAccessOffset [',' OptionalParamAccessCalls]? ')'
+/// OptionalParamAccessCalls := '(' Call [',' Call]* ')'
+bool LLParser::ParseParamAccess(FunctionSummary::ParamAccess &Param) {
+  if (ParseToken(lltok::lparen, "expected '(' here") ||
+      ParseParamNo(Param.ParamNo) ||
+      ParseToken(lltok::comma, "expected ',' here") ||
+      ParseParamAccessOffset(Param.Use))
+    return true;
+
+  if (EatIfPresent(lltok::comma)) {
+    if (ParseToken(lltok::kw_calls, "expected 'calls' here") ||
+        ParseToken(lltok::colon, "expected ':' here") ||
+        ParseToken(lltok::lparen, "expected '(' here"))
+      return true;
+    do {
+      FunctionSummary::ParamAccess::Call Call;
+      if (ParseParamAccessCall(Call))
+        return true;
+      Param.Calls.push_back(Call);
+    } while (EatIfPresent(lltok::comma));
+
+    if (ParseToken(lltok::rparen, "expected ')' here"))
+      return true;
+  }
+
+  if (ParseToken(lltok::rparen, "expected ')' here"))
+    return true;
+
+  return false;
+}
+
+/// OptionalParamAccesses
+///   := 'params' ':' '(' ParamAccess [',' ParamAccess]* ')'
+bool LLParser::ParseOptionalParamAccesses(
+    std::vector<FunctionSummary::ParamAccess> &Params) {
+  assert(Lex.getKind() == lltok::kw_params);
+  Lex.Lex();
+
+  if (ParseToken(lltok::colon, "expected ':' here") ||
+      ParseToken(lltok::lparen, "expected '(' here"))
+    return true;
+
+  do {
+    FunctionSummary::ParamAccess ParamAccess;
+    if (ParseParamAccess(ParamAccess))
+      return true;
+    Params.push_back(ParamAccess);
+  } while (EatIfPresent(lltok::comma));
+
+  if (ParseToken(lltok::rparen, "expected ')' here"))
+    return true;
+
+  return false;
+}
+
 /// OptionalRefs
 ///   := 'refs' ':' '(' GVReference [',' GVReference]* ')'
 bool LLParser::ParseOptionalRefs(std::vector<ValueInfo> &Refs) {
   assert(Lex.getKind() == lltok::kw_refs);
   Lex.Lex();
 
-  if (ParseToken(lltok::colon, "expected ':' in refs") |
+  if (ParseToken(lltok::colon, "expected ':' in refs") ||
       ParseToken(lltok::lparen, "expected '(' in refs"))
     return true;
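
The `ParseParamAccessOffset` hunk above reads an inclusive `[lower, upper]` pair and stores it as a half-open range by incrementing the upper bound. A minimal Go sketch of that conversion follows, assuming 64-bit wrapping arithmetic in place of `APSInt` and reading `isMaxValue` as the all-ones bit pattern (-1 when signed). `Range` and `FromInclusive` are hypothetical names, not part of LLVM or llir/llvm.

```go
package main

import "fmt"

// Range is a hypothetical half-open interval [Lower, Upper) over wrapping
// 64-bit integers. Empty and Full mark the two degenerate cases.
type Range struct {
	Lower, Upper int64
	Empty, Full  bool
}

// FromInclusive converts the textual, inclusive bounds [lo, hi] into a
// half-open range, mirroring the ++Upper step in the diff.
func FromInclusive(lo, hi int64) Range {
	upper := hi + 1 // Go signed overflow wraps, like a 64-bit increment
	switch {
	case lo != upper:
		return Range{Lower: lo, Upper: upper}
	case lo == -1:
		// Assumption: equal bounds at the all-ones value denote the full set.
		return Range{Full: true}
	default:
		// Equal bounds anywhere else denote the empty set.
		return Range{Empty: true}
	}
}

func main() {
	fmt.Printf("%+v\n", FromInclusive(0, 7))   // ordinary range [0, 8)
	fmt.Printf("%+v\n", FromInclusive(0, -1))  // empty set
	fmt.Printf("%+v\n", FromInclusive(-1, -2)) // full set
}
```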

@mewmew
Member

mewmew commented Oct 13, 2020

Great! Thanks a lot for the diff and release notes @dannypsnl!

Also, for those looking to contribute to the project: both @dannypsnl and I will be busy during this LLVM release, so if you'd like to contribute to llir/llvm we'd be glad to help you get up to speed with integrating the LLVM 11.0 changes.

Cheers,
Robin

dannypsnl added a commit that referenced this issue Nov 7, 2020
@mewmew
Member

mewmew commented Nov 11, 2020

mewmew pushed a commit that referenced this issue Nov 11, 2020
* (#147) DIModule

ref: llir/grammar#7

* update ll

* fix deference nil problem

* [testdata] use llvm 11.0

* use zero value as not present
@dannypsnl
Member Author

I still wonder: what if we took the parser from LLVM, copied enough of its dependencies to make it work, and then called it through FFI?

@mewmew
Member

mewmew commented Feb 11, 2021

I still wonder: what if we took the parser from LLVM, copied enough of its dependencies to make it work, and then called it through FFI?

Hi @dannypsnl,

If using cgo, it probably makes more sense to use the official Go binding for LLVM.

The primary motivating case for llir/llvm is to enable access to LLVM IR without the need for cgo and complex build dependencies.

There are obvious benefits and drawbacks to both approaches. The official Go bindings for LLVM will always be up-to-date, and for more complex compilers (e.g. llgo), it makes sense to use these bindings instead of llir/llvm.

The benefit of llir/llvm is both the Go idiomatic data model (e.g. the value.Value interface; see #3 (comment) for background discussion), and the vastly simplified build dependencies. For instance, when the decomp project in the v0.2 release switched from using the official Go bindings for LLVM (which uses cgo) to using llir/llvm the project-wide build time was substantially improved (note, the build time issue has since been mitigated by libraries such as the LLVM bindings of TinyGo which rely on system-installed libraries of LLVM).

From the v0.2 release notes of the decomp project:

Prior to this release, project-wide compilation could take several hours to complete. Now, they complete in less than 1 minute -- the established hard limit for all future releases.

Hope this gives some background on the decision to not use cgo in llir/llvm, and some options of libraries to consider for more complex use cases.

Of course, llir/llvm is here to stay. It may lag behind LLVM releases, but that's fine for the most part. The main parts of the LLVM IR language remain unchanged in between releases of LLVM.

Cheers,
Robin

@dannypsnl
Member Author

@mewmew I believe grammar parsing is done now, the rest of the parts are

  1. summary (grammar: add module summary (introduced in LLVM 7.0) #43)
  2. alignment (support data layout?)
  3. vtable (we don't support this either; should we?)
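
On the alignment item: in the parser diff earlier in this thread, a missing `align` on `alloca`, `load` and `store` is filled in from the module's data layout, while `cmpxchg` and `atomicrmw` take their alignment from the type's store size. The Go sketch below restates that rule under simplifying assumptions. `TypeLayout`, `Op` and `ResolveAlign` are hypothetical names, not llir/llvm API, and the layout numbers are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// TypeLayout holds hypothetical layout facts for a single type.
type TypeLayout struct {
	Sized     bool   // false for types without a known size
	ABIAlign  uint64 // ABI alignment in bytes
	PrefAlign uint64 // preferred alignment in bytes
	StoreSize uint64 // bytes written by a store of this type
}

// Op names the instruction kinds touched by the diff.
type Op int

const (
	Alloca Op = iota
	Load
	Store
	CmpXchg
	AtomicRMW
)

// ResolveAlign returns the alignment attached to an instruction.
// explicit is the parsed "align N" value, or 0 when none was written.
func ResolveAlign(op Op, explicit uint64, tl TypeLayout) (uint64, error) {
	switch op {
	case CmpXchg, AtomicRMW:
		// The diff always derives these from the type's store size.
		return tl.StoreSize, nil
	}
	if explicit != 0 {
		return explicit, nil
	}
	if !tl.Sized {
		return 0, errors.New("no default alignment for an unsized type")
	}
	if op == Alloca {
		return tl.PrefAlign, nil // alloca falls back to preferred alignment
	}
	return tl.ABIAlign, nil // load and store fall back to ABI alignment
}

func main() {
	// Illustrative numbers only; not taken from a real target description.
	i64 := TypeLayout{Sized: true, ABIAlign: 4, PrefAlign: 8, StoreSize: 8}
	for _, op := range []Op{Alloca, Load, Store, CmpXchg, AtomicRMW} {
		align, err := ResolveAlign(op, 0, i64)
		fmt.Println(op, align, err)
	}
}
```

Whether llir/llvm should compute such defaults at parse time, or leave the alignment unset when the source omits it, is the open design question behind "support data layout?".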

@mewmew
Member

mewmew commented Mar 23, 2021

@mewmew I believe grammar parsing is done now, the rest of the parts are

that's incredible. really good job @dannypsnl! thanks for working on this.

@dannypsnl
Member Author

dannypsnl commented Mar 24, 2021

Update: once #158 is fixed, everything is done and we can release for LLVM 11.

@mewmew
Member

mewmew commented Mar 25, 2021

Update: once #158 is fixed, everything is done and we can release for LLVM 11.

#158 is now done. I'll close this issue for now, as the main aspect remaining is module summaries, which are already tracked by issue #43. I have yet to come across an LLVM IR file which requires module summaries, or a concrete use case for them when developing compilers. I'm sure one exists, I just haven't come across it personally, so I'm fine with tagging the llir/llvm release for LLVM 11.0 now, and if anyone feels up to it they may work on adding the grammar support for module summaries.

Thanks once more for working on getting llir/llvm updated to LLVM 11.0 @dannypsnl.

Cheers,
Robin
