Skip to content

[MLIR][LLVM] Add debug output to the LLVM inliner. #12

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 208 commits into
base: move-llvm-inliner
Choose a base branch
from

Conversation

definelicht
Copy link
Collaborator

This revealed a test case that wasn't hitting the intended branch because the inlinees had no function definition.

nikic and others added 2 commits March 22, 2023 15:43
This reverts commit d6ad4f0.

Fails to build on at least gcc 12.2:

/home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:482:1: error: no declaration matches ‘ContextNode<DerivedCCG, FuncTy, CallTy>* CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(const CallInfo&)’
  482 | CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:393:16: note: candidate is: ‘CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode* CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(const CallInfo&)’
  393 |   ContextNode *getNodeForInst(const CallInfo &C);
      |                ^~~~~~~~~~~~~~
/home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:99:7: note: ‘class CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>’ defined here
   99 | class CallsiteContextGraph {
      |       ^~~~~~~~~~~~~~~~~~~~
@definelicht definelicht requested a review from gysit March 22, 2023 15:05
Copy link
Collaborator

@gysit gysit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Upstream I would probably mark it as an NFC commit since it is really only about debug outputs.

@definelicht
Copy link
Collaborator Author

LGTM! Upstream I would probably mark it as an NFC commit since it is really only about debug outputs.

I did fix a test, would you still count it as NFC? And how do I mark it? [MLIR][LLVM][NFC]? [NFC][MLIR][LLVM]? [MLIR][LLVM] NFC: ?

junaire and others added 11 commits March 22, 2023 23:42
…fold to AVX512 targets

Extends 1bb95a3 to combine on AVX512 targets where the vXi1 type is legal

Continues work on addressing Issue llvm#53419
This patch adds some more efficient lowering for vecreduce.min/max under NEON,
using sequences of pairwise vpmin/vpmax to reduce to a single value.

This nearly resolves issues such as llvm#50466, llvm#40981, llvm#38190.

Differential Revision: https://reviews.llvm.org/D146404
The previous test case stored the result of a deinterleaved load and add
into the same source address, which resulted in some scatters which we
weren't testing for and made the tests harder to understand.
Store it at a separate address, which will make the tests easier to read
when the cost model is changed after D145085 is landed

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D146442
In the 2022-12 release of the A64 ISA it was updated that the assembler must
also accept predicate-as-counter register names for the source predicate
register and the destination predicate register for:
 * *MOV: Move predicate (unpredicated)*
 * *LDR (predicate): Load predicate register*
 * *STR (predicate): Store predicate register*

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D146311
A follow-on commit will add tests to this file and using the
update_llc_test_checks script will make that easier.

Differential Revision: https://reviews.llvm.org/D146568
…thOperands

Resolves a TODO.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D146599
This has been done using the following commands
  find libcxx/test -type f -exec perl -pi -e 's|^([^/]+?)((?<!::)ptrdiff_t)|\1std::\2|' \{} \;
  find libcxx/test -type f -exec perl -pi -e 's|^([^/]+?)((?<!::)max_align_t)|\1std::\2|' \{} \;

The std module doesn't export declarations in the global namespaace.,
This is a preparation for that module.

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D146550
Previously epilogues were incorrectly inserted after indirect tail calls because
they did not have the `isTerminator` property. Add that property and test that
they get correct epilogues. To be safe, also add other properties that were
defined for direct tail calls.

Differential Revision: https://reviews.llvm.org/D146569
This allows the DWARFDebugLine::SectionParser to try parsing line tables
at 4 or 8-byte boundaries if the unaligned offset appears invalid. If
aligning the offset does not reduce errors the offset is used unchanged.

This is needed for llvm-dwarfdump to be able to extract the line tables
(with --debug-lines) from binaries produced by certain compilers that
like to align each line table in the .debug_line section. Note that this
alignment does not seem to be invalid since the units do point to the
correct line table offsets via the DW_AT_stmt_list attribute.

Differential Revision: https://reviews.llvm.org/D143513
ConstantInt::getSigned calls ConstantInt::get with the IsSigned flag set to true.
That flag normally defaults to false.

For always signed constants the code base is not consistent about whether
it uses ConstantInt::getSigned or ConstantInt::get with IsSigned set to true.
And it's not clear how to decide which way to use.

By making getSigned inline, both ways should generate the same code in
the end.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D146598
lhutton1 and others added 12 commits March 22, 2023 16:52
Adds a canonicalizer for the concatenate->slice sequence where
an output of slice can be replaced with an input of concatenate.

This is useful in the context of operations with complex inputs
and outputs that are legalized from a framework such as TFL.
For example, a TFL graph (FFT->FFT) will be legalized to the
following TOSA graph:

     <complex input>
         /     \
     slice    slice
         \     /
           FFT
          /   \     -+
       concatenate   |
         /     \     |  Redundant
     slice    slice  |
         \     /    -+
           FFT
         /     \
       concatenate
            |
     <complex output>

Concatenate and slice operations at the boundaries of the graph are
useful as they maintain the correct correspondance of input/output
tensors to the original TFL graph. However, consecutive
complex operations will result in redundant concatenate->slice
sequences which should be removed from the final TOSA graph.

The canonicalization does not currently handle dynamic types.

Signed-off-by: Luke Hutton <[email protected]>

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D144545
Similar to what we do for the LMUL>1 register classes. The printing
is only working today because the segment registers have "ABI" names
set to their base register name.
Remove now redundant fake ABI names from vector registers.

This also fixes a crash that occurs if you use fflags as an instruction
operand in the assembly and use -debug. It's not a valid register
for any instruction since this wouldn't be common. It doesn't have
an ABI name so it crashes the register printing in the debug output.
This restores commit d6ad4f0, which was
reverted in commit 883dbb9, along with
a fix for gcc 12.2 build errors in the original commit.

Support for building, printing, and displaying CallsiteContextGraph
which represents the MemProf metadata contexts. Uses CRTP to enable
support for both IR (regular LTO) and summary (ThinLTO). This patch
includes the support for building it in regular LTO mode (from
memprof and callsite metadata), and the next patch will add the
handling for building it from ThinLTO summaries.

Also includes support for dumping the graph to text and to dot files.

Follow-on patches will contain the support for cloning on the graph and
in the IR.

The graph represents the call contexts in all memprof metadata on
allocation calls, with nodes for the allocations themselves, as well as
for the calls in each context. The graph is initially built from the
allocation memprof metadata (or summary) MIBs. It is then updated to
match calls with callsite metadata onto the nodes, updating it to
reflect any inlining performed on those calls.

Each MIB (representing an allocation's call context with allocation
behavior) is assigned a unique context id during the graph build. The
edges and nodes in the graph are decorated with the context ids they
carry. This is used to correctly update the graph when cloning is
performed so that we can uniquify the context for a single (possibly
cloned) allocation.

Differential Revision: https://reviews.llvm.org/D140908
This is necessary to have a complete RISC-V toolchain for Fuchsia.

Differential Revision: https://reviews.llvm.org/D146608
…ustom operand instead.

The fake register class interferes too much with the autogenerated
register class tables. Especially the fake spill size.

I'm working on .insn support for compressed instructions and adding
AnyRegC broke CodeGen.
This reverts commit e12a950.

D142241 broke `-sBUILD_SHARED_LIBS=ON` build. After investigations in
llvm#60314, the issue that
prompted D142441 now seems gone.

Fixes llvm#60314.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D145181
This change prevents rare deadlocks observed for specific macOS/iOS GUI
applications which issue many `dlopen()` calls from multiple different
threads at startup and where TSan finds and reports a race during
startup.  Providing a reliable test for this has been deemed infeasible.

Although I've only observed this deadlock on Apple platforms,
conceptually the cause is not confined to Apple code so the fix lives in
platform-independent code.

Deadlock scenario:
```
Thread 2                    | Thread 4
ReportRace()                |
Lock internal TSan mutexes  |
  &ctx->slot_mtx            |
                            | dlopen() interceptor
                            | OnLibraryLoaded()
                            | MemoryMappingLayout::DumpListOfModules()
                            | calls dyld API, which takes internal lock
                            | lock() interceptor
                            | TSan tries to take internal mutexes again
                            |   &ctx->slot_mtx
call into symbolizer        |
MemoryMappingLayout::DumpListOfModules()
calls dyld API, which hangs on trying to take lock
```
Resulting in:
* Thread 2 has internal TSan mutex, blocked on dyld lock
* Thread 4 has dyld lock, blocked on internal TSan mutex

The fix prevents this situation by not intercepting any of the calls
originating from `MemoryMappingLayout::DumpListOfModules()`.

Stack traces for deadlock between ReportRace() and dlopen() interceptor:
```
thread #2, queue = 'com.apple.root.default-qos'
  frame #0: libsystem_kernel.dylib
  frame #1: libclang_rt.tsan_osx_dynamic.dylib`::wrap_os_unfair_lock_lock_with_options(lock=<unavailable>, options=<unavailable>) at tsan_interceptors_mac.cpp:306:3
  frame #2: dyld`dyld4::RuntimeLocks::withLoadersReadLock(this=0x000000016f21b1e0, work=0x00000001814523c0) block_pointer) at DyldRuntimeState.cpp:227:28
  frame #3: dyld`dyld4::APIs::_dyld_get_image_header(this=0x0000000101012a20, imageIndex=614) at DyldAPIs.cpp:240:11
  frame #4: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::CurrentImageHeader(this=<unavailable>) at sanitizer_procmaps_mac.cpp:391:35
  frame #5: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(this=0x000000016f2a2800, segment=0x000000016f2a2738) at sanitizer_procmaps_mac.cpp:397:51
  frame #6: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::DumpListOfModules(this=0x000000016f2a2800, modules=0x00000001011000a0) at sanitizer_procmaps_mac.cpp:460:10
  frame #7: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::ListOfModules::init(this=0x00000001011000a0) at sanitizer_mac.cpp:610:18
  frame #8: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::FindModuleForAddress(unsigned long) [inlined] __sanitizer::Symbolizer::RefreshModules(this=0x0000000101100078) at sanitizer_symbolizer_libcdep.cpp:185:12
  frame #9: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::FindModuleForAddress(this=0x0000000101100078, address=6465454512) at sanitizer_symbolizer_libcdep.cpp:204:5
  frame #10: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::SymbolizePC(this=0x0000000101100078, addr=6465454512) at sanitizer_symbolizer_libcdep.cpp:88:15
  frame #11: libclang_rt.tsan_osx_dynamic.dylib`__tsan::SymbolizeCode(addr=6465454512) at tsan_symbolize.cpp:106:35
  frame #12: libclang_rt.tsan_osx_dynamic.dylib`__tsan::SymbolizeStack(trace=StackTrace @ 0x0000600002d66d00) at tsan_rtl_report.cpp:112:28
  frame #13: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedReportBase::AddMemoryAccess(this=0x000000016f2a2a90, addr=4381057136, external_tag=<unavailable>, s=<unavailable>, tid=<unavailable>, stack=<unavailable>, mset=0x00000001012fc310) at tsan_rtl_report.cpp:190:16
  frame #14: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ReportRace(thr=0x00000001012fc000, shadow_mem=0x000008020a4340e0, cur=<unavailable>, old=<unavailable>, typ0=1) at tsan_rtl_report.cpp:795:9
  frame #15: libclang_rt.tsan_osx_dynamic.dylib`__tsan::DoReportRace(thr=0x00000001012fc000, shadow_mem=0x000008020a4340e0, cur=Shadow @ x22, old=Shadow @ 0x0000600002d6b4f0, typ=1) at tsan_rtl_access.cpp:166:3
  frame #16: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(void *) at tsan_rtl_access.cpp:220:5
  frame #17: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(void *) [inlined] __tsan::MemoryAccess(thr=0x00000001012fc000, pc=<unavailable>, addr=<unavailable>, size=8, typ=1) at tsan_rtl_access.cpp:442:3
  frame llvm#18: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(addr=<unavailable>) at tsan_interface.inc:34:3
  <call into TSan from from instrumented code>

thread #4, queue = 'com.apple.dock.fullscreen'
  frame #0:  libsystem_kernel.dylib
  frame #1:  libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::FutexWait(p=<unavailable>, cmp=<unavailable>) at sanitizer_mac.cpp:540:3
  frame #2:  libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Semaphore::Wait(this=<unavailable>) at sanitizer_mutex.cpp:35:7
  frame #3:  libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Mutex::Lock(this=0x0000000102992a80) at sanitizer_mutex.h:196:18
  frame #4:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock(this=<unavailable>, mu=0x0000000102992a80) at sanitizer_mutex.h:383:10
  frame #5:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock(this=<unavailable>, mu=0x0000000102992a80) at sanitizer_mutex.h:382:77
  frame #6:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() at tsan_rtl.h:708:10
  frame #7:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __tsan::TryTraceFunc(thr=0x000000010f084000, pc=0) at tsan_rtl.h:751:7
  frame #8:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __tsan::FuncExit(thr=0x000000010f084000) at tsan_rtl.h:798:7
  frame #9:  libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor(this=0x000000016f3ba280) at tsan_interceptors_posix.cpp:300:5
  frame #10: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor(this=<unavailable>) at tsan_interceptors_posix.cpp:293:41
  frame #11: libclang_rt.tsan_osx_dynamic.dylib`::wrap_os_unfair_lock_lock_with_options(lock=0x000000016f21b1e8, options=OS_UNFAIR_LOCK_NONE) at tsan_interceptors_mac.cpp:310:1
  frame #12: dyld`dyld4::RuntimeLocks::withLoadersReadLock(this=0x000000016f21b1e0, work=0x00000001814525d4) block_pointer) at DyldRuntimeState.cpp:227:28
  frame #13: dyld`dyld4::APIs::_dyld_get_image_vmaddr_slide(this=0x0000000101012a20, imageIndex=412) at DyldAPIs.cpp:273:11
  frame #14: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(__sanitizer::MemoryMappedSegment*) at sanitizer_procmaps_mac.cpp:286:17
  frame #15: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(this=0x000000016f3ba560, segment=0x000000016f3ba498) at sanitizer_procmaps_mac.cpp:432:15
  frame #16: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::DumpListOfModules(this=0x000000016f3ba560, modules=0x000000016f3ba618) at sanitizer_procmaps_mac.cpp:460:10
  frame #17: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::ListOfModules::init(this=0x000000016f3ba618) at sanitizer_mac.cpp:610:18
  frame llvm#18: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::LibIgnore::OnLibraryLoaded(this=0x0000000101f3aa40, name="<some library>") at sanitizer_libignore.cpp:54:11
  frame llvm#19: libclang_rt.tsan_osx_dynamic.dylib`::wrap_dlopen(filename="<some library>", flag=<unavailable>) at sanitizer_common_interceptors.inc:6466:3
  <library code>
```

rdar://106766395

Differential Revision: https://reviews.llvm.org/D146593
Ensure that the variant returned by `member->getValue()` has a value and
is not `Empty`.  Failure to do so will trigger an assertion failure in
`llvm::pdb::Variant::getBitWidth()`.  This can occur when the `static`
member is a forward declaration.

Differential Revision: https://reviews.llvm.org/D146536
Reviewed By: sgraenitz
…xpressions

The noexcept specifier and explicit specifier can optionally include a
boolean expression to make these specifiers apply conditionally,
however, clang-format didn't set the context for the parenthesized
content of these specifiers, meaning they inherited the parent context,
which usually isn't an expressions, leading to misannotated binary
operators.

This patch applies expression context to the content of these
specifiers, making them similar to the static_assert keyword.

Fixes llvm#44543

Reviewed By: owenpan, MyDeveloperDay

Differential Revision: https://reviews.llvm.org/D146284
Python 3 doesn't have a distinction between PyInt and PyLong, it's all
PyLong now.

This also fixes a bug in SetNumberFromObject. This used to crash LLDB:
```
lldb -o "script data=lldb.SBData(); data.SetDataFromUInt64Array([2**63])"
```

The problem happened in the PyInt path:
```
  if (PyInt_Check(obj))
      number = static_cast<T>(PyInt_AsLong(obj));
```
when obj doesn't fit in a signed long, `PyInt_AsLong` would fail with
"OverflowError: Python int too large to convert to C long".

The existing long path does the right thing, as it will call
`PyLong_AsUnsignedLongLong` for uint64_t.

Differential Revision: https://reviews.llvm.org/D146590
PiJoules and others added 28 commits March 23, 2023 21:44
It's possible to segfault in `DevirtModule::applyICallBranchFunnel` when
attempting to call `getCaller` on a call base that was erased in a prior
iteration. This can occur when attempting to find devirtualizable calls
via `findDevirtualizableCallsForTypeTest` if the vtable passed to
llvm.type.test is a global and not a local. The function works by taking
the first argument of the llvm.type.test call (which is a vtable),
iterating through all uses of it, and adding any relevant all uses that
are calls associated with that intrinsic call to a vector. For most
cases where the vtable is actually a *local*, this wouldn't be an issue.
Take for example:

```
define i32 @fn(ptr %obj) #0 {
  %vtable = load ptr, ptr %obj
  %p = call i1 @llvm.type.test(ptr %vtable, metadata !"typeid2")
  call void @llvm.assume(i1 %p)
  %fptr = load ptr, ptr %vtable
  %result = call i32 %fptr(ptr %obj, i32 1)
  ret i32 %result
}
```

`findDevirtualizableCallsForTypeTest` will check the call base ` %result
= call i32 %fptr(ptr %obj, i32 1)`, find that it is associated with a
virtualizable call from `%vtable`, find all loads for `%vtable`, and add
any instances those load results are called into a vector. Now consider
the case where instead `%vtable` was the global itself rather than a
local:

```
define i32 @fn(ptr %obj) #0 {
  %p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
  call void @llvm.assume(i1 %p)
  %fptr = load ptr, ptr @vtable
  %result = call i32 %fptr(ptr %obj, i32 1)
  ret i32 %result
}
```

`findDevirtualizableCallsForTypeTest` should work normally and add one
unique call instance to a vector. However, if there are multiple
instances where this same global is used for llvm.type.test, like with:

```
define i32 @fn(ptr %obj) #0 {
  %p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
  call void @llvm.assume(i1 %p)
  %fptr = load ptr, ptr @vtable
  %result = call i32 %fptr(ptr %obj, i32 1)
  ret i32 %result
}

define i32 @fn2(ptr %obj) #0 {
  %p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
  call void @llvm.assume(i1 %p)
  %fptr = load ptr, ptr @vtable
  %result = call i32 %fptr(ptr %obj, i32 1)
  ret i32 %result
}
```

Then each call base `%result = call i32 %fptr(ptr %obj, i32 1)` will be
added to the vector twice. This is because for either call base `%result
= call i32 %fptr(ptr %obj, i32 1) `, we determine it is associated with
a virtualizable call from `@vtable`, and then we iterate through all the
uses of `@vtable`, which is used across multiple functions. So when
scanning the first `%result = call i32 %fptr(ptr %obj, i32 1)`, then
both call bases will be added to the vector, but when scanning the
second one, both call bases are added again, resulting in duplicate call
bases in the CSInfo.CallSites vector.

Note this is actually accounted for in every other instance WPD iterates
over CallSites. What everything else does is actually add the call base
to the `OptimizedCalls` set and just check if it's already in the set.
We can't reuse that particular set since it serves a different purpose
marking which calls where devirtualized which `applyICallBranchFunnel`
explicitly says it doesn't. For this fix, we can just account for
duplicates with a map and do the actual replacements afterwards by
iterating over the map.

Differential Revision: https://reviews.llvm.org/D146267
Sometimes the clang driver will receive a target triple where the
deployment version is too low to support the platform + arch. In those
cases, the compiler upgrades the final minOS which is what gets recorded
ultimately by the linker in LC_BUILD_VERSION. TextAPI should also reuse
this logic for capturing minOS in recorded TBDv5 files.

Reviewed By: ributzka

Differential Revision: https://reviews.llvm.org/D145690
…type position degrade to id

Fixes llvm#61481

Reviewed By: dang

Differential Revision: https://reviews.llvm.org/D146671
Misc. cleanups for `WebAssemblyDebugValueManager`.
- Use `Register` for registers
- Simpler for loop iteration
- Rename a variable
- Reorder methods
- Reduce `SmallVector` size for `DBG_VALUE`s to 1; one def usually have
  a single `DBG_VALUE` attached to it in most cases
- Add a few more lines of comments

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D146743
They have been obsoleted for a long time and D146565 recently removed
Clang support.
Plumbing from the language level to the assume intrinsics with
separate_storage operand bundles.

Patch by David Goldblatt (davidtgoldblatt)

Differential Revision: https://reviews.llvm.org/D136515
…er semantics

Following up on the comments in https://reviews.llvm.org/D144108 this
patch refactors the im2col conversion patterns for `linalg.conv_2d_nhwc_hwcf`
and `linalg.conv_2d_nchw_fchw` convolutions to use gather semantics for the im2col
packing `linalg.generic`.

Follow up work can include a similar pattern for depthwise convolutions
and a generalization of the patterns here to work with any `LinalgOp` as
well.

Differential Revision: https://reviews.llvm.org/D144678
…tRes_ADDSUB

 On targets without ADDCARRY or ADDE, we need to emit a separate
 SETCC to determine carry from the low half to the high half.
 The high half is calculated by a series of ADDs.

 When RHSLo and RHSHi are -1, without this patch, we get:
   Hi = (add (add LHSHi,(setult Lo, LHSLo), -1)
 Where as with the patch we get:
   Hi = (sub LHSHi, (seteq LHSLo, 0))

 Only RHSLo is -1 we can instead do (setne Lo, 0).

 Similar to gcc: https://godbolt.org/z/M83f6rz39

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D146635
By making the 64 bit integer literals unsigned. Otherwise some of them
are unexpectedly sign extended (and the compiler rightly diagnosed this
with warnings)

Initially added in D80506.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D146667
Without this patch, std::bit_ceil<uint32_t> is compiled as:

  %dec = add i32 %x, -1
  %lz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false)
  %sub = sub i32 32, %lz
  %res = shl i32 1, %sub
  %ugt = icmp ugt i32 %x, 1
  %sel = select i1 %ugt, i32 %res, i32 1

With this patch, we generate:

  %dec = add i32 %x, -1
  %ctlz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false)
  %sub = sub nsw i32 0, %ctlz
  %and = and i32 %1, 31
  %sel = shl nuw i32 1, %and
  ret i32 %sel

https://alive2.llvm.org/ce/z/pwezvF

This patch recognizes the specific pattern from std::bit_ceil in
libc++ and libstdc++ and drops the conditional move.  In addition to
the LLVM IR generated for std::bit_ceil(X), this patch recognizes
variants like:

  std::bit_ceil(X - 1)
  std::bit_ceil(X + 1)
  std::bit_ceil(X + 2)
  std::bit_ceil(-X)
  std::bit_ceil(~X)

This patch fixes:

llvm#60802

Differential Revision: https://reviews.llvm.org/D145299
In this file, most of the line don't have trailing spaces,
but some of them have. To keep consistent, remove the trailing
spaces.

Reviewed By: skan

Differential Revision: https://reviews.llvm.org/D146697
Differential Revision: https://reviews.llvm.org/D146636

Signed-off-by: Jun Zhang <[email protected]>
Keep `EnableLoopDataPrefetch` option off for now because
we need a few more TTIs and ISels.

This patch is inspired by http://reviews.llvm.org/D17943.

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D146600
This patch adds tests for umax(x, 1u).

This patch fixes:

llvm#60233

It turns out that commit 86b4d86 on
Feb 8, 2023 already performs the instcombine transformation proposed
in the issue, so the issue requires no change on the codegen side.
Reviewed By: Luo yuanke

Differential Revision: https://reviews.llvm.org/D146683
This patch precommits a test for:

llvm#61365
The new algorithm is:
1. Find all multilibs with flags that are a subset of the requested
   flags.
2. If more than one multilib matches, choose the last.

In addition a new selection mechanism is permitted via an overload of
MultilibSet::select() for which multiple multilibs are returned.
This allows layering multilibs on top of each other.

Since multilibs are now ordered within a list, they no longer need a
Priority field.

The new algorithm is different to the old algorithm, but in practise
the old algorithm was always used in such a way that the effect is the
same.
The old algorithm was to find the set intersection of the requested
flags (with the first character of each removed) with each multilib's
flags (ditto), and for that intersection check whether the first
character matched. However, ignoring the first characters, the
requested flags were always a superset of all the multilibs flags.
Therefore the new algorithm can be used as a drop-in replacement.

The exception is Fuchsia, which needs adjusting slightly to set both
fexceptions and fno-exceptions flags.

Differential Revision: https://reviews.llvm.org/D142905
…id-underscore-in-googletest-name

According to the Google docs, the convention is
TEST(TestSuiteName, TestName). Apply that convention to the
source code, test and documentation of the check.

Differential Revision: https://reviews.llvm.org/D146713
The revision switches the remaining LLVM dialect tests to use opaque
pointers. Selected tests are copied to a postfixed test file for the
time being.

A number of tests disappear once we fully switch to opaque pointers.
In particular, all tests that check verify a pointer element type
matches another type as well as tests of recursive types.

Part of https://discourse.llvm.org/t/rfc-switching-the-llvm-dialect-and-dialect-lowerings-to-opaque-pointers/68179

Reviewed By: Dinistro, zero9178

Differential Revision: https://reviews.llvm.org/D146726
This revealed a test case that wasn't hitting the intended branch
because the inlinees had no function definition.

Depends on D146628

Differential Revision: https://reviews.llvm.org/D146633
definelicht pushed a commit that referenced this pull request Dec 13, 2023
When all the large const offsets masked with the same value from bit-12 to bit-23.
Fold
  add     x8, x0, llvm#2031, lsl #12
  add     x8, x8, llvm#960
  ldr     x9, [x8, x8]
  ldr     x8, [x8, llvm#2056]

into
  add     x8, x0, llvm#2031, lsl #12
  ldr     x9, [x8, llvm#960]
  ldr     x8, [x8, llvm#3016]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.