[MLIR][LLVM] Add debug output to the LLVM inliner. #12
base: move-llvm-inliner
This reverts commit d6ad4f0.

Fails to build on at least gcc 12.2:

    /home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:482:1: error: no declaration matches ‘ContextNode<DerivedCCG, FuncTy, CallTy>* CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(const CallInfo&)’
      482 | CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(
          | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    /home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:393:16: note: candidate is: ‘CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::ContextNode* CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>::getNodeForInst(const CallInfo&)’
      393 |   ContextNode *getNodeForInst(const CallInfo &C);
          |                ^~~~~~~~~~~~~~
    /home/npopov/repos/llvm-project/llvm/lib/Transforms/IPO/MemProfContextDisambiguation.cpp:99:7: note: ‘class CallsiteContextGraph<DerivedCCG, FuncTy, CallTy>’ defined here
       99 | class CallsiteContextGraph {
          |       ^~~~~~~~~~~~~~~~~~~~
LGTM! Upstream I would probably mark it as an NFC commit since it is really only about debug outputs.
I did fix a test, would you still count it as NFC? And how do I mark it?
Signed-off-by: Jun Zhang <[email protected]>
…fold to AVX512 targets Extends 1bb95a3 to combine on AVX512 targets where the vXi1 type is legal Continues work on addressing Issue llvm#53419
This patch adds some more efficient lowering for vecreduce.min/max under NEON, using sequences of pairwise vpmin/vpmax to reduce to a single value. This nearly resolves issues such as llvm#50466, llvm#40981, llvm#38190. Differential Revision: https://reviews.llvm.org/D146404
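As a rough illustration of the idea in this commit message, here is a small Python model of reducing a vector to its minimum with repeated pairwise steps. It is only a sketch: `pairwise_min` and `reduce_min` are hypothetical names, the model assumes a power-of-two vector length, and it does not claim to match the exact instruction sequence the backend emits.

```python
def pairwise_min(a, b):
    """Model of a pairwise minimum: min of adjacent pairs of a, then of b."""
    assert len(a) == len(b) and len(a) % 2 == 0

    def pairs(v):
        return [min(v[i], v[i + 1]) for i in range(0, len(v), 2)]

    return pairs(a) + pairs(b)


def reduce_min(v):
    """Reduce a power-of-two-length vector to its minimum in log2(n) steps."""
    n = len(v)
    assert n >= 2 and n & (n - 1) == 0, "model assumes a power-of-two length"
    for _ in range(n.bit_length() - 1):
        # Feeding the same vector as both operands halves the number of
        # distinct candidates on every step.
        v = pairwise_min(v, v)
    return v[0]
```

After the last step every lane holds the same value, so reading lane 0 gives the reduction result. The max case is the same with `max` in place of `min`.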
The previous test case stored the result of a deinterleaved load and add into the same source address, which resulted in some scatters which we weren't testing for and made the tests harder to understand. Store it at a separate address, which will make the tests easier to read when the cost model is changed after D145085 is landed Reviewed By: reames Differential Revision: https://reviews.llvm.org/D146442
The 2022-12 release of the A64 ISA requires the assembler to also accept predicate-as-counter register names for the source and destination predicate registers of:
* *MOV: Move predicate (unpredicated)*
* *LDR (predicate): Load predicate register*
* *STR (predicate): Store predicate register*

Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D146311
A follow-on commit will add tests to this file and using the update_llc_test_checks script will make that easier. Differential Revision: https://reviews.llvm.org/D146568
…thOperands Resolves a TODO. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D146599
This has been done using the following commands find libcxx/test -type f -exec perl -pi -e 's|^([^/]+?)((?<!::)ptrdiff_t)|\1std::\2|' \{} \; find libcxx/test -type f -exec perl -pi -e 's|^([^/]+?)((?<!::)max_align_t)|\1std::\2|' \{} \; The std module doesn't export declarations in the global namespace. This is a preparation for that module. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D146550
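To make the effect of the substitution easier to see, here is a Python sketch that applies the same regular expression to a single line. The helper name `qualify` is hypothetical; the pattern itself is copied from the perl command in the commit message, with the type name parameterized.

```python
import re


def qualify(line, name):
    """Apply the commit's substitution to one line for the given type name."""
    # Group 1: a non-empty prefix that contains no '/' (so the match cannot
    # start in, or cross into, a // comment).
    # Group 2: the name, provided it is not already preceded by '::'.
    pattern = r'^([^/]+?)((?<!::)' + re.escape(name) + r')'
    return re.sub(pattern, r'\1std::\2', line)
```

Because the pattern is anchored at the start of the line and the prefix must be non-empty, at most one occurrence per line is rewritten, and an occurrence in the very first column is left alone.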
Previously epilogues were incorrectly inserted after indirect tail calls because they did not have the `isTerminator` property. Add that property and test that they get correct epilogues. To be safe, also add other properties that were defined for direct tail calls. Differential Revision: https://reviews.llvm.org/D146569
This allows the DWARFDebugLine::SectionParser to try parsing line tables at 4 or 8-byte boundaries if the unaligned offset appears invalid. If aligning the offset does not reduce errors the offset is used unchanged. This is needed for llvm-dwarfdump to be able to extract the line tables (with --debug-lines) from binaries produced by certain compilers that like to align each line table in the .debug_line section. Note that this alignment does not seem to be invalid since the units do point to the correct line table offsets via the DW_AT_stmt_list attribute. Differential Revision: https://reviews.llvm.org/D143513
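The retry logic described above can be sketched in Python as follows. This is a model under assumptions, not the parser's code: `count_errors` is a hypothetical callback standing in for "parse at this offset and report how many problems were seen", and the commit message does not say how the real parser judges an offset invalid.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def choose_offset(offset, count_errors):
    """Pick the offset to parse a line table from.

    The unaligned offset is kept unless a 4- or 8-byte aligned offset
    produces strictly fewer errors.
    """
    best, best_errors = offset, count_errors(offset)
    if best_errors == 0:
        return offset  # the unaligned offset already looks valid
    for alignment in (4, 8):
        candidate = align_up(offset, alignment)
        errors = count_errors(candidate)
        if errors < best_errors:
            best, best_errors = candidate, errors
    return best
```

If aligning does not reduce the error count, the function returns the original offset, which mirrors the "used unchanged" behaviour the message describes.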
ConstantInt::getSigned calls ConstantInt::get with the IsSigned flag set to true. That flag normally defaults to false. For always signed constants the code base is not consistent about whether it uses ConstantInt::getSigned or ConstantInt::get with IsSigned set to true. And it's not clear how to decide which way to use. By making getSigned inline, both ways should generate the same code in the end. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D146598
3aee091 to 0649a7f
Adds a canonicalizer for the concatenate->slice sequence where an output of slice can be replaced with an input of concatenate.

This is useful in the context of operations with complex inputs and outputs that are legalized from a framework such as TFL. For example, a TFL graph (FFT->FFT) will be legalized to the following TOSA graph:

        <complex input>
           /       \
        slice     slice
           \       /
              FFT
           /       \            -+
         concatenate             |
           /       \             | Redundant
        slice     slice          |
           \       /            -+
              FFT
           /       \
         concatenate
              |
        <complex output>

Concatenate and slice operations at the boundaries of the graph are useful as they maintain the correct correspondence of input/output tensors to the original TFL graph. However, consecutive complex operations will result in redundant concatenate->slice sequences which should be removed from the final TOSA graph.

The canonicalization does not currently handle dynamic types.

Signed-off-by: Luke Hutton <[email protected]>
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D144545
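A one-dimensional Python model may help show when a slice of a concatenate is redundant. This is a sketch only: the function names are hypothetical, tensors are modelled as flat lists, and the real pattern operates on ranked TOSA tensors along the concatenation axis.

```python
def concatenate(inputs):
    """Concatenate a list of 1-D tensors (modelled as Python lists)."""
    out = []
    for tensor in inputs:
        out.extend(tensor)
    return out


def fold_slice_of_concat(inputs, start, size):
    """Return the index of the input that slice(start, size) reproduces.

    Returns None when the slice does not line up exactly with one input,
    in which case the slice has to stay.
    """
    offset = 0
    for index, tensor in enumerate(inputs):
        if start == offset and size == len(tensor):
            return index
        offset += len(tensor)
    return None
```

With a real part `[1, 2, 3]` and an imaginary part `[4, 5, 6]`, the slice `(start=3, size=3)` of their concatenation folds to input 1, so both the slice and (if otherwise unused) the concatenate can go away.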
Similar to what we do for the LMUL>1 register classes. The printing is only working today because the segment registers have "ABI" names set to their base register name.
Remove now redundant fake ABI names from vector registers. This also fixes a crash that occurs if you use fflags as an instruction operand in the assembly and use -debug. It's not a valid register for any instruction since this wouldn't be common. It doesn't have an ABI name so it crashes the register printing in the debug output.
This restores commit d6ad4f0, which was reverted in commit 883dbb9, along with a fix for gcc 12.2 build errors in the original commit. Support for building, printing, and displaying CallsiteContextGraph which represents the MemProf metadata contexts. Uses CRTP to enable support for both IR (regular LTO) and summary (ThinLTO). This patch includes the support for building it in regular LTO mode (from memprof and callsite metadata), and the next patch will add the handling for building it from ThinLTO summaries. Also includes support for dumping the graph to text and to dot files. Follow-on patches will contain the support for cloning on the graph and in the IR. The graph represents the call contexts in all memprof metadata on allocation calls, with nodes for the allocations themselves, as well as for the calls in each context. The graph is initially built from the allocation memprof metadata (or summary) MIBs. It is then updated to match calls with callsite metadata onto the nodes, updating it to reflect any inlining performed on those calls. Each MIB (representing an allocation's call context with allocation behavior) is assigned a unique context id during the graph build. The edges and nodes in the graph are decorated with the context ids they carry. This is used to correctly update the graph when cloning is performed so that we can uniquify the context for a single (possibly cloned) allocation. Differential Revision: https://reviews.llvm.org/D140908
This is necessary to have a complete RISC-V toolchain for Fuchsia. Differential Revision: https://reviews.llvm.org/D146608
…ustom operand instead. The fake register class interferes too much with the autogenerated register class tables. Especially the fake spill size. I'm working on .insn support for compressed instructions and adding AnyRegC broke CodeGen.
This reverts commit e12a950. D142241 broke `-sBUILD_SHARED_LIBS=ON` build. After investigations in llvm#60314, the issue that prompted D142441 now seems gone. Fixes llvm#60314. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D145181
This change prevents rare deadlocks observed for specific macOS/iOS GUI applications which issue many `dlopen()` calls from multiple different threads at startup and where TSan finds and reports a race during startup. Providing a reliable test for this has been deemed infeasible.

Although I've only observed this deadlock on Apple platforms, conceptually the cause is not confined to Apple code so the fix lives in platform-independent code.

Deadlock scenario:
```
Thread 2                                    | Thread 4
ReportRace()                                |
  Lock internal TSan mutexes                |
    &ctx->slot_mtx                          |
                                            | dlopen() interceptor
                                            | OnLibraryLoaded()
                                            | MemoryMappingLayout::DumpListOfModules()
                                            |   calls dyld API, which takes internal lock
                                            |   lock() interceptor
                                            |     TSan tries to take internal mutexes again
                                            |       &ctx->slot_mtx
  call into symbolizer                      |
    MemoryMappingLayout::DumpListOfModules()
      calls dyld API, which hangs on trying to take lock
```

Resulting in:
* Thread 2 has internal TSan mutex, blocked on dyld lock
* Thread 4 has dyld lock, blocked on internal TSan mutex

The fix prevents this situation by not intercepting any of the calls originating from `MemoryMappingLayout::DumpListOfModules()`.
Stack traces for deadlock between ReportRace() and dlopen() interceptor:
```
thread #2, queue = 'com.apple.root.default-qos'
  frame #0: libsystem_kernel.dylib
  frame #1: libclang_rt.tsan_osx_dynamic.dylib`::wrap_os_unfair_lock_lock_with_options(lock=<unavailable>, options=<unavailable>) at tsan_interceptors_mac.cpp:306:3
  frame #2: dyld`dyld4::RuntimeLocks::withLoadersReadLock(this=0x000000016f21b1e0, work=0x00000001814523c0) block_pointer) at DyldRuntimeState.cpp:227:28
  frame #3: dyld`dyld4::APIs::_dyld_get_image_header(this=0x0000000101012a20, imageIndex=614) at DyldAPIs.cpp:240:11
  frame #4: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::CurrentImageHeader(this=<unavailable>) at sanitizer_procmaps_mac.cpp:391:35
  frame #5: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(this=0x000000016f2a2800, segment=0x000000016f2a2738) at sanitizer_procmaps_mac.cpp:397:51
  frame #6: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::DumpListOfModules(this=0x000000016f2a2800, modules=0x00000001011000a0) at sanitizer_procmaps_mac.cpp:460:10
  frame #7: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::ListOfModules::init(this=0x00000001011000a0) at sanitizer_mac.cpp:610:18
  frame #8: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::FindModuleForAddress(unsigned long) [inlined] __sanitizer::Symbolizer::RefreshModules(this=0x0000000101100078) at sanitizer_symbolizer_libcdep.cpp:185:12
  frame #9: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::FindModuleForAddress(this=0x0000000101100078, address=6465454512) at sanitizer_symbolizer_libcdep.cpp:204:5
  frame #10: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Symbolizer::SymbolizePC(this=0x0000000101100078, addr=6465454512) at sanitizer_symbolizer_libcdep.cpp:88:15
  frame #11: libclang_rt.tsan_osx_dynamic.dylib`__tsan::SymbolizeCode(addr=6465454512) at tsan_symbolize.cpp:106:35
  frame #12: libclang_rt.tsan_osx_dynamic.dylib`__tsan::SymbolizeStack(trace=StackTrace @ 0x0000600002d66d00) at tsan_rtl_report.cpp:112:28
  frame #13: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedReportBase::AddMemoryAccess(this=0x000000016f2a2a90, addr=4381057136, external_tag=<unavailable>, s=<unavailable>, tid=<unavailable>, stack=<unavailable>, mset=0x00000001012fc310) at tsan_rtl_report.cpp:190:16
  frame #14: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ReportRace(thr=0x00000001012fc000, shadow_mem=0x000008020a4340e0, cur=<unavailable>, old=<unavailable>, typ0=1) at tsan_rtl_report.cpp:795:9
  frame #15: libclang_rt.tsan_osx_dynamic.dylib`__tsan::DoReportRace(thr=0x00000001012fc000, shadow_mem=0x000008020a4340e0, cur=Shadow @ x22, old=Shadow @ 0x0000600002d6b4f0, typ=1) at tsan_rtl_access.cpp:166:3
  frame #16: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(void *) at tsan_rtl_access.cpp:220:5
  frame #17: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(void *) [inlined] __tsan::MemoryAccess(thr=0x00000001012fc000, pc=<unavailable>, addr=<unavailable>, size=8, typ=1) at tsan_rtl_access.cpp:442:3
  frame #18: libclang_rt.tsan_osx_dynamic.dylib`::__tsan_read8(addr=<unavailable>) at tsan_interface.inc:34:3
  <call into TSan from instrumented code>

thread #4, queue = 'com.apple.dock.fullscreen'
  frame #0: libsystem_kernel.dylib
  frame #1: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::FutexWait(p=<unavailable>, cmp=<unavailable>) at sanitizer_mac.cpp:540:3
  frame #2: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Semaphore::Wait(this=<unavailable>) at sanitizer_mutex.cpp:35:7
  frame #3: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::Mutex::Lock(this=0x0000000102992a80) at sanitizer_mutex.h:196:18
  frame #4: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock(this=<unavailable>, mu=0x0000000102992a80) at sanitizer_mutex.h:383:10
  frame #5: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __sanitizer::GenericScopedLock<__sanitizer::Mutex>::GenericScopedLock(this=<unavailable>, mu=0x0000000102992a80) at sanitizer_mutex.h:382:77
  frame #6: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() at tsan_rtl.h:708:10
  frame #7: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __tsan::TryTraceFunc(thr=0x000000010f084000, pc=0) at tsan_rtl.h:751:7
  frame #8: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor() [inlined] __tsan::FuncExit(thr=0x000000010f084000) at tsan_rtl.h:798:7
  frame #9: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor(this=0x000000016f3ba280) at tsan_interceptors_posix.cpp:300:5
  frame #10: libclang_rt.tsan_osx_dynamic.dylib`__tsan::ScopedInterceptor::~ScopedInterceptor(this=<unavailable>) at tsan_interceptors_posix.cpp:293:41
  frame #11: libclang_rt.tsan_osx_dynamic.dylib`::wrap_os_unfair_lock_lock_with_options(lock=0x000000016f21b1e8, options=OS_UNFAIR_LOCK_NONE) at tsan_interceptors_mac.cpp:310:1
  frame #12: dyld`dyld4::RuntimeLocks::withLoadersReadLock(this=0x000000016f21b1e0, work=0x00000001814525d4) block_pointer) at DyldRuntimeState.cpp:227:28
  frame #13: dyld`dyld4::APIs::_dyld_get_image_vmaddr_slide(this=0x0000000101012a20, imageIndex=412) at DyldAPIs.cpp:273:11
  frame #14: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(__sanitizer::MemoryMappedSegment*) at sanitizer_procmaps_mac.cpp:286:17
  frame #15: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::Next(this=0x000000016f3ba560, segment=0x000000016f3ba498) at sanitizer_procmaps_mac.cpp:432:15
  frame #16: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::MemoryMappingLayout::DumpListOfModules(this=0x000000016f3ba560, modules=0x000000016f3ba618) at sanitizer_procmaps_mac.cpp:460:10
  frame #17: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::ListOfModules::init(this=0x000000016f3ba618) at sanitizer_mac.cpp:610:18
  frame #18: libclang_rt.tsan_osx_dynamic.dylib`__sanitizer::LibIgnore::OnLibraryLoaded(this=0x0000000101f3aa40, name="<some library>") at sanitizer_libignore.cpp:54:11
  frame #19: libclang_rt.tsan_osx_dynamic.dylib`::wrap_dlopen(filename="<some library>", flag=<unavailable>) at sanitizer_common_interceptors.inc:6466:3
  <library code>
```

rdar://106766395

Differential Revision: https://reviews.llvm.org/D146593
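One way to picture "not intercepting calls that originate from the module listing" is a per-thread guard that the lock interceptor consults before touching any internal mutex. The Python sketch below is only a model of that idea: every name in it (`ScopedIgnoreInterceptors`, `lock_interceptor`, `dump_list_of_modules`, `internal_mutex`) is hypothetical, and the commit message does not say that the actual fix is implemented this way.

```python
import threading

_state = threading.local()         # per-thread "ignore interceptors" depth
internal_mutex = threading.Lock()  # stands in for an internal runtime mutex
intercepted = []                   # names of lock calls that were intercepted


class ScopedIgnoreInterceptors:
    """While active on a thread, interceptors on that thread pass through."""

    def __enter__(self):
        _state.depth = getattr(_state, "depth", 0) + 1
        return self

    def __exit__(self, *exc):
        _state.depth -= 1
        return False


def lock_interceptor(name):
    """Model of a lock interceptor; returns True if it did its bookkeeping."""
    if getattr(_state, "depth", 0) > 0:
        return False  # pass through without taking internal_mutex
    with internal_mutex:
        intercepted.append(name)
    return True


def dump_list_of_modules():
    """Model of the module listing, which takes a loader lock internally."""
    with ScopedIgnoreInterceptors():
        return lock_interceptor("loader lock")
```

In this model a thread that already holds `internal_mutex` can still call `dump_list_of_modules()` without blocking, because the nested lock call is never intercepted.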
Ensure that the variant returned by `member->getValue()` has a value and is not `Empty`. Failure to do so will trigger an assertion failure in `llvm::pdb::Variant::getBitWidth()`. This can occur when the `static` member is a forward declaration. Differential Revision: https://reviews.llvm.org/D146536 Reviewed By: sgraenitz
…xpressions The noexcept specifier and explicit specifier can optionally include a boolean expression to make these specifiers apply conditionally. However, clang-format didn't set the context for the parenthesized content of these specifiers, meaning they inherited the parent context, which usually isn't an expression, leading to misannotated binary operators. This patch applies expression context to the content of these specifiers, making them similar to the static_assert keyword. Fixes llvm#44543 Reviewed By: owenpan, MyDeveloperDay Differential Revision: https://reviews.llvm.org/D146284
Python 3 doesn't have a distinction between PyInt and PyLong, it's all PyLong now. This also fixes a bug in SetNumberFromObject. This used to crash LLDB: ``` lldb -o "script data=lldb.SBData(); data.SetDataFromUInt64Array([2**63])" ``` The problem happened in the PyInt path: ``` if (PyInt_Check(obj)) number = static_cast<T>(PyInt_AsLong(obj)); ``` when obj doesn't fit in a signed long, `PyInt_AsLong` would fail with "OverflowError: Python int too large to convert to C long". The existing long path does the right thing, as it will call `PyLong_AsUnsignedLongLong` for uint64_t. Differential Revision: https://reviews.llvm.org/D146590
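The range problem behind the crash can be checked with plain Python integers. The sketch below assumes a 64-bit C `long` (as on LP64 platforms); the helper names are hypothetical and no LLDB or CPython C API is called.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
UINT64_MAX = 2**64 - 1


def fits_signed_long(value):
    """True if value fits a 64-bit signed long (assumed LP64)."""
    return INT64_MIN <= value <= INT64_MAX


def fits_uint64(value):
    """True if value fits an unsigned 64-bit integer."""
    return 0 <= value <= UINT64_MAX
```

`2**63` is one past the largest signed 64-bit value, so a conversion through a signed long overflows, while the same value is a perfectly valid `uint64_t`.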
It's possible to segfault in `DevirtModule::applyICallBranchFunnel` when attempting to call `getCaller` on a call base that was erased in a prior iteration. This can occur when attempting to find devirtualizable calls via `findDevirtualizableCallsForTypeTest` if the vtable passed to llvm.type.test is a global and not a local. The function works by taking the first argument of the llvm.type.test call (which is a vtable), iterating through all uses of it, and adding any uses that are calls associated with that intrinsic call to a vector. For most cases where the vtable is actually a *local*, this wouldn't be an issue. Take for example:
```
define i32 @fn(ptr %obj) #0 {
  %vtable = load ptr, ptr %obj
  %p = call i1 @llvm.type.test(ptr %vtable, metadata !"typeid2")
  call void @llvm.assume(i1 %p)
  %fptr = load ptr, ptr %vtable
  %result = call i32 %fptr(ptr %obj, i32 1)
  ret i32 %result
}
```
`findDevirtualizableCallsForTypeTest` will check the call base `%result = call i32 %fptr(ptr %obj, i32 1)`, find that it is associated with a virtualizable call from `%vtable`, find all loads for `%vtable`, and add any instances where those load results are called to a vector.

Now consider the case where instead `%vtable` was the global itself rather than a local:
```
define i32 @fn(ptr %obj) #0 {
  %p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
  call void @llvm.assume(i1 %p)
  %fptr = load ptr, ptr @vtable
  %result = call i32 %fptr(ptr %obj, i32 1)
  ret i32 %result
}
```
`findDevirtualizableCallsForTypeTest` should work normally and add one unique call instance to a vector.
However, if there are multiple instances where this same global is used for llvm.type.test, like with:
```
define i32 @fn(ptr %obj) #0 {
  %p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
  call void @llvm.assume(i1 %p)
  %fptr = load ptr, ptr @vtable
  %result = call i32 %fptr(ptr %obj, i32 1)
  ret i32 %result
}

define i32 @fn2(ptr %obj) #0 {
  %p = call i1 @llvm.type.test(ptr @vtable, metadata !"typeid2")
  call void @llvm.assume(i1 %p)
  %fptr = load ptr, ptr @vtable
  %result = call i32 %fptr(ptr %obj, i32 1)
  ret i32 %result
}
```
Then each call base `%result = call i32 %fptr(ptr %obj, i32 1)` will be added to the vector twice. This is because for either call base `%result = call i32 %fptr(ptr %obj, i32 1)`, we determine it is associated with a virtualizable call from `@vtable`, and then we iterate through all the uses of `@vtable`, which is used across multiple functions. So when scanning the first `%result = call i32 %fptr(ptr %obj, i32 1)`, both call bases will be added to the vector, and when scanning the second one, both call bases are added again, resulting in duplicate call bases in the CSInfo.CallSites vector.

Note this is actually accounted for in every other instance where WPD iterates over CallSites. Everything else adds the call base to the `OptimizedCalls` set and checks whether it's already in the set. We can't reuse that particular set since it serves a different purpose: marking which calls were devirtualized, which `applyICallBranchFunnel` explicitly says it doesn't do. For this fix, we can just account for duplicates with a map and do the actual replacements afterwards by iterating over the map.

Differential Revision: https://reviews.llvm.org/D146267
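The duplication, and the "collect into a map first" remedy, can be modelled in a few lines of Python. This is a sketch with hypothetical names: call bases are represented as strings, `uses_of` stands in for walking the uses of the vtable value, and an insertion-ordered `dict` plays the role of the map mentioned in the message.

```python
def collect_call_sites(type_tests, uses_of):
    """Gather call sites the way the scan does: once per llvm.type.test.

    type_tests lists the vtable operand of each type test; uses_of maps a
    vtable to every call base reached through its uses, in any function.
    """
    call_sites = []
    for vtable in type_tests:
        call_sites.extend(uses_of[vtable])
    return call_sites


def unique_call_sites(call_sites):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(call_sites))
```

With two functions that each test `@vtable`, the raw scan yields four entries for two call bases; iterating over the de-duplicated collection visits each call base once, so no erased call is touched a second time.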
This reverts commit e036139.
Sometimes the clang driver will receive a target triple where the deployment version is too low to support the platform + arch. In those cases, the compiler upgrades the final minOS which is what gets recorded ultimately by the linker in LC_BUILD_VERSION. TextAPI should also reuse this logic for capturing minOS in recorded TBDv5 files. Reviewed By: ributzka Differential Revision: https://reviews.llvm.org/D145690
…type position degrade to id Fixes llvm#61481 Reviewed By: dang Differential Revision: https://reviews.llvm.org/D146671
Misc. cleanups for `WebAssemblyDebugValueManager`.
- Use `Register` for registers
- Simpler for loop iteration
- Rename a variable
- Reorder methods
- Reduce `SmallVector` size for `DBG_VALUE`s to 1; one def usually has a single `DBG_VALUE` attached to it
- Add a few more lines of comments

Reviewed By: dschuff
Differential Revision: https://reviews.llvm.org/D146743
They have been obsoleted for a long time and D146565 recently removed Clang support.
Plumbing from the language level to the assume intrinsics with separate_storage operand bundles. Patch by David Goldblatt (davidtgoldblatt) Differential Revision: https://reviews.llvm.org/D136515
…er semantics Following up on the comments in https://reviews.llvm.org/D144108 this patch refactors the im2col conversion patterns for `linalg.conv_2d_nhwc_hwcf` and `linalg.conv_2d_nchw_fchw` convolutions to use gather semantics for the im2col packing `linalg.generic`. Follow up work can include a similar pattern for depthwise convolutions and a generalization of the patterns here to work with any `LinalgOp` as well. Differential Revision: https://reviews.llvm.org/D144678
…tRes_ADDSUB On targets without ADDCARRY or ADDE, we need to emit a separate SETCC to determine carry from the low half to the high half. The high half is calculated by a series of ADDs. When RHSLo and RHSHi are -1, without this patch, we get: Hi = (add (add LHSHi, (setult Lo, LHSLo)), -1) Whereas with the patch we get: Hi = (sub LHSHi, (seteq LHSLo, 0)) If only RHSLo is -1, we can instead do (setne Lo, 0). Similar to gcc: https://godbolt.org/z/M83f6rz39 Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D146635
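The equivalence of the two `Hi` expressions for the case where both RHSLo and RHSHi are -1 can be checked with 32-bit modular arithmetic. The Python sketch below is a model, not the DAG code: the function names are hypothetical and the 32-bit half width is an assumption made for illustration.

```python
MASK = 0xFFFFFFFF  # model 32-bit halves of a 64-bit add


def hi_generic(lhs_lo, lhs_hi):
    """Hi = (add (add LHSHi, (setult Lo, LHSLo)), -1) with RHSLo = RHSHi = -1."""
    lo = (lhs_lo + MASK) & MASK           # Lo = LHSLo + (-1)
    carry = 1 if lo < lhs_lo else 0       # setult Lo, LHSLo
    return (lhs_hi + carry + MASK) & MASK


def hi_folded(lhs_lo, lhs_hi):
    """Hi = (sub LHSHi, (seteq LHSLo, 0))."""
    borrow = 1 if lhs_lo == 0 else 0      # seteq LHSLo, 0
    return (lhs_hi - borrow) & MASK


def hi_reference(lhs_lo, lhs_hi):
    """High half of the full 64-bit sum with -1, used as ground truth."""
    full = ((lhs_hi << 32) | lhs_lo) + (2**64 - 1)
    return (full >> 32) & MASK
```

Adding -1 to the low half carries out exactly when LHSLo is non-zero, so the carry and the trailing -1 cancel unless LHSLo is 0, which is what the single `seteq` captures.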
By making the 64 bit integer literals unsigned. Otherwise some of them are unexpectedly sign extended (and the compiler rightly diagnosed this with warnings) Initially added in D80506. Reviewed By: MaskRay Differential Revision: https://reviews.llvm.org/D146667
Differential Revision: https://reviews.llvm.org/D146769
Without this patch, std::bit_ceil<uint32_t> is compiled as:

    %dec = add i32 %x, -1
    %lz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false)
    %sub = sub i32 32, %lz
    %res = shl i32 1, %sub
    %ugt = icmp ugt i32 %x, 1
    %sel = select i1 %ugt, i32 %res, i32 1

With this patch, we generate:

    %dec = add i32 %x, -1
    %ctlz = tail call i32 @llvm.ctlz.i32(i32 %dec, i1 false)
    %sub = sub nsw i32 0, %ctlz
    %and = and i32 %sub, 31
    %sel = shl nuw i32 1, %and
    ret i32 %sel

https://alive2.llvm.org/ce/z/pwezvF

This patch recognizes the specific pattern from std::bit_ceil in libc++ and libstdc++ and drops the conditional move. In addition to the LLVM IR generated for std::bit_ceil(X), this patch recognizes variants like:

    std::bit_ceil(X - 1)
    std::bit_ceil(X + 1)
    std::bit_ceil(X + 2)
    std::bit_ceil(-X)
    std::bit_ceil(~X)

This patch fixes: llvm#60802

Differential Revision: https://reviews.llvm.org/D145299
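The two IR sequences can be compared with a small Python model of 32-bit arithmetic. This is a sketch: the function names are hypothetical, `ctlz32` models `llvm.ctlz.i32` with the zero-is-poison flag set to false (so it returns 32 for 0), and inputs above 2^31 are left out because `std::bit_ceil` has no representable result there.

```python
MASK = 0xFFFFFFFF


def ctlz32(x):
    """Count leading zeros of a 32-bit value; returns 32 for x == 0."""
    return 32 - x.bit_length()


def bit_ceil_select(x):
    """Model of the sequence before the patch (shift plus select)."""
    dec = (x - 1) & MASK
    res = (1 << (32 - ctlz32(dec))) & MASK
    return res if x > 1 else 1


def bit_ceil_masked(x):
    """Model of the sequence after the patch (negate, mask, shift)."""
    dec = (x - 1) & MASK
    shift = (-ctlz32(dec)) & 31   # equals 32 - ctlz modulo 32
    return (1 << shift) & MASK
```

Masking with 31 turns the would-be shift amount of 32 (for `x` of 0 or 1) into 0, which yields 1 directly and makes the select unnecessary.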
In this file, most lines don't have trailing spaces, but some do. For consistency, remove the trailing spaces. Reviewed By: skan Differential Revision: https://reviews.llvm.org/D146697
Differential Revision: https://reviews.llvm.org/D146636 Signed-off-by: Jun Zhang <[email protected]>
Alive2: https://alive2.llvm.org/ce/z/dxxD7B Fixes: llvm#60690 Signed-off-by: Jun Zhang <[email protected]> Differential Revision: https://reviews.llvm.org/D146637
Keep `EnableLoopDataPrefetch` option off for now because we need a few more TTIs and ISels. This patch is inspired by http://reviews.llvm.org/D17943. Reviewed By: SixWeining Differential Revision: https://reviews.llvm.org/D146600
This patch adds tests for umax(x, 1u). This patch fixes: llvm#60233 It turns out that commit 86b4d86 on Feb 8, 2023 already performs the instcombine transformation proposed in the issue, so the issue requires no change on the codegen side.
Reviewed By: Luo yuanke Differential Revision: https://reviews.llvm.org/D146683
This patch precommits a test for: llvm#61365
The new algorithm is: 1. Find all multilibs with flags that are a subset of the requested flags. 2. If more than one multilib matches, choose the last. In addition a new selection mechanism is permitted via an overload of MultilibSet::select() for which multiple multilibs are returned. This allows layering multilibs on top of each other. Since multilibs are now ordered within a list, they no longer need a Priority field. The new algorithm is different to the old algorithm, but in practise the old algorithm was always used in such a way that the effect is the same. The old algorithm was to find the set intersection of the requested flags (with the first character of each removed) with each multilib's flags (ditto), and for that intersection check whether the first character matched. However, ignoring the first characters, the requested flags were always a superset of all the multilibs flags. Therefore the new algorithm can be used as a drop-in replacement. The exception is Fuchsia, which needs adjusting slightly to set both fexceptions and fno-exceptions flags. Differential Revision: https://reviews.llvm.org/D142905
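The two selection steps above can be sketched in Python. This is a model under assumptions: `select_all` and `select_one` are hypothetical names (loosely mirroring the single-result and multi-result forms the message mentions), multilibs are modelled as ordered `(name, flags)` pairs, and the flag strings in the usage note are made up.

```python
def select_all(multilibs, requested):
    """Step 1: every multilib whose flags are a subset of the requested flags.

    The result keeps list order, which is what allows matching multilibs
    to be layered on top of each other.
    """
    requested = set(requested)
    return [name for name, flags in multilibs if set(flags) <= requested]


def select_one(multilibs, requested):
    """Step 2: if more than one multilib matches, choose the last."""
    matches = select_all(multilibs, requested)
    return matches[-1] if matches else None
```

For example, with the ordered list `base` (no flags), `thumb` (`+thumb`) and `thumb-soft` (`+thumb`, `+soft`), requesting `+thumb` selects `thumb`, and requesting both flags selects `thumb-soft`. Ordering the list from least to most specific takes over the job of the removed Priority field.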
…id-underscore-in-googletest-name According to the Google docs, the convention is TEST(TestSuiteName, TestName). Apply that convention to the source code, test and documentation of the check. Differential Revision: https://reviews.llvm.org/D146713
The revision switches the remaining LLVM dialect tests to use opaque pointers. Selected tests are copied to a postfixed test file for the time being. A number of tests disappear once we fully switch to opaque pointers. In particular, all tests that verify that a pointer element type matches another type, as well as tests of recursive types. Part of https://discourse.llvm.org/t/rfc-switching-the-llvm-dialect-and-dialect-lowerings-to-opaque-pointers/68179 Reviewed By: Dinistro, zero9178 Differential Revision: https://reviews.llvm.org/D146726
This revealed a test case that wasn't hitting the intended branch because the inlinees had no function definition. Depends on D146628 Differential Revision: https://reviews.llvm.org/D146633
f4600c2 to 3486f5f