Distinct Attributes. #16
Dropped a bunch of comments
@@ -149,12 +149,14 @@ class AbstractAttribute {

namespace detail {
class AttributeUniquer;
class DistinctAttrUniquer;
That's a somewhat schizophrenic name, but it really is a replacement for the AttributeUniquer
Yeah better suggestions are welcome!
Note that "logically" distinct attributes are uniqued. We just do not store the guid.
/// Allocates a value type instance for the current thread.
template <typename... Args>
ValueT *create(Args &&...args) {
Impressive templating :)
Here I was considering whether ValueT should be a template parameter of the create function or of the class.
I went for putting it on the class since that is a better fit for the use case. I want people to create DistinctAttrs only through the distinctAttrStore.
parseToken(Token::less, "expected '<' after distinct id"))
return {};
Attribute referencedAttr = parseAttribute(type);
if (!referencedAttr) {
I'm still not sure if we always want to re-parse the referenced element. For now it should be fine, though. Changing it seems simple enough.
Yeah I think it is a good idea to postpone that discussion to when the revision is actually public. I want to start with the simple solution :).
b7e3851 to 5fc5516
When the ENTRY statement is used, the same source can return different types depending on the entry point. These different return values are storage associated (share the same storage). Previously, this led to the declaration of the results to all have the largest type. This patch adds a convert between the stack allocation and the declaration so that the hlfir.decl gets the right type. I haven't managed to generate code where this convert converted a reference to an allocation for a smaller type into an allocation for a larger one, but I have added an assert just in case. This is a different solution to https://reviews.llvm.org/D152725, see discussion there. Differential Revision: https://reviews.llvm.org/D152931
Codegen only supports conversions between logicals and integers. The verifier should reflect this. Differential Revision: https://reviews.llvm.org/D152935
Adds a new HLFIR operation for the COUNT intrinsic according to the design set out in flang/docs/HighLevel.md. This patch includes all the necessary changes to create a new HLFIR operation and lower it into the fir runtime call. Author was @jacob-crawley. Minor adjustments by @tblah Differential Revision: https://reviews.llvm.org/D152521
Fold uadd.sat(X, Y) uge X and usub.sat(X, Y) ule X to true. Proof: https://alive2.llvm.org/ce/z/596m9X Fixes llvm#63381.
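The fold above can be sanity-checked by brute force on a small bit width. The sketch below is not the Alive2 proof linked in the commit message; it models `uadd.sat` and `usub.sat` with plain Python integers (the helper names and the 8-bit default are assumptions made for illustration) and checks every input pair.

```python
def uadd_sat(x, y, bits=8):
    """Unsigned saturating add: clamp to the largest value instead of wrapping."""
    return min(x + y, (1 << bits) - 1)


def usub_sat(x, y, bits=8):
    """Unsigned saturating subtract: clamp to zero instead of wrapping."""
    return max(x - y, 0)


def folds_hold(bits=8):
    """Check `uadd.sat(X, Y) uge X` and `usub.sat(X, Y) ule X` for all inputs."""
    limit = 1 << bits
    return all(
        uadd_sat(x, y, bits) >= x and usub_sat(x, y, bits) <= x
        for x in range(limit)
        for y in range(limit)
    )


if __name__ == "__main__":
    print(folds_hold(8))
```

Both comparisons hold because saturation only ever clamps the result toward the bound, so the result never crosses back past `X`.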
Allows constant folding of such instructions when estimating user bonus. Differential Revision: https://reviews.llvm.org/D153036
…c speed llvm#62750

I set up a simple test with a large .so (~100MiB) that was only present on the target machine but not present on the local machine, and ran an lldb server on the target and connected to it. LLDB properly downloads the file from the remote, but it does so at a very slow speed, even over a hardwired 1Gbps connection! Increasing the buffer size for downloading these helps quite a bit.

Test setup:
```
$ cat gen.py
print('const char* hugeglobal = ')
for _ in range(1000*500):
    print(' "' + '1234'*50 + '"')
print(';')
print('const char* mystring() { return hugeglobal; }')
$ gen.py > huge.c
$ mkdir libdir
$ gcc -fPIC huge.c -Wl,-soname,libhuge.so -o libdir/libhuge.so -shared
$ cat test.c
#include <string.h>
#include <stdio.h>
extern const char* mystring();
int main() {
    printf("%d\n", strlen(mystring()));
}
$ gcc test.c -L libdir -l huge -Wl,-rpath='$ORIGIN' -o test
$ rsync -a libdir remote:~/
$ ssh remote bash -c "cd ~/libdir && /llvm/buildr/bin/lldb-server platform --server --listen '*:1234'"
```

in another terminal
```
$ rm -rf ~/.lldb # clear cache
$ cat connect.lldb
platform select remote-linux
platform connect connect://10.0.0.14:1234
file test
b main
r
image list
c
q
$ time /llvm/buildr/bin/lldb --source connect.lldb
```

Times with various buffer sizes:
1kiB (current): ~22s
8kiB: ~8s
16kiB: ~4s
32kiB: ~3.5s
64kiB: ~2.8s
128kiB: ~2.6s
256kiB: ~2.1s
512kiB: ~2.1s
1MiB: ~2.1s
2MiB: ~2.1s

I chose 512kiB from this list as it seems to be the place where the returns start diminishing and it still isn't that much memory.

My understanding of how this makes such a difference is that ReadFile issues a request for each call, and a larger buffer means fewer round trips. The "ideal" situation is ReadFile() being async and being able to issue multiple of these, but that is much more work for probably little gain.

NOTE: this is my first contribution, so I wasn't sure who to choose as a reviewer. Greg Clayton seems to be the most appropriate of those in CODE_OWNERS.txt

Reviewed By: clayborg, jasonmolenda Differential Revision: https://reviews.llvm.org/D153060
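The round-trip explanation in this commit message can be put into rough numbers. The sketch below only counts requests. It assumes one request per buffer-sized chunk and a file of exactly 100 MiB (the message says "~100MiB"); the function name is made up for illustration and nothing here is taken from the LLDB implementation.

```python
KIB = 1024
MIB = 1024 * KIB


def read_requests(file_size, buffer_size):
    """Number of sequential read requests needed, one per buffer-sized chunk."""
    return -(-file_size // buffer_size)  # ceiling division on integers


if __name__ == "__main__":
    file_size = 100 * MIB  # assumed size; the commit message only says ~100MiB
    for buffer_size in (1 * KIB, 8 * KIB, 64 * KIB, 512 * KIB, 2 * MIB):
        print(f"{buffer_size // KIB:>5} KiB buffer: "
              f"{read_requests(file_size, buffer_size)} requests")
```

Under these assumptions a 1 KiB buffer needs 102400 sequential requests while a 512 KiB buffer needs 200, which is consistent with per-request latency dominating at small sizes and the timings flattening out once the request count is small.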
The ConstantRange specifies the range of the scalar elements in the vector. When converting into a Constant, we need to create a vector splat with the correct type. For that purpose, pass in the expected type for the constant. Fixes llvm#63380.
This was a spurious closing parenthesis.
The wrapper, like most compiler-generated functions, is intended to serve the IR for the current module. The safest linkage is to keep these private to avoid any possible collision with other modules. Differential Revision: https://reviews.llvm.org/D153255
This reverts commit aa49521. As discussed in llvm#53475 this patch allows for using LLD-as-a-lib. It also lets clients link only the drivers that they want (see unit tests). This also adds the unit test infra as in the other LLVM projects. Among the test coverage, I've added the original issue from @krzysz00, see: https://github.com/ROCmSoftwarePlatform/D108850-lld-bug-reproduction Important note: this doesn't allow (yet) linking in parallel. This will come a bit later hopefully, in subsequent patches, for COFF at least. Differential revision: https://reviews.llvm.org/D119049
This register is used as the pointer to the current thread local storage block and is read from NT_ARM_TLS on Linux. Though tpidr will be present on all AArch64 Linux, I am soon going to add a second register tpidr2 to this set. tpidr2 is only present when SME is implemented, therefore the NT_ARM_TLS set will change size. This is why I've added this as a dynamic register set to save changes later. Reviewed By: omjavaid Differential Revision: https://reviews.llvm.org/D152516
Key changes: - Refactor the createTargetData function to make use of the emitOffloadingArrays and emitOffloadingArraysArgument functions to generate code. - Added a new emitIfClause helper function to allow handling if clauses in a similar fashion to Clang. - Updated the MLIR side of code to account for changes to createTargetData. Depends on D149872 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D146557
These are leftover hacks from using asm declarations to access intrinsics.
For consistency with other algorithms. Differential Revision: https://reviews.llvm.org/D153141
A few tests were also straightforward to translate to SFINAE tests instead, so in a few cases I did that and removed the .fail.cpp test entirely. Differential Revision: https://reviews.llvm.org/D153149
The operations.cpp file contained the implementation of a ton of functionality unrelated to just the filesystem operations, and filesystem_common.h contained a lot of unrelated functionality as well. Splitting this up into more files will make it possible in the future to support parts of <filesystem> (e.g. path) on systems where there is no notion of a filesystem. Differential Revision: https://reviews.llvm.org/D152377
Implement XCVbitmanip intrinsics for CV32E40P according to the specification. This commit is part of a patch-set to upstream the 7 vendor specific extensions of CV32E40P. Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @liaolucy, @simoncook, @xmj. Spec: https://github.com/openhwgroup/cv32e40p/blob/62bec66b36182215e18c9cf10f723567e23878e9/docs/source/instruction_set_extensions.rst Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D152915
The LLVM comdat operation specifies how to deduplicate globals with the same key in two different object files. This is necessary on Windows where e.g. two object files with linkonce globals will not link unless a comdat for those globals is specified. It is also supported in the ELF format. Differential Revision: https://reviews.llvm.org/D150796
…reachable SplitBlockAndInsertIfThen utility creates two new blocks, they're called ThenBlock and Tail (true and false destinations of a conditional branch correspondingly). The function has a bool parameter Unreachable, and if it's set, then ThenBlock is terminated with an unreachable. At the end of the function the new blocks are added to the loop of the split block. However, in case ThenBlock is terminated with an unreachable, it cannot belong to any loop. Differential Revision: https://reviews.llvm.org/D152434
This patch implements the "__kmp_print_tdg_dot" function, that prints a task dependency graph into a dot file containing the tasks and their dependencies. It is activated through a new environment variable "KMP_TDG_DOT" Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D150962
After each iteration of the function specializer, constant stack values are promoted to constant globals in order to enable recursive function specialization. This should also be done once before running the specializer. Enables specialization of _QMbrute_forcePdigits_2 from SPEC2017:548.exchange2_r. Differential Revision: https://reviews.llvm.org/D152799
The option `-misched-detail-resource-booking` prints the following information every time the method `SchedBoundary::getNextResourceCycle` is invoked: 1. counters of the resources that have already been booked; 2. the values returned by `getNextResourceCycle`, which is the next available cycle in which a resource can be booked. The method is useful to debug low-level checks inside the machine scheduler that make decisions based on the values returned by `getNextResourceCycle`. Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D153116
Reverting because of https://lab.llvm.org/buildbot#builders/75/builds/32485: llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:2374:7: error: use of undeclared identifier 'MischedDetailResourceBooking' if (MischedDetailResourceBooking) This reverts commit fc06262.
The option `-misched-detail-resource-booking` prints the following information every time the method `SchedBoundary::getNextResourceCycle` is invoked: 1. counters of the resources that have already been booked; 2. the values returned by `getNextResourceCycle`, which is the next available cycle in which a resource can be booked. The method is useful to debug low-level checks inside the machine scheduler that make decisions based on the values returned by `getNextResourceCycle`. Reviewed By: andreadb Differential Revision: https://reviews.llvm.org/D153116
Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D153269
Differential Revision: https://reviews.llvm.org/D153325
This is a follow-up to D151938 that should fix GCC's -Wcast-qual warning.
They cause failures on the llvm-clang-x86_64-expensive-checks-debian buildbot. This partially reverts D153269 [AMDGPU][GFX11] Add test coverage for FMA instructions.
… path Try to address part of llvm#61900. It is not completely addressed since the original reproducer is not fixed due to the final suspend point is optimized out in its special case. But that is a relatively independent issue.
Drop alignment to allow test to run in different platforms. Differential Revision: https://reviews.llvm.org/D152547
- Update the Cortex-A510 mcpu target to use A510 scheduling info instead of A55. Values are taken from the A510 software optimisation guide https://developer.arm.com/documentation/PJDOC-466751330-536816/latest - Set the latency of most integer ops to 1. The CPU uarch is able to resolve most integer ops in 1 cycle. Differential Revision: https://reviews.llvm.org/D152688
LLVM build system separates between `add_llvm_example_library` and `add_llvm_library`, which is presumably used to package examples separately from the regular library. Introduce a similar approach to building example libraries in MLIR and use it for the transform dialect tutorial. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D153265
This patch is a followup for D153162. It cures one more place where indexed address was incorrectly read. It also moves handling of indexed address into DWARFUnit. Differential Revision: https://reviews.llvm.org/D153297
…pes in lexical block scopes (4/7)" (2)" This reverts commit cb9ac70. It causes an assert in clang: virtual void llvm::DwarfDebug::endFunctionImpl(const llvm::MachineFunction*): Assertion `LScopes.getAbstractScopesList().size() == NumAbstractSubprograms && "getOrCreateAbstractScope() inserted an abstract subprogram scope"' failed. https://bugs.chromium.org/p/chromium/issues/detail?id=1456288#c2
This fixes a false positive where a ParamVarDecl happened to have the same name as a C standard symbol and is in the global namespace.
```
using A = int(int time); // we suggest <ctime> for the `int time`.
```
Differential Revision: https://reviews.llvm.org/D153330
Add extra error checking to prevent passes from being run on unsupported ops through the pass manager infrastructure. Differential Revision: https://reviews.llvm.org/D153144
Clang provides the `-mlink-bitcode-file` and `-mlink-builtin-bitcode` options to insert LLVM-IR into the current TU. These are primarily useful for including LLVM-IR files that require special handling to be correct and cannot be linked normally, such as GPU vendor libraries like `libdevice.10.bc`. Currently these options can only be used if the source input goes through the AST consumer path. This patch makes the changes necessary to also support this when the input is LLVM-IR. This will allow the following operation:
```
clang in.bc -Xclang -mlink-builtin-bitcode -Xclang libdevice.10.bc
```
Reviewed By: yaxunl Differential Revision: https://reviews.llvm.org/D152391
The GPU vendors currently provide bitcode files for their device runtime. These files need to be handled specially as they are not built to be linked in with a standard `llvm-link` call or through LTO linking. This patch adds an alternative to use the existing clang handling of these libraries that does the necessary magic to make this work. We do this by causing the LTO backend to emit bitcode before running the backend. We then pass this through to clang which uses the existing support which has been fixed to support this by D152391. The backend will then be run with the merged module. This patch adds the `--builtin-bitcode=<triple>=file.bc` to specify a single file, or just `--clang-backend` to let the toolchain handle its defaults (currently nothing for NVPTX and the ROCm device libs for AMDGPU). This may have a performance impact due to running the optimizations again, we could potentially disable optimizations in LTO and only do the linking if this is an issue. This should allow us to resolve issues when relying on the `linker-wrapper` to do a late linking that may depend on vendor libraries. Depends on D152391 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D152442
Arm has a big-endian configuration called BE8, which is byte-invariant (every byte has the same address on little- and big-endian systems). When in BE8 mode:
1. Instructions are big-endian in relocatable objects but little-endian in executables and shared objects.
2. Data is big-endian.
3. The data encoding of the ELF file is ELFDATA2MSB.

To support BE8 without an ABI break for relocatable objects, the linker takes on the responsibility of changing the endianness of instructions. At a high level the only difference between BE32 and BE8 in the linker is that for BE8:
1. The linker sets the flag EF_ARM_BE8 in the ELF header.
2. The linker endian reverses the instructions, but not data.

This patch adds BE8 big endian support for Arm. To endian reverse the instructions we need access to the mapping symbols. Code sections can contain a mix of Arm, Thumb and literal data. We need to endian reverse Arm instructions as words, Thumb instructions as half-words, and ignore literal data. The only way to find these transitions precisely is by using mapping symbols. The instruction reversal needs to take place after relocation. For Arm BE8 code sections (sections with the SHF_EXECINSTR flag) we inserted a step after relocation to endian reverse the instructions. The implementation strategy I have used here is to write all sections as BE32, including SyntheticSections, then endian reverse all code in InputSections via mapping symbols.

Reviewed By: peter.smith Differential Revision: https://reviews.llvm.org/D150870
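The reversal step described in this commit message can be illustrated with a small model. This is a sketch only, not the LLD code: the function name, the `(offset, kind)` list standing in for mapping symbols, and the sample bytes are all assumptions made for illustration. The kinds `a`, `t` and `d` mirror the Arm ELF mapping symbols `$a` (Arm code), `$t` (Thumb code) and `$d` (data).

```python
def reverse_code_endianness(section, mapping):
    """Byte-reverse instructions in a code section, leaving literal data alone.

    section: bytes of an already relocated, big-endian (BE32-style) section.
    mapping: list of (offset, kind) sorted by offset, kind in {"a", "t", "d"}.
    """
    out = bytearray(section)
    width_by_kind = {"a": 4, "t": 2}  # Arm as words, Thumb as half-words
    for i, (start, kind) in enumerate(mapping):
        # A region runs until the next mapping symbol or the end of the section.
        end = mapping[i + 1][0] if i + 1 < len(mapping) else len(section)
        width = width_by_kind.get(kind)
        if width is None:
            continue  # literal data is not reversed
        for pos in range(start, end - width + 1, width):
            out[pos:pos + width] = section[pos:pos + width][::-1]
    return bytes(out)


if __name__ == "__main__":
    # One Arm word, two Thumb half-words, then four bytes of literal data.
    section = bytes.fromhex("e1a00000" "4770" "bf00" "12345678")
    mapping = [(0, "a"), (4, "t"), (8, "d")]
    print(reverse_code_endianness(section, mapping).hex())
```

Running the reversal after relocation, as the message describes, means the relocation code can keep writing big-endian values everywhere and only this one pass needs to know about BE8.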
Similar to the existing f32 pattern, this adds a tablegen pattern for the fp16 fcvtn2.
…` property for some intrinsic nodes
Re-order exceptional branches and slightly adjust the evaluation. Performance tested with the CORE-MATH project on AMD EPYC 7B12 (clocks/op)

Reciprocal throughputs:
```
--- BEFORE ---
$ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %
(with -mavx2 -mfma)
Ntrial = 20 ; Min = 7.794 + 0.102 clc/call; Median-Min = 0.066 clc/call; Max = 8.267 clc/call;
[####################] 100 %.
(with -msse4.2)
Ntrial = 20 ; Min = 10.783 + 0.172 clc/call; Median-Min = 0.144 clc/call; Max = 11.446 clc/call;
[####################] 100 %.
(SSE2)
Ntrial = 20 ; Min = 18.926 + 0.381 clc/call; Median-Min = 0.342 clc/call; Max = 19.623 clc/call;

--- AFTER ---
$ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %
(with -mavx2 -mfma)
Ntrial = 20 ; Min = 6.598 + 0.085 clc/call; Median-Min = 0.052 clc/call; Max = 6.868 clc/call;
[####################] 100 %
(with -msse4.2)
Ntrial = 20 ; Min = 9.245 + 0.304 clc/call; Median-Min = 0.248 clc/call; Max = 10.675 clc/call;
[####################] 100 %.
(SSE2)
Ntrial = 20 ; Min = 11.724 + 0.440 clc/call; Median-Min = 0.444 clc/call; Max = 12.262 clc/call;
```

Latency:
```
--- BEFORE ---
$ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %
(with -mavx2 -mfma)
Ntrial = 20 ; Min = 38.821 + 0.157 clc/call; Median-Min = 0.122 clc/call; Max = 39.539 clc/call;
[####################] 100 %.
(with -msse4.2)
Ntrial = 20 ; Min = 44.767 + 0.766 clc/call; Median-Min = 0.681 clc/call; Max = 45.951 clc/call;
[####################] 100 %.
(SSE2)
Ntrial = 20 ; Min = 55.055 + 1.512 clc/call; Median-Min = 1.571 clc/call; Max = 57.039 clc/call;

--- AFTER ---
$ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %
(with -mavx2 -mfma)
Ntrial = 20 ; Min = 36.147 + 0.194 clc/call; Median-Min = 0.181 clc/call; Max = 36.536 clc/call;
[####################] 100 %
(with -msse4.2)
Ntrial = 20 ; Min = 40.904 + 0.728 clc/call; Median-Min = 0.557 clc/call; Max = 42.231 clc/call;
[####################] 100 %.
(SSE2)
Ntrial = 20 ; Min = 55.776 + 0.557 clc/call; Median-Min = 0.542 clc/call; Max = 56.551 clc/call;
```
Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153026
Re-order exceptional branches and slightly adjust the evaluation. Depends on https://reviews.llvm.org/D153026 . Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153062
Re-organize special cases and add a special case when `|x| < 2^-5`. Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153134
Verifying dominator tree is expensive using intra-pass asserts. Asserts added during D147408 are increasing the build time of libc significantly. This change does the verification after the atomic optimizer pass and should fix the regression reported in D153232. Reviewed By: arsenm, #amdgpu Differential Revision: https://reviews.llvm.org/D153261
This patch optimizes code generation by leveraging the zeroing behavior of the `maskeqz`/`masknez` instructions.
```
int sel(int a, int b) {
  return (a < b) ? a : 0;
}
```
```
slt $a1,$a0,$a1
masknez $a2,$r0,$a1
maskeqz $a0,$a0,$a1
or $a0,$a0,$a2
```
=>
```
slt $a1,$a0,$a1
maskeqz $a0,$a0,$a1
```
Reviewed By: SixWeining Differential Revision: https://reviews.llvm.org/D153193
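To see why the shorter sequence is equivalent, the two instruction sequences can be modelled on plain integers. This is a sketch under stated assumptions: `maskeqz rd, rj, rk` is taken to yield 0 when `rk` is zero and `rj` otherwise, `masknez` is the opposite, and `$r0` always reads as zero. The Python function names are made up for illustration.

```python
def slt(a, b):
    """Set-on-less-than: 1 if a < b (signed), else 0."""
    return 1 if a < b else 0


def maskeqz(rj, rk):
    """Yield 0 if the condition register rk is zero, otherwise rj."""
    return 0 if rk == 0 else rj


def masknez(rj, rk):
    """Yield 0 if the condition register rk is non-zero, otherwise rj."""
    return 0 if rk != 0 else rj


def sel_before(a, b):
    """Four-instruction sequence: select between a and the zero register."""
    cond = slt(a, b)
    zero_arm = masknez(0, cond)  # masking $r0 is always 0
    a_arm = maskeqz(a, cond)
    return a_arm | zero_arm


def sel_after(a, b):
    """Two-instruction sequence: maskeqz already produces 0 when a >= b."""
    return maskeqz(a, slt(a, b))


if __name__ == "__main__":
    print(sel_before(3, 5), sel_after(3, 5), sel_before(5, 3), sel_after(5, 3))
```

Because the `masknez` arm masks the zero register, it contributes 0 on both branches, so the `or` is a no-op and both instructions can be dropped.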
A distinct attribute associates a referenced attribute with a unique identifier. Every call to its create function allocates a new distinct attribute instance. The address of the attribute instance temporarily serves as its unique identifier. Similar to the names of SSA values, the final unique identifiers are generated during pretty printing.

Examples:

#distinct = distinct[0]<42.0 : f32>
#distinct1 = distinct[1]<42.0 : f32>
#distinct2 = distinct[2]<array<i32: 10, 42>>

This mechanism is meant to generate attributes with a unique identifier, which can be used to mark groups of operations that share a common property, such as whether they alias. The design of the distinct attribute ensures a minimal memory footprint per distinct attribute since it only contains a reference to another attribute. All distinct attributes are stored outside of the storage uniquer in a thread-local store that is part of the context. The store uses one bump pointer allocator per thread so that distinct attributes can be created in parallel.

Differential Revision: https://reviews.llvm.org/D153360
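The identity scheme described in this commit message can be mimicked in a few lines. The model below is a hypothetical Python analogy, not the MLIR C++ API: `DistinctAttr` and `Printer` are invented names, object identity stands in for the instance address, and the thread-local bump pointer allocators are not modelled.

```python
class DistinctAttr:
    """Stand-in for a distinct attribute: stores only the referenced attribute.

    No __eq__ is defined, so equality and hashing fall back to object
    identity, playing the role of the instance address.
    """

    __slots__ = ("referenced",)

    def __init__(self, referenced):
        self.referenced = referenced


class Printer:
    """Assigns the final numeric identifiers lazily, in printing order."""

    def __init__(self):
        self._ids = {}

    def print_attr(self, attr):
        # The first time an instance is printed it receives the next free id.
        ident = self._ids.setdefault(attr, len(self._ids))
        return f"distinct[{ident}]<{attr.referenced}>"


if __name__ == "__main__":
    a = DistinctAttr("42.0 : f32")
    b = DistinctAttr("42.0 : f32")  # same referenced attribute, new identity
    printer = Printer()
    print(printer.print_attr(a))
    print(printer.print_attr(b))
    print(printer.print_attr(a))
```

Two instances that reference the same attribute still print with different identifiers, and printing the same instance twice reuses its identifier, which matches the first two examples in the commit message.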
A case for this transformation: https://gcc.godbolt.org/z/nhYcWq1WE

Fold

mov w8, #56952
movk w8, #15, lsl #16
ldrb w0, [x0, x8]

into

add x0, x0, 1036288
ldrb w0, [x0, 3704]

In this first version only LDRBBroX is supported. Fix llvm#71917
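The constants in this fold can be checked with a little arithmetic, reading the `mov` immediate as 56952. The sketch below assumes the offset is split at bit 12, so that the low part fits a 12-bit load offset and the high part is a multiple of 4096 that an `add` immediate can carry; that split point is inferred from the numbers in the commit message, and the function names are made up for illustration.

```python
def mov_movk(low16, high16):
    """Value built by `mov w8, #low16` followed by `movk w8, #high16, lsl #16`."""
    return (high16 << 16) | low16


def split_offset(offset):
    """Split an offset into a 4096-aligned high part and a 12-bit low part."""
    low = offset & 0xFFF
    high = offset - low
    return high, low


if __name__ == "__main__":
    offset = mov_movk(56952, 15)
    print(offset, split_offset(offset))
```

With the values from the commit message the register offset is 1039992, which splits into 1036288 for the `add` and 3704 for the `ldrb`, so the folded sequence addresses the same byte.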