Distinct Attributes. #16

Open
wants to merge 130 commits into
base: main

@gysit gysit commented Jun 16, 2023

No description provided.

@Dinistro Dinistro left a comment

Dropped a bunch of comments

@@ -149,12 +149,14 @@ class AbstractAttribute {

namespace detail {
class AttributeUniquer;
class DistinctAttrUniquer;

Collaborator

That's a somewhat schizophrenic name, but it really is a replacement for the AttributeUniquer

Collaborator Author

Yeah better suggestions are welcome!

Note that "logically" distinct attributes are uniqued. We just do not store the guid.


/// Allocates a value type instance for the current thread.
template <typename... Args>
ValueT *create(Args &&...args) {

Collaborator

Impressive templating :)

Collaborator Author

Well, here I was considering whether ValueT should be a template parameter of the create function or of the class.

I went for putting it on the class since that is a better fit for the use case. I want people to only create DistinctAttrs with the distinctAttrStore.

parseToken(Token::less, "expected '<' after distinct id"))
return {};
Attribute referencedAttr = parseAttribute(type);
if (!referencedAttr) {

Collaborator

I'm still not sure if we always want to re-parse the referenced element. For now it should be fine, though. Changing it seems simple enough.

Collaborator Author

Yeah I think it is a good idea to postpone that discussion to when the revision is actually public. I want to start with the simple solution :).

@gysit gysit force-pushed the get-distinct branch 2 times, most recently from b7e3851 to 5fc5516 Compare June 19, 2023 06:48
tblah and others added 3 commits June 19, 2023 09:09
When the ENTRY statement is used, the same source can return different
types depending on the entry point. These different return values are
storage associated (share the same storage). Previously, this led to the
declaration of the results to all have the largest type. This patch adds
a convert between the stack allocation and the declaration so that the
hlfir.decl gets the right type.

I haven't managed to generate code where this convert converted a
reference to an allocation for a smaller type into an allocation for a
larger one, but I have added an assert just in case.

This is a different solution to https://reviews.llvm.org/D152725, see
discussion there.

Differential Revision: https://reviews.llvm.org/D152931
Codegen only supports conversions between logicals and integers. The
verifier should reflect this.

Differential Revision: https://reviews.llvm.org/D152935
Adds a new HLFIR operation for the COUNT intrinsic according to
the design set out in flang/docs/HighLevel.md. This patch includes all
the necessary changes to create a new HLFIR operation and lower it into
the fir runtime call.

Author was @jacob-crawley. Minor adjustments by @tblah

Differential Revision: https://reviews.llvm.org/D152521
nikic and others added 22 commits June 19, 2023 11:49
Fold uadd.sat(X, Y) uge X and usub.sat(X, Y) ule X to true.

Proof: https://alive2.llvm.org/ce/z/596m9X

Fixes llvm#63381.
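
The two folds can be sanity-checked by brute force. The sketch below is not the InstCombine/InstSimplify code; it models the saturating intrinsics on 8-bit unsigned values with hypothetical helper names (`uadd_sat`, `usub_sat`, `folds_hold`) and checks every input pair.

```python
# Exhaustive 8-bit model of the folds; helper names are illustrative only.
BITS = 8
MAX = (1 << BITS) - 1


def uadd_sat(x, y):
    """Unsigned saturating add: clamps at MAX instead of wrapping."""
    return min(x + y, MAX)


def usub_sat(x, y):
    """Unsigned saturating subtract: clamps at zero instead of wrapping."""
    return max(x - y, 0)


def folds_hold():
    """Check uadd.sat(X, Y) uge X and usub.sat(X, Y) ule X for all inputs."""
    return all(
        uadd_sat(x, y) >= x and usub_sat(x, y) <= x
        for x in range(MAX + 1)
        for y in range(MAX + 1)
    )


print(folds_hold())
```

Because the result never wraps, the saturating add can only grow or stay equal and the saturating subtract can only shrink or stay equal, which is why both comparisons fold to true.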
Allows constant folding of such instructions when estimating user bonus.

Differential Revision: https://reviews.llvm.org/D153036
…c speed

llvm#62750

I set up a simple test with a large .so (~100MiB) that was only present on the target machine
but not present on the local machine, and ran a lldb server on the target and connected to it.

LLDB properly downloads the file from the remote, but it does so at a very slow speed, even over a hardwired 1Gbps connection!

Increasing the buffer size for downloading these helps quite a bit.

Test setup:

```
$ cat gen.py
print('const char* hugeglobal = ')

for _ in range(1000*500):
    print('  "' + '1234'*50 + '"')

print(';')
print('const char* mystring() { return hugeglobal; }')
$ python3 gen.py > huge.c
$ mkdir libdir
$ gcc -fPIC huge.c -Wl,-soname,libhuge.so -o libdir/libhuge.so -shared
$ cat test.c
#include <string.h>
#include <stdio.h>
extern const char* mystring();
int main() {
        printf("%zu\n", strlen(mystring()));
}
$ gcc test.c -L libdir -l huge -Wl,-rpath='$ORIGIN' -o test
$ rsync -a libdir remote:~/
$ ssh remote bash -c "cd ~/libdir && /llvm/buildr/bin/lldb-server platform --server --listen '*:1234'"
```

in another terminal

```
$ rm -rf ~/.lldb # clear cache
$ cat connect.lldb
platform select remote-linux
platform connect connect://10.0.0.14:1234
file test
b main
r
image list
c
q
$ time /llvm/buildr/bin/lldb --source connect.lldb
```

Times with various buffer sizes:

1kiB (current): ~22s
8kiB: ~8s
16kiB: ~4s
32kiB: ~3.5s
64kiB: ~2.8s
128kiB: ~2.6s
256kiB: ~2.1s
512kiB: ~2.1s
1MiB: ~2.1s
2MiB: ~2.1s

I chose 512kiB from this list as it seems to be the point where the returns start diminishing, and it still isn't that much memory.

My understanding of why this makes such a difference is that ReadFile issues a request for each call, so a larger buffer means fewer round trips. The "ideal" situation is ReadFile() being async and able to issue multiple requests at once, but that is much more work for probably little gain.
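
The round-trip argument can be made concrete with a rough model. The sketch below assumes one request per buffer-sized chunk and a file of exactly 100 MiB; `read_requests` is a hypothetical helper, not an LLDB function, and it does not attempt to predict the measured times.

```python
# Rough model: one ReadFile request per buffer-sized chunk (an assumption).
FILE_SIZE = 100 * 1024 * 1024  # ~100 MiB, as in the test setup


def read_requests(file_size, buffer_size):
    """Number of requests needed to fetch file_size bytes (ceiling division)."""
    return -(-file_size // buffer_size)


for kib in (1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048):
    count = read_requests(FILE_SIZE, kib * 1024)
    print(f"{kib:>5} KiB buffer: {count:>7} requests")
```

Under this model, going from 1 KiB to 512 KiB cuts the request count by a factor of 512, which fits the observation that per-request latency dominates at small buffer sizes.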

NOTE: this is my first contribution, so wasn't sure who to choose as a reviewer. Greg Clayton seems to be the most appropriate of those in CODE_OWNERS.txt

Reviewed By: clayborg, jasonmolenda

Differential Revision: https://reviews.llvm.org/D153060
The ConstantRange specifies the range of the scalar elements in the
vector. When converting into a Constant, we need to create a vector
splat with the correct type. For that purpose, pass in the expected
type for the constant.

Fixes llvm#63380.
The wrapper, like most compiler-generated functions, is intended to serve the
IR for the current module. The safest linkage is to keep these private to avoid
any possible collision with other modules.

Differential Revision: https://reviews.llvm.org/D153255
This reverts commit aa49521.

As discussed in llvm#53475 this patch
allows for using LLD-as-a-lib. It also lets clients link only the drivers that
they want (see unit tests).

This also adds the unit test infra as in the other LLVM projects. Among the
test coverage, I've added the original issue from @krzysz00, see:
https://github.com/ROCmSoftwarePlatform/D108850-lld-bug-reproduction

Important note: this doesn't allow (yet) linking in parallel. This will come a
bit later hopefully, in subsequent patches, for COFF at least.

Differential revision: https://reviews.llvm.org/D119049
This register is used as the pointer to the current thread
local storage block and is read from NT_ARM_TLS on Linux.

Though tpidr will be present on all AArch64 Linux, I am soon
going to add a second register tpidr2 to this set.

tpidr2 is only present when SME is implemented, therefore the
NT_ARM_TLS set will change size. This is why I've added this
as a dynamic register set to save changes later.

Reviewed By: omjavaid

Differential Revision: https://reviews.llvm.org/D152516
Key changes:
  - Refactor the createTargetData function to make use of the emitOffloadingArrays and emitOffloadingArraysArgument functions to generate code.
  - Added a new emitIfClause helper function to allow handling if clauses in a similar fashion to Clang.
  - Updated the MLIR side of code to account for changes to createTargetData.

Depends on D149872

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D146557
These are leftover hacks from using asm declarations to access
intrinsics.
For consistency with other algorithms.

Differential Revision: https://reviews.llvm.org/D153141
A few tests were also straightforward to translate to SFINAE tests
instead, so in a few cases I did that and removed the .fail.cpp test
entirely.

Differential Revision: https://reviews.llvm.org/D153149
The operations.cpp file contained the implementation of a ton of
functionality unrelated to just the filesystem operations, and
filesystem_common.h contained a lot of unrelated functionality as well.

Splitting this up into more files will make it possible in the future
to support parts of <filesystem> (e.g. path) on systems where there is
no notion of a filesystem.

Differential Revision: https://reviews.llvm.org/D152377
Implement XCVbitmanip intrinsics for CV32E40P according to the specification.

This commit is part of a patch-set to upstream the 7 vendor specific extensions of CV32E40P.

Contributors: @CharKeaney, @jeremybennett, @lewis-revill, @liaolucy, @simoncook, @xmj.

Spec: https://github.com/openhwgroup/cv32e40p/blob/62bec66b36182215e18c9cf10f723567e23878e9/docs/source/instruction_set_extensions.rst

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D152915
The LLVM comdat operation specifies how to deduplicate globals with the
same key in two different object files. This is necessary on Windows
where e.g. two object files with linkonce globals will not link unless
a comdat for those globals is specified. It is also supported in the ELF
format.

Differential Revision: https://reviews.llvm.org/D150796
…reachable

SplitBlockAndInsertIfThen utility creates two new blocks,
they're called ThenBlock and Tail (true and false destinations of a conditional
branch correspondingly). The function has a bool parameter Unreachable,
and if it's set, then ThenBlock is terminated with an unreachable.
At the end of the function the new blocks are added to the loop of the split
block. However, in case ThenBlock is terminated with an unreachable,
it cannot belong to any loop.

Differential Revision: https://reviews.llvm.org/D152434
This patch implements the "__kmp_print_tdg_dot" function, that prints a task dependency graph into a dot file containing the tasks and their dependencies.

It is activated through a new environment variable "KMP_TDG_DOT"

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D150962
After each iteration of the function specializer, constant stack values
are promoted to constant globals in order to enable recursive function
specialization. This should also be done once before running the
specializer. Enables specialization of _QMbrute_forcePdigits_2 from
SPEC2017:548.exchange2_r.

Differential Revision: https://reviews.llvm.org/D152799
fpetrogalli and others added 26 commits June 20, 2023 11:13
The option `-misched-detail-resource-booking` prints the following
information every time the method
`SchedBoundary::getNextResourceCycle` is invoked:

1. counters of the resources that have already been booked;

2. the values returned by `getNextResourceCycle`, which is the next
available cycle in which a resource can be booked.

The method is useful to debug low-level checks inside the machine
scheduler that make decisions based on the values returned by
`getNextResourceCycle`.

Reviewed By: andreadb

Differential Revision: https://reviews.llvm.org/D153116
Reverting because of https://lab.llvm.org/buildbot#builders/75/builds/32485:

llvm-project/llvm/lib/CodeGen/MachineScheduler.cpp:2374:7: error: use of undeclared identifier 'MischedDetailResourceBooking'
 if (MischedDetailResourceBooking)

This reverts commit fc06262.
The option `-misched-detail-resource-booking` prints the following
information every time the method
`SchedBoundary::getNextResourceCycle` is invoked:

1. counters of the resources that have already been booked;

2. the values returned by `getNextResourceCycle`, which is the next
available cycle in which a resource can be booked.

The method is useful to debug low-level checks inside the machine
scheduler that make decisions based on the values returned by
`getNextResourceCycle`.

Reviewed By: andreadb

Differential Revision: https://reviews.llvm.org/D153116
This is a follow-up to D151938 that should fix GCC's -Wcast-qual warning.
They cause failures on the llvm-clang-x86_64-expensive-checks-debian
buildbot.

This partially reverts
D153269 [AMDGPU][GFX11] Add test coverage for FMA instructions.
… path

Try to address part of
llvm#61900.

It is not completely addressed since the original reproducer is not
fixed, because the final suspend point is optimized out in its special
case. But that is a relatively independent issue.
Drop alignment to allow test to run in different platforms.

Differential Revision: https://reviews.llvm.org/D152547
- Update the Cortex-A510 mcpu target to use A510 scheduling info instead of
  A55. Values taken are based on the A510 software optimisation guide
  https://developer.arm.com/documentation/PJDOC-466751330-536816/latest
- Set the latency of most integer ops to 1. The CPU uarch is able to resolve most
  integer ops in 1 cycle.

Differential Revision: https://reviews.llvm.org/D152688
The LLVM build system distinguishes between `add_llvm_example_library` and
`add_llvm_library`, which is presumably used to package examples
separately from the regular library. Introduce a similar approach to
building example libraries in MLIR and use it for the transform dialect
tutorial.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D153265
This patch is a follow-up to D153162. It fixes one more place
where an indexed address was read incorrectly. It also moves handling
of indexed addresses into DWARFUnit.

Differential Revision: https://reviews.llvm.org/D153297
…pes in lexical block scopes (4/7)" (2)"

This reverts commit cb9ac70.
It causes an assert in clang:
virtual void llvm::DwarfDebug::endFunctionImpl(const llvm::MachineFunction*): Assertion `LScopes.getAbstractScopesList().size() == NumAbstractSubprograms && "getOrCreateAbstractScope() inserted an abstract subprogram scope"' failed.
https://bugs.chromium.org/p/chromium/issues/detail?id=1456288#c2
This fixes a false positive where a ParamVarDecl happened to have the
same name as some C standard symbol and has a global namespace.

```
using A = int(int time); // we suggest <ctime> for the `int time`.
```

Differential Revision: https://reviews.llvm.org/D153330
Add extra error checking to prevent passes from being run on unsupported ops through the pass manager infrastructure.

Differential Revision: https://reviews.llvm.org/D153144
Clang provides the `-mlink-bitcode-file` and `-mlink-builtin-bitcode`
options to insert LLVM-IR into the current TU. These are useful
primarily for including LLVM-IR files that require special handling to
be correct and cannot be linked normally, such as GPU vendor libraries
like `libdevice.10.bc`. Currently these options can only be used if the
source input goes through the AST consumer path. This patch makes the
changes necessary to also support this when the input is LLVM-IR. This
will allow the following operation:

```
clang in.bc -Xclang -mlink-builtin-bitcode -Xclang libdevice.10.bc
```

Reviewed By: yaxunl

Differential Revision: https://reviews.llvm.org/D152391
The GPU vendors currently provide bitcode files for their device
runtime. These files need to be handled specially as they are not built
to be linked in with a standard `llvm-link` call or through LTO linking.
This patch adds an alternative to use the existing clang handling of
these libraries that does the necessary magic to make this work.

We do this by causing the LTO backend to emit bitcode before running the
backend. We then pass this through to clang which uses the existing
support which has been fixed to support this by D152391. The backend
will then be run with the merged module.

This patch adds the `--builtin-bitcode=<triple>=file.bc` option to specify a single
file, or just `--clang-backend` to let the toolchain handle its defaults
(currently nothing for NVPTX and the ROCm device libs for AMDGPU). This may have
a performance impact due to running the optimizations again. If this is an
issue, we could potentially disable optimizations in LTO and only do the linking.

This should allow us to resolve issues when relying on the `linker-wrapper` to
do a late linking that may depend on vendor libraries.

Depends on D152391

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D152442
Arm has a big-endian configuration called BE8, which is byte-invariant (every byte has the same address on little- and big-endian systems).

When in BE8 mode:
  1. Instructions are big-endian in relocatable objects but
     little-endian in executables and shared objects.
  2. Data is big-endian.
  3. The data encoding of the ELF file is ELFDATA2MSB.

To support BE8 without an ABI break for relocatable objects, the linker takes on the responsibility of changing the endianness of instructions. At a high level the only difference between BE32 and BE8 in the linker is that for BE8:
  1. The linker sets the flag EF_ARM_BE8 in the ELF header.
  2. The linker endian reverses the instructions, but not data.

This patch adds BE8 big endian support for Arm. To endian reverse the instructions we'll need access to the mapping symbols. Code sections can contain a mix of Arm, Thumb and literal data. We need to endian reverse Arm instructions as words, Thumb instructions as half-words and ignore literal data. The only way to find these transitions precisely is by using mapping symbols. The instruction reversal will need to take place after relocation. For Arm BE8 code sections (sections with the SHF_EXECINSTR flag) we inserted a step after relocation to endian reverse the instructions. The implementation strategy I have used here is to write all sections as BE32, including SyntheticSections, then endian reverse all code in InputSections via mapping symbols.
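
The reversal rule can be illustrated with a small model. This is a sketch only, not the LLD implementation: it assumes a code section given as raw bytes plus a sorted list of `(offset, name)` mapping symbols using the Arm ELF names `$a` (Arm code), `$t` (Thumb code) and `$d` (data), and the function name `reverse_code_endianness` is hypothetical.

```python
# Illustrative model of BE8 instruction reversal; not the linker's actual code.
UNIT_WIDTH = {"$a": 4, "$t": 2, "$d": 1}  # bytes reversed per unit; data is untouched


def reverse_code_endianness(section, mapping_symbols):
    """Reverse Arm words and Thumb half-words in place, leaving data alone.

    mapping_symbols is a sorted list of (offset, name) pairs; each symbol
    covers the bytes up to the next symbol or the end of the section.
    """
    out = bytearray(section)
    bounds = list(mapping_symbols) + [(len(section), None)]
    for (start, name), (end, _) in zip(bounds, bounds[1:]):
        step = UNIT_WIDTH[name]
        for i in range(start, end - step + 1, step):
            out[i:i + step] = section[i:i + step][::-1]
    return bytes(out)


section = bytes([0xE1, 0xA0, 0x00, 0x00,  # one Arm instruction (word)
                 0x46, 0xC0, 0xBF, 0x00,  # two Thumb instructions (half-words)
                 0x01, 0x02, 0x03, 0x04])  # literal data
symbols = [(0, "$a"), (4, "$t"), (8, "$d")]
print(reverse_code_endianness(section, symbols).hex())
```

The model makes the dependency on mapping symbols visible: without them there is no way to tell which byte ranges are words, half-words, or data.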

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D150870
Similar to the existing f32 pattern, this adds a tablegen pattern for the fp16
fcvtn2.
Re-order exceptional branches and slightly adjust the evaluation.

Performance tested with the CORE-MATH project on AMD EPYC 7B12 (clocks/op)

Reciprocal throughputs:
```
--- BEFORE ---

$ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 7.794 + 0.102 clc/call; Median-Min = 0.066 clc/call; Max = 8.267 clc/call;
[####################] 100 %. (with -msse4.2)
Ntrial = 20 ; Min = 10.783 + 0.172 clc/call; Median-Min = 0.144 clc/call; Max = 11.446 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 18.926 + 0.381 clc/call; Median-Min = 0.342 clc/call; Max = 19.623 clc/call;

--- AFTER ---

$ CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 6.598 + 0.085 clc/call; Median-Min = 0.052 clc/call; Max = 6.868 clc/call;
[####################] 100 %  (with -msse4.2)
Ntrial = 20 ; Min = 9.245 + 0.304 clc/call; Median-Min = 0.248 clc/call; Max = 10.675 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 11.724 + 0.440 clc/call; Median-Min = 0.444 clc/call; Max = 12.262 clc/call;
```

Latency:
```
--- BEFORE ---

$ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 38.821 + 0.157 clc/call; Median-Min = 0.122 clc/call; Max = 39.539 clc/call;
[####################] 100 %. (with -msse4.2)
Ntrial = 20 ; Min = 44.767 + 0.766 clc/call; Median-Min = 0.681 clc/call; Max = 45.951 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 55.055 + 1.512 clc/call; Median-Min = 1.571 clc/call; Max = 57.039 clc/call;

--- AFTER ---

$ PERF_ARGS="--latency" CORE_MATH_PERF_MODE=rdtsc ./perf.sh tanhf
[####################] 100 %  (with -mavx2 -mfma)
Ntrial = 20 ; Min = 36.147 + 0.194 clc/call; Median-Min = 0.181 clc/call; Max = 36.536 clc/call;
[####################] 100 %  (with -msse4.2)
Ntrial = 20 ; Min = 40.904 + 0.728 clc/call; Median-Min = 0.557 clc/call; Max = 42.231 clc/call;
[####################] 100 %. (SSE2)
Ntrial = 20 ; Min = 55.776 + 0.557 clc/call; Median-Min = 0.542 clc/call; Max = 56.551 clc/call;
```

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153026
Re-order exceptional branches and slightly adjust the evaluation.
Depends on https://reviews.llvm.org/D153026 .

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153062
Re-organize special cases and add a special case when `|x| < 2^-5`.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153134
Verifying dominator tree is expensive using intra-pass
asserts. Asserts added during D147408 are
increasing the build time of libc significantly. This change
does the verification after the atomic optimizer pass
and should fix the regression reported in D153232.

Reviewed By: arsenm, #amdgpu

Differential Revision: https://reviews.llvm.org/D153261
This patch optimizes code generation by leveraging the zeroing behavior of the `maskeqz`/`masknez` instructions.

```
int sel(int a, int b)
{
    return (a < b) ? a : 0;
}
```

```
slt	$a1,$a0,$a1
masknez	$a2,$r0,$a1
maskeqz	$a0,$a0,$a1
or	$a0,$a0,$a2
```

=>

```
slt	$a1,$a0,$a1
maskeqz	$a0,$a0,$a1
```
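
The equivalence of the two sequences can be checked with a small model. The sketch below assumes the usual LoongArch semantics, where `maskeqz rd, rj, rk` yields 0 when `rk` is zero and `rj` otherwise, and `masknez` yields 0 when `rk` is non-zero and `rj` otherwise; the Python function names are illustrative.

```python
# Model of the two instruction sequences; register semantics are assumed as described.
def slt(a, b):
    """Set-on-less-than: 1 if a < b (signed), else 0."""
    return 1 if a < b else 0


def maskeqz(rj, rk):
    """0 if rk == 0, else rj."""
    return 0 if rk == 0 else rj


def masknez(rj, rk):
    """0 if rk != 0, else rj."""
    return 0 if rk != 0 else rj


def sel_before(a, b):
    cond = slt(a, b)
    zero_side = masknez(0, cond)  # $r0 is the zero register, so this is always 0
    true_side = maskeqz(a, cond)
    return true_side | zero_side


def sel_after(a, b):
    return maskeqz(a, slt(a, b))


print(all(sel_before(a, b) == sel_after(a, b)
          for a in range(-4, 5) for b in range(-4, 5)))
```

Since the false operand is the constant 0, the `masknez` result is always 0 and the `or` is a no-op, which is what lets the backend drop both instructions.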

Reviewed By: SixWeining

Differential Revision: https://reviews.llvm.org/D153193
A distinct attribute associates a referenced attribute to a unique
identifier. Every call to its create function allocates a new
distinct attribute instance. The address of the attribute instance
temporarily serves as its unique identifier. Similar to the names
of SSA values, the final unique identifiers are generated during
pretty printing.

Examples:
 #distinct = distinct[0]<42.0 : f32>
 #distinct1 = distinct[1]<42.0 : f32>
 #distinct2 = distinct[2]<array<i32: 10, 42>>
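
The identity and numbering behavior described above can be sketched with a toy model. This is not the MLIR implementation: `DistinctAttr` and `Printer` are hypothetical Python stand-ins in which object identity plays the role of the instance address and ids are assigned on first print.

```python
# Toy model of distinct attributes; class names are illustrative, not MLIR APIs.
class DistinctAttr:
    """Wraps a referenced attribute; every instance is distinct by identity."""

    def __init__(self, referenced):
        self.referenced = referenced


class Printer:
    """Assigns stable ids in order of first appearance, like SSA value names."""

    def __init__(self):
        self._ids = {}

    def print_attr(self, attr):
        # Default object hashing and equality are identity-based, so two
        # instances with the same referenced attribute get different ids.
        ident = self._ids.setdefault(attr, len(self._ids))
        return f"distinct[{ident}]<{attr.referenced}>"


a = DistinctAttr("42.0 : f32")
b = DistinctAttr("42.0 : f32")
printer = Printer()
print(printer.print_attr(a))
print(printer.print_attr(b))
print(printer.print_attr(a))
```

Two attributes built from the same referenced value still print with different ids, while printing the same instance twice reuses its id.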

This mechanism is meant to generate attributes with a unique
identifier, which can be used to mark groups of operations
that share common properties, such as whether they alias.

The design of the distinct attribute ensures minimal memory
footprint per distinct attribute since it only contains a reference
to another attribute. All distinct attributes are stored outside of
the storage uniquer in a thread local store that is part of the
context. It uses one bump pointer allocator per thread to ensure
distinct attributes can be created in parallel.

Differential Revision: https://reviews.llvm.org/D153360
definelicht pushed a commit that referenced this pull request Jan 5, 2024
A case for this transformation, https://gcc.godbolt.org/z/nhYcWq1WE
Fold
  mov     w8, #56952
  movk    w8, #15, lsl #16
  ldrb    w0, [x0, x8]
into
  add     x0, x0, 1036288
  ldrb    w0, [x0, 3704]

Only LDRBBroX is supported in this first version.
Fix llvm#71917
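
The arithmetic behind the fold can be checked directly. The sketch below assumes the split keeps the low 12 bits as the load's unsigned immediate offset and moves the rest into the `add`; `split_offset` is a hypothetical helper, not the name used in the backend.

```python
# Arithmetic check of the example above; split_offset is an illustrative name.
def split_offset(offset):
    """Split an offset into a 4 KiB-aligned high part and a 12-bit low part."""
    low = offset & 0xFFF
    high = offset - low
    return high, low


# mov w8, #56952 followed by movk w8, #15, lsl #16 materializes this constant.
offset = 56952 | (15 << 16)
high, low = split_offset(offset)
print(offset, high, low)
```

The high part is a multiple of 4096, so it can be encoded as a shifted 12-bit `add` immediate, and the low part fits the byte load's 12-bit offset field.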