Skip to content

lemmy_db_views_modlog_combined has a weird performance regression #141006

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
lcnr opened this issue May 14, 2025 · 6 comments
Open

lemmy_db_views_modlog_combined has a weird performance regression #141006

lcnr opened this issue May 14, 2025 · 6 comments
Labels
C-bug Category: This is a bug. E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-compiletime Issue: Problems and improvements with respect to compile times. I-prioritize Issue: Indicates that prioritization has been requested for this issue. regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@lcnr
Copy link
Contributor

lcnr commented May 14, 2025

#132064 (comment)

We recently split up our code into multiple crates for better parallel compilation. This also helped to narrow down where exactly the slowdown discussed here is happening, namely in this file. I already removed most irrelevant code, and it seems clear that the slowdown is somehow related to diesel. You can reproduce with the following commands:

git clone https://github.com/LemmyNet/lemmy.git --recursive --branch modlog-combined-remove-derives
cd lemmy
cargo build --timings -p lemmy_db_views_modlog_combined --all-features

Build timings only for lemmy_db_views_modlog_combined crate: 1.81.0: 23s 1.86.0: 188s 1.88.0-nightly (7918c7eb5 2025-04-27): 188s

I hope this will help to fix the problem, because we are stuck on 1.81 until it is resolved.

cc @weiznich: Are you aware of problems with slow compilation of diesel code starting from Rust 1.82?

In case its helpful here is profiling data for lemmy_db_views_modlog_combined using cargo 1.88.0-nightly (7918c7eb5 2025-04-27):

Item Self time % of total time Time Item count Incremental result hashingtime
check_liveness 82.00s 42.409 82.04s 43 4.60µs
mir_built 76.86s 39.752 169.03s 43 562.14µs
layout_of 16.74s 8.656 84.04s 5691 1.26ms

Originally posted by @Nutomic in #132064

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 14, 2025
@weiznich
Copy link
Contributor

weiznich commented May 14, 2025

Adding the content from #132064 (comment)

Steps to get these results:

  • Checkout git clone https://github.com/LemmyNet/lemmy.git --recursive --branch modlog-combined-remove-derives
  • Change to the right directory cd lemmy
  • Build with cargo check -p lemmy_db_views_modlog_combined --all-features (both the crate + the --all-features flag is relevant). This should get the dependencies build. For the issue only the lemmy_db_views_modlog_combined crate is relevant.
  • For the timing: Run the build with RUSTFLAGS="-Zself-profile" and process the output with summarize
  • For the samply profile: Run the profiler with [samply](https://github.com/mstange/samply] (it's literally just samply record cargo check -p lemmy_db_views_modlog_combined --all-features). That needs a compiler with debug symbols.
  • The linked gist contains the debug output of a self in the linked function. That output appeared ~8k times

Original comment:


These results are surprising for me as neither check_liveness nor mir_built nor layout_of are related to the trait solving at all. Given your profile these seem to account for ~90% of the build time of that particular crate.

Given these suprising results I checked out the code locally and used rustc 1.88.0-nightly (50aa04180 2025-05-08) to verify these results. I got slightly different numbers but overall the result remains the same:

+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item                                  | Self time | % of total time | Time     | Item count | Incremental result hashing time |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| check_liveness                        | 76.09s    | 45.199          | 76.13s   | 43         | 5.11µs                          |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| mir_built                             | 75.96s    | 45.119          | 161.82s  | 43         | 588.78µs                        |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| evaluate_obligation                   | 5.20s     | 3.087           | 5.24s    | 10254      | 1.26ms                          |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| check_match                           | 4.43s     | 2.629           | 9.68s    | 43         | 6.14µs                          |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| typeck                                | 4.10s     | 2.433           | 7.14s    | 43         | 275.40µs                        |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| check_well_formed                     | 2.05s     | 1.217           | 155.57s  | 367        | 34.51µs                         |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| normalize_canonicalized_projection_ty | 185.52ms  | 0.110           | 187.93ms | 943        | 260.83µs                        |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+

It seems like typeck is not the bottleneck here, although it still could be faster than 4s. evaluate_obligation (coherence checking) is also relatively fast with 5s. (It's still up to discussion if 4s and 5s is the best performance that the compiler could deliver there, but that's something that can wait).

Overall the main issues from my point of view are the check_liveness and the mir_built queries, as they result in >90% of the time spend compiling there. A quick search in the compiler indicates that check_liveness seems to be a pass that removes dead code and mir_built seems to be the query pass that builds MIR. Given that the crate in question only has ~500 lines of code (1700 lines if all macros are expanded) I wouldn't expect both passes to take more than a minute, but rather several orders of magnitude less time.

I attach both the recorded mm_profdata file and the chrome profile to the post so that you can checkout that easily on your own.

profile.zip

For me that all indicates:

  • The new trait solver won't fix that issue as it's not at all related to trait solving
  • My personal take away from this is that developers (including me) often see diesel + slow in some issue and automatically assume that it is always typeck that's causing this issue. So for the future: Better verify that typeck is really the issue rather than just assuming it and spend arguing about "non-issues".

EDIT:

I've uploaded a samply profile here: https://share.firefox.dev/4kefIUS. This reports that 87% of the time is spend in this function:

EDIT 2:

I dug a bit further and it seems like most of the time is spend here on normalizing that generic query type again and again and again. https://github.com/rust-lang/rust/blob/master/compiler/rustc_middle/src/ty/inhabitedness/inhabited_predicate.rs#L102
For reference the input to that function is something like https://gist.github.com/weiznich/b17704681fc78d2e12e225416c0a6658 (there are several thousand variants that differ in the number of And( and Or( wrappers in the beginning.)

I'm not sure if it would be possible to cache that so the compiler only needs to do that once and not repeat it thousand times? As I don't know anything about that part of the compiler nor do I have the capacity to fix that on my own there is not much more I can do here. From my perspective that still looks like a rather critical regression that was possibly introduced while the fix for the other issue was done and therefore went unnoticed for quite some time. (Or better the fallout was attribute to the other issue)

@jieyouxu jieyouxu added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels May 14, 2025
@rustbot rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label May 14, 2025
@jieyouxu jieyouxu added E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example labels May 14, 2025
@lqd
Copy link
Member

lqd commented May 14, 2025

@lcnr To me this looks like #132064, which I've bisected to #126024, and was improved by #132625 as an alternative to the possible revert in #132075. Only I guess that this looks worse with the new lemmy code (and/or --all-features) it seems.

For this updated example, lemmy_db_views_modlog_combined cargo checks:

Do you think we need a new issue to track this?

(#140944 likely is another duplicate in practice, however that one can be worked around differently.)

@weiznich
Copy link
Contributor

@lqd Thanks for looking again into this. As written above I did some profiling there last week. For me these results (especially the linked samply profile) look like it's the uninhabitedness check here as well. So this looks suspiciously like #140944, which I wasn't aware of. Maybe you can confirm that by taking a look at the linked information?

I believe what happened here is that another change landed while the original regression was in place. The later regression was never noticed as it was always attributed to #132064.

@lqd
Copy link
Member

lqd commented May 15, 2025

@weiznich are you at the workshops / pre all hands / all hands?

It’ll be easier to talk IRL and show you the numbers, I think some of your conclusions are incomplete, and I’d also like to find out if eg you’re talking about the 20s "regression" compared to the 260s regression.

But basically to me the trait solver caching regression was mitigated but turned into a different trait solver slowness (inhabitedness) for some big types; maybe caching that, turning into a query, etc would help.

@jieyouxu jieyouxu added I-compiletime Issue: Problems and improvements with respect to compile times. and removed I-slow Issue: Problems and improvements with respect to performance of generated code. labels May 15, 2025
@lcnr
Copy link
Contributor Author

lcnr commented May 15, 2025

This has been caused by #128934 and #122792.

Checking whether these large types are inhabited is incredibly involved. To fix this we'd likely have to change the implementation of inhabitness checking to have significantly stronger caching. Implementation such a cache will be challenging. It is really unfortunate that this regression was hidden by the opaque types regression and was not detected during the stabilization of these two features.

bors added a commit to rust-lang-ci/rust that referenced this issue May 15, 2025
move expensive layout sanity check to debug assertions

It is [hard to fix](rust-lang#141006 (comment)) the slowness in the uninhabitedness computation for very big types but we can fix the very specific case of them being called during the layout sanity checks, as described in rust-lang#140944.

So this PR moves this uninhabitedness check to the other expensive layout sanity checks that are ran under `debug_assertions`.

It makes building the `lemmy_api_routes` crate's self-profile `layout_of` query go from

```
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item                                                   | Self time | % of total time | Time     | Item count | Incremental result hashing time |
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| layout_of                                              | 63.02s    | 41.895          | 244.26s  | 123703     | 50.30ms                         |
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
```

on master (2m17s total), to

```
| layout_of                                              | 330.21ms  | 0.372           | 26.90s   | 123703     | 53.19ms                         |
```

with this PR (1m15s total).

(Note that the [perf run results](rust-lang#141039 (comment)) below look a bit better than [an earlier run](https://perf.rust-lang.org/compare.html?start=4eca99a18eab3d4e28ed1ce3ee620d442955a470&end=c4a00993f8ee02c7565e7be652608817ea2fb97d&stat=instructions:u) I did in another PR. There may be some positive noise there, or post-merge results could differ a bit)

Since we discussed this today, r? `@compiler-errors` — and cc `@lcnr` and `@RalfJung.`
@weiznich
Copy link
Contributor

As @lqd suggested yesterday that this might be related to #[non_exhaustive] attributes in diesel I did an experiment to check if compile times improve if I locally remove all these attributes and use that modified diesel version as dependency instead.
For me that reduced the check time for lemmy_db_views_modlog_combined from 2m 43s to 2m 33s, so that's only a minimal improvement.

Otherwise I would like to thank you all for looking into this in that detail.

bors added a commit to rust-lang-ci/rust that referenced this issue May 18, 2025
move expensive layout sanity check to debug assertions

It is [hard to fix](rust-lang#141006 (comment)) the slowness in the uninhabitedness computation for very big types but we can fix the very specific case of them being called during the layout sanity checks, as described in rust-lang#140944.

So this PR moves this uninhabitedness check to the other expensive layout sanity checks that are ran under `debug_assertions`.

It makes building the `lemmy_api_routes` crate's self-profile `layout_of` query go from

```
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item                                                   | Self time | % of total time | Time     | Item count | Incremental result hashing time |
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| layout_of                                              | 63.02s    | 41.895          | 244.26s  | 123703     | 50.30ms                         |
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
```

on master (2m17s total), to

```
| layout_of                                              | 330.21ms  | 0.372           | 26.90s   | 123703     | 53.19ms                         |
```

with this PR (1m15s total).

(Note that the [perf run results](rust-lang#141039 (comment)) below look a bit better than [an earlier run](https://perf.rust-lang.org/compare.html?start=4eca99a18eab3d4e28ed1ce3ee620d442955a470&end=c4a00993f8ee02c7565e7be652608817ea2fb97d&stat=instructions:u) I did in another PR. There may be some positive noise there, or post-merge results could differ a bit)

Since we discussed this today, r? `@compiler-errors` — and cc `@lcnr` and `@RalfJung.`
github-actions bot pushed a commit to rust-lang/miri that referenced this issue May 18, 2025
move expensive layout sanity check to debug assertions

It is [hard to fix](rust-lang/rust#141006 (comment)) the slowness in the uninhabitedness computation for very big types but we can fix the very specific case of them being called during the layout sanity checks, as described in #140944.

So this PR moves this uninhabitedness check to the other expensive layout sanity checks that are ran under `debug_assertions`.

It makes building the `lemmy_api_routes` crate's self-profile `layout_of` query go from

```
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item                                                   | Self time | % of total time | Time     | Item count | Incremental result hashing time |
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| layout_of                                              | 63.02s    | 41.895          | 244.26s  | 123703     | 50.30ms                         |
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
```

on master (2m17s total), to

```
| layout_of                                              | 330.21ms  | 0.372           | 26.90s   | 123703     | 53.19ms                         |
```

with this PR (1m15s total).

(Note that the [perf run results](rust-lang/rust#141039 (comment)) below look a bit better than [an earlier run](https://perf.rust-lang.org/compare.html?start=4eca99a18eab3d4e28ed1ce3ee620d442955a470&end=c4a00993f8ee02c7565e7be652608817ea2fb97d&stat=instructions:u) I did in another PR. There may be some positive noise there, or post-merge results could differ a bit)

Since we discussed this today, r? `@compiler-errors` — and cc `@lcnr` and `@RalfJung.`
github-actions bot pushed a commit to rust-lang/rustc-dev-guide that referenced this issue May 19, 2025
move expensive layout sanity check to debug assertions

It is [hard to fix](rust-lang/rust#141006 (comment)) the slowness in the uninhabitedness computation for very big types but we can fix the very specific case of them being called during the layout sanity checks, as described in #140944.

So this PR moves this uninhabitedness check to the other expensive layout sanity checks that are ran under `debug_assertions`.

It makes building the `lemmy_api_routes` crate's self-profile `layout_of` query go from

```
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item                                                   | Self time | % of total time | Time     | Item count | Incremental result hashing time |
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| layout_of                                              | 63.02s    | 41.895          | 244.26s  | 123703     | 50.30ms                         |
+--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
```

on master (2m17s total), to

```
| layout_of                                              | 330.21ms  | 0.372           | 26.90s   | 123703     | 53.19ms                         |
```

with this PR (1m15s total).

(Note that the [perf run results](rust-lang/rust#141039 (comment)) below look a bit better than [an earlier run](https://perf.rust-lang.org/compare.html?start=4eca99a18eab3d4e28ed1ce3ee620d442955a470&end=c4a00993f8ee02c7565e7be652608817ea2fb97d&stat=instructions:u) I did in another PR. There may be some positive noise there, or post-merge results could differ a bit)

Since we discussed this today, r? `@compiler-errors` — and cc `@lcnr` and `@RalfJung.`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: This is a bug. E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example I-compiletime Issue: Problems and improvements with respect to compile times. I-prioritize Issue: Indicates that prioritization has been requested for this issue. regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants