`lemmy_db_views_modlog_combined` has a weird performance regression #141006

lcnr · 2025-05-14T14:10:45Z

#132064 (comment)

We recently split up our code into multiple crates for better parallel compilation. This also helped to narrow down where exactly the slowdown discussed here is happening, namely in this file. I already removed most irrelevant code, and it seems clear that the slowdown is somehow related to diesel. You can reproduce with the following commands:

git clone https://github.com/LemmyNet/lemmy.git --recursive --branch modlog-combined-remove-derives
cd lemmy
cargo build --timings -p lemmy_db_views_modlog_combined --all-features

Build timings only for lemmy_db_views_modlog_combined crate: 1.81.0: 23s 1.86.0: 188s 1.88.0-nightly (7918c7eb5 2025-04-27): 188s

I hope this will help to fix the problem, because we are stuck on 1.81 until it is resolved.

cc @weiznich: Are you aware of problems with slow compilation of diesel code starting from Rust 1.82?

In case its helpful here is profiling data for lemmy_db_views_modlog_combined using cargo 1.88.0-nightly (7918c7eb5 2025-04-27):

Item Self time % of total time Time Item count Incremental result hashingtime

check_liveness 82.00s 42.409 82.04s 43 4.60µs

mir_built 76.86s 39.752 169.03s 43 562.14µs

layout_of 16.74s 8.656 84.04s 5691 1.26ms

Originally posted by @Nutomic in #132064

The text was updated successfully, but these errors were encountered:

weiznich · 2025-05-14T14:25:57Z

Adding the content from #132064 (comment)

Steps to get these results:

Checkout git clone https://github.com/LemmyNet/lemmy.git --recursive --branch modlog-combined-remove-derives
Change to the right directory cd lemmy
Build with cargo check -p lemmy_db_views_modlog_combined --all-features (both the crate + the --all-features flag is relevant). This should get the dependencies build. For the issue only the lemmy_db_views_modlog_combined crate is relevant.
For the timing: Run the build with RUSTFLAGS="-Zself-profile" and process the output with summarize
For the samply profile: Run the profiler with [samply](https://github.com/mstange/samply] (it's literally just samply record cargo check -p lemmy_db_views_modlog_combined --all-features). That needs a compiler with debug symbols.
The linked gist contains the debug output of a self in the linked function. That output appeared ~8k times

Original comment:

These results are surprising for me as neither check_liveness nor mir_built nor layout_of are related to the trait solving at all. Given your profile these seem to account for ~90% of the build time of that particular crate.

Given these suprising results I checked out the code locally and used rustc 1.88.0-nightly (50aa04180 2025-05-08) to verify these results. I got slightly different numbers but overall the result remains the same:

+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| Item                                  | Self time | % of total time | Time     | Item count | Incremental result hashing time |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| check_liveness                        | 76.09s    | 45.199          | 76.13s   | 43         | 5.11µs                          |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| mir_built                             | 75.96s    | 45.119          | 161.82s  | 43         | 588.78µs                        |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| evaluate_obligation                   | 5.20s     | 3.087           | 5.24s    | 10254      | 1.26ms                          |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| check_match                           | 4.43s     | 2.629           | 9.68s    | 43         | 6.14µs                          |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| typeck                                | 4.10s     | 2.433           | 7.14s    | 43         | 275.40µs                        |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| check_well_formed                     | 2.05s     | 1.217           | 155.57s  | 367        | 34.51µs                         |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+
| normalize_canonicalized_projection_ty | 185.52ms  | 0.110           | 187.93ms | 943        | 260.83µs                        |
+---------------------------------------+-----------+-----------------+----------+------------+---------------------------------+

It seems like typeck is not the bottleneck here, although it still could be faster than 4s. evaluate_obligation (coherence checking) is also relatively fast with 5s. (It's still up to discussion if 4s and 5s is the best performance that the compiler could deliver there, but that's something that can wait).

Overall the main issues from my point of view are the check_liveness and the mir_built queries, as they result in >90% of the time spend compiling there. A quick search in the compiler indicates that check_liveness seems to be a pass that removes dead code and mir_built seems to be the query pass that builds MIR. Given that the crate in question only has ~500 lines of code (1700 lines if all macros are expanded) I wouldn't expect both passes to take more than a minute, but rather several orders of magnitude less time.

I attach both the recorded mm_profdata file and the chrome profile to the post so that you can checkout that easily on your own.

profile.zip

For me that all indicates:

The new trait solver won't fix that issue as it's not at all related to trait solving
My personal take away from this is that developers (including me) often see diesel + slow in some issue and automatically assume that it is always typeck that's causing this issue. So for the future: Better verify that typeck is really the issue rather than just assuming it and spend arguing about "non-issues".

EDIT:

I've uploaded a samply profile here: https://share.firefox.dev/4kefIUS. This reports that 87% of the time is spend in this function:

rust/compiler/rustc_middle/src/ty/inhabitedness/inhabited_predicate.rs

Line 82 in c6faa1b

fn apply_inner<E: std::fmt::Debug>(

EDIT 2:

I dug a bit further and it seems like most of the time is spend here on normalizing that generic query type again and again and again. https://github.com/rust-lang/rust/blob/master/compiler/rustc_middle/src/ty/inhabitedness/inhabited_predicate.rs#L102
For reference the input to that function is something like https://gist.github.com/weiznich/b17704681fc78d2e12e225416c0a6658 (there are several thousand variants that differ in the number of And( and Or( wrappers in the beginning.)

I'm not sure if it would be possible to cache that so the compiler only needs to do that once and not repeat it thousand times? As I don't know anything about that part of the compiler nor do I have the capacity to fix that on my own there is not much more I can do here. From my perspective that still looks like a rather critical regression that was possibly introduced while the fix for the other issue was done and therefore went unnoticed for quite some time. (Or better the fallout was attribute to the other issue)

lqd · 2025-05-14T23:25:15Z

@lcnr To me this looks like #132064, which I've bisected to #126024, and was improved by #132625 as an alternative to the possible revert in #132075. Only I guess that this looks worse with the new lemmy code (and/or --all-features) it seems.

For this updated example, lemmy_db_views_modlog_combined cargo checks:

in 16s with d24930c, Do not use global caches if opaque types can be defined #126024's parent
in 12min 20s with 2ccafed, Do not use global caches if opaque types can be defined #126024.
in 4min 30s with 1.83.0, which includes Only disable cache if predicate has opaques within it #132625. There are no try build artifacts available for it anymore so 1.83 was a fallback.
in 4 min 52s on 1.86.0 and nightly

Do you think we need a new issue to track this?

(#140944 likely is another duplicate in practice, however that one can be worked around differently.)

weiznich · 2025-05-15T06:22:38Z

@lqd Thanks for looking again into this. As written above I did some profiling there last week. For me these results (especially the linked samply profile) look like it's the uninhabitedness check here as well. So this looks suspiciously like #140944, which I wasn't aware of. Maybe you can confirm that by taking a look at the linked information?

I believe what happened here is that another change landed while the original regression was in place. The later regression was never noticed as it was always attributed to #132064.

lqd · 2025-05-15T07:09:19Z

@weiznich are you at the workshops / pre all hands / all hands?

It’ll be easier to talk IRL and show you the numbers, I think some of your conclusions are incomplete, and I’d also like to find out if eg you’re talking about the 20s "regression" compared to the 260s regression.

But basically to me the trait solver caching regression was mitigated but turned into a different trait solver slowness (inhabitedness) for some big types; maybe caching that, turning into a query, etc would help.

lcnr · 2025-05-15T11:00:26Z

This has been caused by #128934 and #122792.

Checking whether these large types are inhabited is incredibly involved. To fix this we'd likely have to change the implementation of inhabitness checking to have significantly stronger caching. Implementation such a cache will be challenging. It is really unfortunate that this regression was hidden by the opaque types regression and was not detected during the stabilization of these two features.

move expensive layout sanity check to debug assertions It is [hard to fix](rust-lang#141006 (comment)) the slowness in the uninhabitedness computation for very big types but we can fix the very specific case of them being called during the layout sanity checks, as described in rust-lang#140944. So this PR moves this uninhabitedness check to the other expensive layout sanity checks that are ran under `debug_assertions`. It makes building the `lemmy_api_routes` crate's self-profile `layout_of` query go from ``` +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | Item | Self time | % of total time | Time | Item count | Incremental result hashing time | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | layout_of | 63.02s | 41.895 | 244.26s | 123703 | 50.30ms | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ ``` on master (2m17s total), to ``` | layout_of | 330.21ms | 0.372 | 26.90s | 123703 | 53.19ms | ``` with this PR (1m15s total). (Note that the [perf run results](rust-lang#141039 (comment)) below look a bit better than [an earlier run](https://perf.rust-lang.org/compare.html?start=4eca99a18eab3d4e28ed1ce3ee620d442955a470&end=c4a00993f8ee02c7565e7be652608817ea2fb97d&stat=instructions:u) I did in another PR. There may be some positive noise there, or post-merge results could differ a bit) Since we discussed this today, r? `@compiler-errors` — and cc `@lcnr` and `@RalfJung.`

weiznich · 2025-05-16T16:57:05Z

As @lqd suggested yesterday that this might be related to #[non_exhaustive] attributes in diesel I did an experiment to check if compile times improve if I locally remove all these attributes and use that modified diesel version as dependency instead.
For me that reduced the check time for lemmy_db_views_modlog_combined from 2m 43s to 2m 33s, so that's only a minimal improvement.

Otherwise I would like to thank you all for looking into this in that detail.

move expensive layout sanity check to debug assertions It is [hard to fix](rust-lang#141006 (comment)) the slowness in the uninhabitedness computation for very big types but we can fix the very specific case of them being called during the layout sanity checks, as described in rust-lang#140944. So this PR moves this uninhabitedness check to the other expensive layout sanity checks that are ran under `debug_assertions`. It makes building the `lemmy_api_routes` crate's self-profile `layout_of` query go from ``` +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | Item | Self time | % of total time | Time | Item count | Incremental result hashing time | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | layout_of | 63.02s | 41.895 | 244.26s | 123703 | 50.30ms | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ ``` on master (2m17s total), to ``` | layout_of | 330.21ms | 0.372 | 26.90s | 123703 | 53.19ms | ``` with this PR (1m15s total). (Note that the [perf run results](rust-lang#141039 (comment)) below look a bit better than [an earlier run](https://perf.rust-lang.org/compare.html?start=4eca99a18eab3d4e28ed1ce3ee620d442955a470&end=c4a00993f8ee02c7565e7be652608817ea2fb97d&stat=instructions:u) I did in another PR. There may be some positive noise there, or post-merge results could differ a bit) Since we discussed this today, r? `@compiler-errors` — and cc `@lcnr` and `@RalfJung.`

move expensive layout sanity check to debug assertions It is [hard to fix](rust-lang/rust#141006 (comment)) the slowness in the uninhabitedness computation for very big types but we can fix the very specific case of them being called during the layout sanity checks, as described in #140944. So this PR moves this uninhabitedness check to the other expensive layout sanity checks that are ran under `debug_assertions`. It makes building the `lemmy_api_routes` crate's self-profile `layout_of` query go from ``` +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | Item | Self time | % of total time | Time | Item count | Incremental result hashing time | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | layout_of | 63.02s | 41.895 | 244.26s | 123703 | 50.30ms | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ ``` on master (2m17s total), to ``` | layout_of | 330.21ms | 0.372 | 26.90s | 123703 | 53.19ms | ``` with this PR (1m15s total). (Note that the [perf run results](rust-lang/rust#141039 (comment)) below look a bit better than [an earlier run](https://perf.rust-lang.org/compare.html?start=4eca99a18eab3d4e28ed1ce3ee620d442955a470&end=c4a00993f8ee02c7565e7be652608817ea2fb97d&stat=instructions:u) I did in another PR. There may be some positive noise there, or post-merge results could differ a bit) Since we discussed this today, r? `@compiler-errors` — and cc `@lcnr` and `@RalfJung.`

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label May 14, 2025

rustbot added the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label May 14, 2025

jieyouxu added E-needs-bisection Call for participation: This issue needs bisection: https://github.com/rust-lang/cargo-bisect-rustc E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example labels May 14, 2025

lcnr mentioned this issue May 15, 2025

regression: crate compiles much slower with 1.82 #132064

Open

jieyouxu added I-compiletime Issue: Problems and improvements with respect to compile times. and removed I-slow Issue: Problems and improvements with respect to performance of generated code. labels May 15, 2025

lqd mentioned this issue May 15, 2025

move expensive layout sanity check to debug assertions #141039

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`lemmy_db_views_modlog_combined` has a weird performance regression #141006

`lemmy_db_views_modlog_combined` has a weird performance regression #141006

lcnr commented May 14, 2025 •

edited

Loading

weiznich commented May 14, 2025 •

edited by lcnr

Loading

lqd commented May 14, 2025

weiznich commented May 15, 2025

lqd commented May 15, 2025 •

edited

Loading

lcnr commented May 15, 2025

weiznich commented May 16, 2025

lemmy_db_views_modlog_combined has a weird performance regression #141006

lemmy_db_views_modlog_combined has a weird performance regression #141006

Comments

lcnr commented May 14, 2025 • edited Loading

weiznich commented May 14, 2025 • edited by lcnr Loading

lqd commented May 14, 2025

weiznich commented May 15, 2025

lqd commented May 15, 2025 • edited Loading

lcnr commented May 15, 2025

weiznich commented May 16, 2025

`lemmy_db_views_modlog_combined` has a weird performance regression #141006

`lemmy_db_views_modlog_combined` has a weird performance regression #141006

lcnr commented May 14, 2025 •

edited

Loading

weiznich commented May 14, 2025 •

edited by lcnr

Loading

lqd commented May 15, 2025 •

edited

Loading