-
Notifications
You must be signed in to change notification settings - Fork 13.3k
lemmy_db_views_modlog_combined
has a weird performance regression
#141006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Adding the content from #132064 (comment) Steps to get these results:
Original comment: These results are surprising for me as neither Given these suprising results I checked out the code locally and used
It seems like Overall the main issues from my point of view are the I attach both the recorded For me that all indicates:
EDIT: I've uploaded a samply profile here: https://share.firefox.dev/4kefIUS. This reports that 87% of the time is spend in this function:
EDIT 2: I dug a bit further and it seems like most of the time is spend here on normalizing that generic query type again and again and again. https://github.com/rust-lang/rust/blob/master/compiler/rustc_middle/src/ty/inhabitedness/inhabited_predicate.rs#L102 I'm not sure if it would be possible to cache that so the compiler only needs to do that once and not repeat it thousand times? As I don't know anything about that part of the compiler nor do I have the capacity to fix that on my own there is not much more I can do here. From my perspective that still looks like a rather critical regression that was possibly introduced while the fix for the other issue was done and therefore went unnoticed for quite some time. (Or better the fallout was attribute to the other issue) |
@lcnr To me this looks like #132064, which I've bisected to #126024, and was improved by #132625 as an alternative to the possible revert in #132075. Only I guess that this looks worse with the new For this updated example,
Do you think we need a new issue to track this? (#140944 likely is another duplicate in practice, however that one can be worked around differently.) |
@lqd Thanks for looking again into this. As written above I did some profiling there last week. For me these results (especially the linked samply profile) look like it's the uninhabitedness check here as well. So this looks suspiciously like #140944, which I wasn't aware of. Maybe you can confirm that by taking a look at the linked information? I believe what happened here is that another change landed while the original regression was in place. The later regression was never noticed as it was always attributed to #132064. |
@weiznich are you at the workshops / pre all hands / all hands? It’ll be easier to talk IRL and show you the numbers, I think some of your conclusions are incomplete, and I’d also like to find out if eg you’re talking about the 20s "regression" compared to the 260s regression. But basically to me the trait solver caching regression was mitigated but turned into a different trait solver slowness (inhabitedness) for some big types; maybe caching that, turning into a query, etc would help. |
This has been caused by #128934 and #122792. Checking whether these large types are inhabited is incredibly involved. To fix this we'd likely have to change the implementation of inhabitness checking to have significantly stronger caching. Implementation such a cache will be challenging. It is really unfortunate that this regression was hidden by the opaque types regression and was not detected during the stabilization of these two features. |
move expensive layout sanity check to debug assertions It is [hard to fix](rust-lang#141006 (comment)) the slowness in the uninhabitedness computation for very big types but we can fix the very specific case of them being called during the layout sanity checks, as described in rust-lang#140944. So this PR moves this uninhabitedness check to the other expensive layout sanity checks that are ran under `debug_assertions`. It makes building the `lemmy_api_routes` crate's self-profile `layout_of` query go from ``` +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | Item | Self time | % of total time | Time | Item count | Incremental result hashing time | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | layout_of | 63.02s | 41.895 | 244.26s | 123703 | 50.30ms | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ ``` on master (2m17s total), to ``` | layout_of | 330.21ms | 0.372 | 26.90s | 123703 | 53.19ms | ``` with this PR (1m15s total). (Note that the [perf run results](rust-lang#141039 (comment)) below look a bit better than [an earlier run](https://perf.rust-lang.org/compare.html?start=4eca99a18eab3d4e28ed1ce3ee620d442955a470&end=c4a00993f8ee02c7565e7be652608817ea2fb97d&stat=instructions:u) I did in another PR. There may be some positive noise there, or post-merge results could differ a bit) Since we discussed this today, r? `@compiler-errors` — and cc `@lcnr` and `@RalfJung.`
As @lqd suggested yesterday that this might be related to Otherwise I would like to thank you all for looking into this in that detail. |
move expensive layout sanity check to debug assertions It is [hard to fix](rust-lang#141006 (comment)) the slowness in the uninhabitedness computation for very big types but we can fix the very specific case of them being called during the layout sanity checks, as described in rust-lang#140944. So this PR moves this uninhabitedness check to the other expensive layout sanity checks that are ran under `debug_assertions`. It makes building the `lemmy_api_routes` crate's self-profile `layout_of` query go from ``` +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | Item | Self time | % of total time | Time | Item count | Incremental result hashing time | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | layout_of | 63.02s | 41.895 | 244.26s | 123703 | 50.30ms | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ ``` on master (2m17s total), to ``` | layout_of | 330.21ms | 0.372 | 26.90s | 123703 | 53.19ms | ``` with this PR (1m15s total). (Note that the [perf run results](rust-lang#141039 (comment)) below look a bit better than [an earlier run](https://perf.rust-lang.org/compare.html?start=4eca99a18eab3d4e28ed1ce3ee620d442955a470&end=c4a00993f8ee02c7565e7be652608817ea2fb97d&stat=instructions:u) I did in another PR. There may be some positive noise there, or post-merge results could differ a bit) Since we discussed this today, r? `@compiler-errors` — and cc `@lcnr` and `@RalfJung.`
move expensive layout sanity check to debug assertions It is [hard to fix](rust-lang/rust#141006 (comment)) the slowness in the uninhabitedness computation for very big types but we can fix the very specific case of them being called during the layout sanity checks, as described in #140944. So this PR moves this uninhabitedness check to the other expensive layout sanity checks that are ran under `debug_assertions`. It makes building the `lemmy_api_routes` crate's self-profile `layout_of` query go from ``` +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | Item | Self time | % of total time | Time | Item count | Incremental result hashing time | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | layout_of | 63.02s | 41.895 | 244.26s | 123703 | 50.30ms | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ ``` on master (2m17s total), to ``` | layout_of | 330.21ms | 0.372 | 26.90s | 123703 | 53.19ms | ``` with this PR (1m15s total). (Note that the [perf run results](rust-lang/rust#141039 (comment)) below look a bit better than [an earlier run](https://perf.rust-lang.org/compare.html?start=4eca99a18eab3d4e28ed1ce3ee620d442955a470&end=c4a00993f8ee02c7565e7be652608817ea2fb97d&stat=instructions:u) I did in another PR. There may be some positive noise there, or post-merge results could differ a bit) Since we discussed this today, r? `@compiler-errors` — and cc `@lcnr` and `@RalfJung.`
move expensive layout sanity check to debug assertions It is [hard to fix](rust-lang/rust#141006 (comment)) the slowness in the uninhabitedness computation for very big types but we can fix the very specific case of them being called during the layout sanity checks, as described in #140944. So this PR moves this uninhabitedness check to the other expensive layout sanity checks that are ran under `debug_assertions`. It makes building the `lemmy_api_routes` crate's self-profile `layout_of` query go from ``` +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | Item | Self time | % of total time | Time | Item count | Incremental result hashing time | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ | layout_of | 63.02s | 41.895 | 244.26s | 123703 | 50.30ms | +--------------------------------------------------------+-----------+-----------------+----------+------------+---------------------------------+ ``` on master (2m17s total), to ``` | layout_of | 330.21ms | 0.372 | 26.90s | 123703 | 53.19ms | ``` with this PR (1m15s total). (Note that the [perf run results](rust-lang/rust#141039 (comment)) below look a bit better than [an earlier run](https://perf.rust-lang.org/compare.html?start=4eca99a18eab3d4e28ed1ce3ee620d442955a470&end=c4a00993f8ee02c7565e7be652608817ea2fb97d&stat=instructions:u) I did in another PR. There may be some positive noise there, or post-merge results could differ a bit) Since we discussed this today, r? `@compiler-errors` — and cc `@lcnr` and `@RalfJung.`
#132064 (comment)
Originally posted by @Nutomic in #132064
The text was updated successfully, but these errors were encountered: