-
Notifications
You must be signed in to change notification settings - Fork 5k
[release/9.0] Fix edge cases in Tarjan GC bridge (Android) #114682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release/9.0-staging
Are you sure you want to change the base?
Conversation
…otnet#112970) Fix typo in GC bridge comparison message (SCCS -> XREFS)
* [mono][sgen] Fix DUMP_GRAPH debug option build for tarjan bridge * [mono][sgen] Don't create ScanData* during debug dumping of SCCs It serves no purpose and it would later crash the runtime since we didn't patch the lockword back in place. * [mono][sgen] Fix some null deref crashes in DUMP_GRAPH debug option * [mono][tests] Add bridge tests These are ported from some of the bridge tests we had on mono/mono. In order to test them we compare between the output of the new and the tarjan bridge.
…formation (dotnet#112825) * Fix an edge case in the Tarjan SCC that lead to losing xref information In the Tarjan SCC bridge processing there's a color graph used to find out connections between SCCs. There was a rare case which only manifested when a cycle in the object graph points to another cycle that points to a bridge object. We only recognized direct bridge pointers but not pointers to other non-bridge SCCs that in turn point to bridges and where we already calculated the xrefs. These xrefs were then lost. * Add test case to sgen-bridge-pathologies and add an assert to catch the original bug * Add review --------- Co-authored-by: Vlad Brezae <brezaevlad@gmail.com>
…duplication (dotnet#113044) * [SGen/Tarjan] Handle edge case with node heaviness changing due to deduplication Do early deduplication Fix Windows build Add test cases to sgen-bridge-pathologies * Move test code * Remove old code * Add extra check (no change to functionality)
* [mono][sgen] Fix initialization of can_reduce_color Mark it as true also in the case when the SCC contains bridge objects. Code populating other_colors for this SCC depends on it. * Disable test on wasm and mobile The test is meant to be run only on desktop.
Tagging subscribers to this area: @BrzVlad |
@BrzVlad could you please check that this PR contains all the fixes we need for this? We had to revert it last time around, so I want to make sure we have all the changes we know of right now. The plan is to take this for the next servicing release. |
Note there is also a recent fix for the |
I was undecided about including #115049 and I am happy to go either way, or postpone it into some future servicing update. It's fixing the "new" bridge which is not the default and improves the testing infrastructure. While it is not the default, it was a suggested workaround for some of the issues so there may be people running it. Notably, Uno.NET switched all the app templates to use "new" bridge some time ago. |
This might be interesting to revisit with them. My understanding is that in .NET 10 (and with this change maybe even in .NET 9) we've fixed everything we know about for the tarjan bridge. So hopefully Uno.NET would work as well. Regardless, I'm in favor of treating the fix to the new bridge as a separate port. I'm not against fixing it, but I'd rather keep the servicing fixes focused (this one is already quite complex as is). |
Backport of #112825, #112970, #113044, #113703 and #114637 to release/9.0-staging
/cc @BrzVlad @vitek-karas
Customer Impact
Two customer issues on Android were traced back to bugs in the MonoVM Tarjan GC bridge responsible for bridging .NET and Java garbage collectors.
In the first issue (dotnet/android#9039) the runtime failed to report edges to the Android GC bridge code, resulting in alive objects collected in the Java runtime and their .NET peers throwing
ObjectDisposedException
. The particular pattern could have been triggered by code usingawait
on the UI thread where the continuation was scheduled to run through synchronization context on the same thread.In the second issue (#106410) the runtime crashed on an assertion for object graphs of certain shape. The cause of the crash was an incorrect assumption about duplicate edges in a compressed object graph and how later stages of the algorithm process them. When the later stages of the processing deduplicated the edges it may have resulted in an unforseen change to some of the graph properties leading to a logic error in the algorithm.
Regression
Both of these bugs existed in .NET MonoVM / Android code since the inception. Legacy Xamarin.Android offered 3 different bridge algorithms (
old
,new
, andtarjan
) with thenew
bridge being the default for a while. The default was changed totarjan
before MonoVM was merged into .NET.Testing
Test cases reproducing both issues were presented and verified against the fix. Additionally, a new testing code was introduced to specifically test the problematic patterns as part of .NET runtime testing infrastructure.
Risk
Medium - the changes simply make reporting the graph more accurate. Even though we have high confidence in the changes, we recognize it does touch a sensitive area.