Skip to content

[API Proposal]: Java Marshalling API #115506

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
AaronRobinsonMSFT opened this issue May 13, 2025 · 26 comments
Open

[API Proposal]: Java Marshalling API #115506

AaronRobinsonMSFT opened this issue May 13, 2025 · 26 comments
Labels
api-ready-for-review API is ready for review, it is NOT ready for implementation area-System.Runtime.InteropServices blocking Marks issues that we want to fast track in order to unblock other important work
Milestone

Comments

@AaronRobinsonMSFT
Copy link
Member

AaronRobinsonMSFT commented May 13, 2025

Background and motivation

For interop with JVM based languages, an interaction model must be made to facilitate lifetime managements through GC coordination. This support is primary for Android for .NET, but there is no technical limitation to this support being enabled for non-Android scenarios.

See #114184 and https://github.com/BrzVlad/runtime/tree/feature-clr-gcbridge

API Proposal

namespace System.Runtime.InteropServices.Java;

[SupportedOSPlatform("android")]
public struct ComponentCrossReference
{
    public nint SourceGroupIndex;
    public nint DestinationGroupIndex;
}

[SupportedOSPlatform("android")]
public unsafe struct StronglyConnectedComponent
{
    public nint Count;
    public System.IntPtr* Contexts;
}

[SupportedOSPlatform("android")]
public unsafe struct MarkCrossReferences
{
    public nint ComponentsLen;
    public StronglyConnectedComponent* Components;
    public nint CrossReferencesLen;
    public ComponentCrossReference* CrossReferences;
}

[SupportedOSPlatform("android")]
partial static class JavaMarshal
{
    public static unsafe void Initialize(delegate* unmanaged<MarkCrossReferences*, void> markCrossReferences);

    public static GCHandle CreateCrossReferenceHandle(object obj, IntPtr context);

    public static IntPtr GetContext(GCHandle obj);

    public static unsafe void FinishCrossReferenceProcessing(
        MarkCrossReferences* crossReferences,
        Span<GCHandle> unreachableObjectHandles);
}

API Usage

// Early in application start-up, a callback is provided.
JavaMarshal.Initialize(...);

object javaProxy = ...
var userContext = (IntPtr)NativeMemory.Alloc(...);
// Store data in context.

GCHandle handle = JavaMarshal.CreateCrossReferenceHandle(javaProxy, userContext);
// Store handle in a map from JNI handle to Object for fast retrieval

// Get GCHandle associated with JNI handle
GCHandle handle = ...
var userContext = JavaMarshal.GetContext(handle);
// Extract data from context.

// Free memory passed to callback registered in JavaMarshal.Initialize().
MarkCrossReferences* markRefsMem = ...;

// Collect all GCHandles marked as "unreachable"
Span<GCHandle> unreachableObjects = ...;

JavaMarshal.FinishCrossReferenceProcessing(markRefsMem, unreachableObjects);

Case example can be found at #114184

Alternative Designs

Place under System.Runtime.InteropServices.GCBridge

Do not limit to SupportedOSPlatform("android")

Risks

Risk is missing some scenario/semantic in order to ensure Android for .NET is successful. This would be a new interop stack and be narrow in scope, following our Swift, ObjectiveC and WinRT interop solutions in .NET Core.

@AaronRobinsonMSFT AaronRobinsonMSFT added this to the 10.0.0 milestone May 13, 2025
@AaronRobinsonMSFT AaronRobinsonMSFT added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime.InteropServices labels May 13, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/interop-contrib
See info in area-owners.md if you want to be subscribed.

@AaronRobinsonMSFT
Copy link
Member Author

AaronRobinsonMSFT commented May 13, 2025

@BrzVlad
Copy link
Member

BrzVlad commented May 13, 2025

In addition to the computation and passing of the SCCs and CCRs, that are handled by this API, there are a few implementation details currently present on mono.

  1. After java gc, we iterate over all weak refs and zero the refs (even long / finalization tracking ones ?) for the objects that had their java counterpart collected. The runtime gets information about which SCCs are alive from the callback into the android processing part.
  2. During java - C# marshalling, we have a mechanism to wait for bridge processing to finish. In theory, this would prevent mutations of the object graph that would lead to collecting of java objects with live C# peers.

After the original rounds of discussions around design, the conclusion was that we will have the fetching of a WeakReference target wait for bridge processing and the finishing step of the bridge processing would clear the weak refs for bridge objects with dead java peers. It seems that the only way to achieve this would to fully iterate over the weakrefs and null them, every single time a ReferenceTrackingHandle is freed. I don't think we can do this when we get back via ReleaseMarkCrossReferenceResources since we have no information about which objects were collected and which not.

Given this might be somewhat expensive, it raises the question on how relevant this nulling of weakrefs really is, considering that, even with the current design, resurrection could still happen and we can end up, in theory, with a C# object that had its java peer collected. I couldn't find a detailed reason for its existence, it seems to have been added in https://github.com/mono/mono/commit/572bd8b01721c4cf423ce1d66f96a195c54f983a but the reference to what exact kind of bug it was fixing is lost. It would surely be cleaner if the WeakReference would only have knowledge of the C# side of things. The WeakReference would then be nulled at the following collection. Calling the Target method could bring the object back alive, but is it really a problem ? I would assume the object would have its own state indicating whether the java peer is collected or not anyway, doesn't seem like it needs to rely on the weak refs for this.

So as potential changes for this api:

  • If we still do the weakref clearing, would it make sense for ReleaseMarkCrossReferenceResources to also receive the set of to be freed crossref handles so we can iterate over the weakrefs only once ?
  • If we would no longer do weakref clearing, would we need an explicit WaitForBridgeProcessing api here ? And not have the WeakReference.Target block.

@filipnavara
Copy link
Member

The WaitForBridgeProcessing barrier is necessary for reintroducing references. We must prevent any mutations on the C# side that would invalidate the compressed graph that was sent to the Java side until it is fully processed.

For a more visual explanation, please watch https://youtu.be/h57uvjMQMgU?t=534 which describes the same problem albeit with a slightly more eleborate (and less blocking) solution.

jonpryor added a commit to dotnet/java-interop that referenced this issue May 13, 2025
Context: dotnet/runtime#115506
Context: dotnet/android#10125

We have an *API*, but not (yet) usable *implementation*.

"Import" the `ManagedValueManager` from dotnet/android#10125, renaming
to `JavaBridgedValueManager`, and add the proposed bridge API
from dotnet/runtime#115506 to verify that it all compiles.

It compiles!

Next step: does it *work*?

If I squint right, the proposed API looks very very similar to the
existing MonoVM GC bridge API.  Can I implement the proposed API
in terms of MonoVM, and then have C# code perform the bridge code
instead of native code, when using MonoVM?

Let's find out!
@jonpryor
Copy link
Contributor

@AaronRobinsonMSFT: what should the markCrossReferences callback do? This should be documented.

I had thought that it would be"mirror" MonoGCBridgeCallbacks::cross_references callback, along with:

However, one field is missing in this world view: MonoGCBridgeSCC::is_alive:

mono_bool is_alive; /* to be set by the cross reference callback */

Thus, how does markCrossReferences tell the GC "this object is dead/still alive"? Do we need to add a StronglyConnectedComponent.IsAlive field?

partial struct StronglyConnectedComponent {
    public int IsAlive; // 0 if dead, non-0 if alive
}

@jonpryor
Copy link
Contributor

Naming: StronglyConnectedComponent.Context should probably be StronglyConnectedComponent.Contexts (plural), is it contains StronglyConnectedComponent.Count Context values (plural).

@AaronRobinsonMSFT
Copy link
Member Author

@BrzVlad

If we still do the weakref clearing, would it make sense for ReleaseMarkCrossReferenceResources to also receive the set of to be freed crossref handles so we can iterate over the weakrefs only once ?

That seems reasonable to me. @jkotas and I also previously talked about another callback, but replacing the "Release" with a broader "Finished" that serves both for memory clean-up and passing along the freed handles works too.

If we would no longer do weakref clearing, would we need an explicit WaitForBridgeProcessing api here ? And not have the WeakReference.Target block.

An explicit WaitForBridgeProcessing API shouldn't be needed as we can handle this as an implementation detail where the aforementioned "Finished" API must be called to track the bridge is now done.

@jonpryor

This should be documented.

Yep, it will be.

Thus, how does markCrossReferences tell the GC "this object is dead/still alive"?

The markCrossReferences callback's job is to do the actual freeing of the associated GCHandles in the JNI value map. See the proof-of-concept's map clean-up here and corresponding bridge implementation here.

@jonpryor
Copy link
Contributor

@AaronRobinsonMSFT wrote:

The markCrossReferences callback's job is to do the actual freeing of the associated GCHandles in the JNI value map. See the proof-of-concept's map clean-up here and corresponding bridge implementation here.

which doesn't actually quite make sense to me: the bridge impl has a JVMTriggerGC() invocation, i.e. it contains JNI calls, after which it can check to see if the the Java GC collected the JNI weak ref.

#114184 , meanwhile, has no Java System.gc() invocations at all.

I thus don't fully understand how it works.

@AaronRobinsonMSFT
Copy link
Member Author

AaronRobinsonMSFT commented May 13, 2025

#114184 , meanwhile, has no Java System.gc() invocations at all.

Right. Any calls to JVM APIs are the domain of the GC Bridge (2nd stage) and not needed in the dotnet/runtime repo. Only the SCC building (1st stage) is handled by runtime and that is passed to the 2nd stage which will do the JVM interaction.

@AaronRobinsonMSFT
Copy link
Member Author

@BrzVlad

If we still do the weakref clearing, would it make sense for ReleaseMarkCrossReferenceResources to also receive the set of to be freed crossref handles so we can iterate over the weakrefs only once ?

That seems reasonable to me. @jkotas and I also previously talked about another callback, but replacing the "Release" with a broader "Finished" that serves both for memory clean-up and passing along the freed handles works too.

It sounds like the workflow you're invisioning goes something like this:

  1. GC builds (1st stage) the SCCs/XRefs and collects interesting WeakReference instances.
  2. Passes the SCCs/XRefs to the GC Bridge (2nd stage), which reifies the graph in the JVM and triggers the JVM GC.
  3. The JVM GC completes and the 2nd stage now knows the dead objects.
  4. The 2nd stage passes the keys to the dead objects to they can be looked up in the JNI value map in the Android for .NET runtime.
  5. The Android for .NET runtime uses the keys to remove the GCHandles to the dead objects from some hashmap (perhaps, Dictionary<nint, GCHandle>?).
  6. The removed GCHandle instances are collected and passed, along with the MarkCrossReferences*, to some "Finished" API that can then free the GCHandle and null out the interesting WeakReference instances collected in (1).

Steps (2) through (6) would all occur under what is now considered the "bridge processing" lock.

@filipnavara
Copy link
Member

filipnavara commented May 13, 2025

An explicit WaitForBridgeProcessing API shouldn't be needed as we can handle this as an implementation detail where the aforementioned "Finished" API must be called to track the bridge is now done.

I'm not sure that's race free. We need to prevent any modification to the compressed SCC graph. That's currently ensured by calling WaitForBridgeProcessing whenever a reference is reintroduced from Java VM into the .NET VM. markCrossReferences is only called after SCC graph is constructed but is there any guarantee that while it is being constructed that no other thread can create a new reference that would modify the graph? (the prototype implementations don't call markCrossReferences synchronously in the STW stage) There's a window where the Android side of the bridge doesn't know that a GC is in progress and it could potentially result in an inconsistency.

@AaronRobinsonMSFT
Copy link
Member Author

That's currently ensured by calling WaitForBridgeProcessing whenever a reference is reintroduced from Java VM into the .NET VM.

These are independent operations in this new model. GC bridge still needs to lock around their dictionary just as they currently do. The lock within is an implementation detail that avoids us having to expose an explicit begin/end callback mechanism. We internally call markCrossReferences (begin) and we expect callers to call "Finish". They need a lock around their own collection while they are processing (2nd stage). It should naturally be broader than the internal runtime logic, but that is the domain of the interop layer and responsible for doing it correctly.

@filipnavara
Copy link
Member

filipnavara commented May 13, 2025

I'm not concerned about a race condition between markCrossReferences and finishing the cross GC operations. I am concerned about a race condition between the start of GC where some of the bridge objects referenced by their respective GCHandle appear dead (ie. not referenced by .NET objects) and the call to markCrossReferences. What happens if another thread revives one of these references before markCrossReferences callback is invoked?

Timeline:

  • Thread A starts a GC, STW happens, some information is recorded, world is resumed
  • Thread B gets a callback from Java and revives a previously dead bridge object X, creates a new reference in the .NET world that's not part of the SCC graph (Y -> X).
  • Some background thread gets called with markCrossReferences, reference Y -> X is missing from the graph, X may accidentally get collected while Y is alive

@AaronRobinsonMSFT
Copy link
Member Author

What happens if another thread revives one of these references before markCrossReferences callback is invoked?

This is triggered by the GC during a gen2. All threads are suspended.

@filipnavara
Copy link
Member

filipnavara commented May 13, 2025

This is triggered by the GC during a gen2. All threads are suspended.

So that design is intentionally different from MonoVM which did NOT call the cross references callback in stop-the-world phase, correct? Was there any thought about possible interactions with background GCs? Just trying to make sure that this is specified, not necessarily questioning the intent...

It essentially implies that the callback cannot be implemented in managed code and that Android/Java.Interop is responsible for saving the information and offloading it to background processing (unless you block the .NET GC until the Java GC finishes along with all the threads in the stop-the-world stage).

@BrzVlad
Copy link
Member

BrzVlad commented May 14, 2025

@AaronRobinsonMSFT

If we would no longer do weakref clearing, would we need an explicit WaitForBridgeProcessing api here ? And not have the WeakReference.Target block.

An explicit WaitForBridgeProcessing API shouldn't be needed as we can handle this as an implementation detail where the aforementioned "Finished" API must be called to track the bridge is now done.

Are you suggesting here that .NET Android could do its own synchronization to prevent the issue that Filip was describing, without any help from the runtime ? For example, when markCrossReferences is called during STW, they could set their own bridge in progress flag and have their own condition variable that is signaled after they call back into the runtime via ReleaseMarkCrossReferenceResources ?

  1. GC builds (1st stage) the SCCs/XRefs and collects interesting WeakReference instances.

I don't see why we would need to collect any WeakReference instances here. Seems like we can handle this directly in step 6, exactly as you mentioned. We would have the required cross ref GCHandles here with dead java peers, we would iterate over all WeakReferences and null every one of them associated with these mentioned cross ref handles. We could afterwards also free these cross ref handles. This new Finished api receiving additionally a GCHandle* array and its size sound good to me.

@jonpryor Any opinion on nulling of the WeakReference GCHandles for C# objects once the bridge processing detects their java peer is dead. Do you know if .NET Android relies on this behavior in any way ? So for example, to better visualize the workflow

Thread1                         |   Thread2 (.NET Android worker)
                                |
GC1                             |
StopWorld                       |
normal GC mark                  |
compute SCCs and xrefs          |
markCrossReferences cb          |
                                | Start doing java.GC
finish collection               |
                                | .....
ResumeWorld                     | .....
.....                           | .....
.....                           |
WeakReference.Target WaitBridge |
                                | obtain set of cross ref handles with dead java peers
                                | ReleaseMarkCrossReferenceResources
                                | (This would null WeakRefs for bridge objects with java peer collected, if we preserve mono behavior)
                                | now gc bridge processing is finished
WeakReference.Target Done       |
.....                           |
GC2                             |
...                             |

The alternative would be that WeakReference would have no insight into bridge objects and it would be nulled as part of GC2, when the object is seen as dead once again, this time without any java peer / crossref gchandle around to hold it alive. Also note that for this API there is no longer the requirement for Bridge objects to have finalizers, as it was the case on mono.

@filipnavara
Copy link
Member

filipnavara commented May 14, 2025

Any opinion on nulling of the WeakReference GCHandles for C# objects once the bridge processing detects their java peer is dead. Do you know if .NET Android relies on this behavior in any way ?

There's a problem in the example you've shown. Resolving the WeakReference can be done from user code. If it doesn't do WaitForBridge and getting nulled then user code can get a reference to Java object while Thread2 is performing the Java collection. It can assign the WeakReference.Target to some variable, once again creating an addition X -> Y edge that is not captured in the cross-references graph, and you will end up with object Y in "disposed" state for C# code that's perfectly valid.

Pseudo-code:

WeakReference<JavaHttpClient> cachedHttpClient;
... // GC1 start here
JavaHttpClient? localHttpClient = cachedHttpClient.Target;
if (localHttpClient is null) {
    // Create new HTTP client and assign it to localHttpClient and cachedHttpClient
}
// Thread2 may have finished Java GC and disposed the JavaHttpClient instance
await localHttpClient.GetAsync(...);

@BrzVlad
Copy link
Member

BrzVlad commented May 14, 2025

@filipnavara Overall the summary of my point is that, due to resurrection, in theory it is possible to have C# objects with dead java peers, even now on mono. For example

class ObjWithFinalizer {
     Bridge bridge;

      ~ObjWithFinalizer()
      {
             // Use bridge
       }
}

ObjWithFinalizer obj = ...;
// obj dies

GC();
// bridge and obj are seen as dead by the GC.
// obj gets enqueued to have the finalizer run
// java peer of bridge object could end up collected

// Finalizer gets called at some point and bridge object is used with disposed java counterpart

So waiting and nulling WeakReferences doesn't 100% prevent this issue, seems more like a best effort. I'm not saying we should give up on it, rather just making sure we have no other options and we want to keep doing this, before implementing it.

@filipnavara
Copy link
Member

Overall the summary of my point is that, due to resurrection, in theory it is possible to have C# objects with dead java peers, even now on mono.

Definitely, but using finalizer and resurrecting objects is rather odd edge case where I expect all the hell to break loose.

Using WeakReferences pointing to Java objects is extremely common in UI frameworks and applications and MonoVM supports this pattern.

@AaronRobinsonMSFT
Copy link
Member Author

AaronRobinsonMSFT commented May 14, 2025

@BrzVlad

For example, when markCrossReferences is called during STW, they could set their own bridge in progress flag and have their own condition variable that is signaled after they call back into the runtime via ReleaseMarkCrossReferenceResources ?

Yes. They will need to do that anyways and already do. The runtime isn't going to be able to lock their dictionary that maps from JNI handle to GCHandle (prototype) - currently they do this. They will need their own indication that that map is "locked" while the 2nd stage is running. That handles the Android runtime side of things as it relates to resurrection. The runtime part of resurrection via WeakReference/WeakReference<T> needs the narrower "internal 2nd stage running" lock to indicate that certain weak references are impacted by the 2nd stage running. This of course is an optimization and the GC could just consider all weak references suspect for now, but a possible future optimization is determining which weak references are potentially interesting and only blocking on those if the 2nd stage is running.

So for example, to better visualize the workflow

This is exactly the workflow being proposed. We're on the same page.

@filipnavara
Copy link
Member

It looks to me like the current proposed API shape doesn't cover enough surface to make WeakReference do the WaitForBridgeProcessing call. Previous discussion implies that the JavaMarshal consumer is responsible for locking between markCrossReferences and the end of processing (up to the point beyond ReleaseMarkCrossReferenceResources call). The runtime would need to either duplicate the work and depend on ReleaseMarkCrossReferenceResources to release its own "processing lock" or there needs to be a more explicit mechanism.

I'm still generally not convinced whether it should be the consumer of JavaMarshal doing this synchronization and offloading of the processing to a background thread or whether the runtime should do it. If the runtime were to do it then we would need to add WaitForBridgeProcessing to the API surface. The obvious benefit would be that the background thread can run JIT code so all of the processing can be done in managed code.

@AaronRobinsonMSFT
Copy link
Member Author

AaronRobinsonMSFT commented May 18, 2025

It looks to me like the current proposed API shape doesn't cover enough surface to make WeakReference do the WaitForBridgeProcessing call.

Correct. The current "Release" needs to be updated to be a "Finish" and pass along the GCHandles to free and null out any related weak references.

I'm still generally not convinced whether it should be the consumer of JavaMarshal doing this synchronization and offloading of the processing to a background thread or whether the runtime should do it.

It is going to be simpler for the bridge to handle the thread management or else this API becomes more complicated in an area we are explicitly not trying to innovate on. As previously mentioned if it occurs we need to innovate it won't just be the data structures but also the general API and semantics that will move as well.

The obvious benefit would be that the background thread can run JIT code so all of the processing can be done in managed code.

The background thread can be written in managed with this approach as well and it is something we could experiement with, but my guess is it will be less performant to do it in managed code. I say that because, looking at @BrzVlad's flow diagram, it would mean that while the 2nd stage was running subsequent GCs would impact it and delay getting the 2nd stage complete. The logic in the 2nd stage is relatively straight forward, but does interact with the JVM a fair bit. Pushing that logic into managed would also increase the number of interop calls substantially for very little benefit (JVM strong <-> weak refs).

@filipnavara
Copy link
Member

it would mean that while the 2nd stage was running subsequent GCs would image it and delay getting the 2nd stage complete.

I guess that's fair point for not doing it in managed code. My argument was never about allowing a part of it in managed code, it was about allowing all of it in managed code which was tried (dotnet/java-interop#1334) and failed for obvious reasons.

@AaronRobinsonMSFT
Copy link
Member Author

@BrzVlad Do you think with the updated API shape we can take this to API review? I'd like to get this reviewed sooner rather than later.

@BrzVlad
Copy link
Member

BrzVlad commented May 20, 2025

In my opinion the API looks good. To better visualize the usage/implementation, I have the prototype at https://github.com/BrzVlad/runtime/commits/feature-clr-gcbridge/ which appears to be working correctly, passing the local test that I added. The branch is a WIP, I haven't yet cleaned it up. The last two commits contain the waiting for bridge processing when checking the target of a WeakReference and the clearing of the weakrefs in the bridge processing finish stage. It would be nice to take a quick look over these 2 commits to see whether it looks good from design perspective so I can start preparing my branch for integration into coreclr and have confirmation that this is the API we are going with. @AaronRobinsonMSFT @jkotas @Maoni0

@AaronRobinsonMSFT
Copy link
Member Author

@BrzVlad The overall flow/architecture look good. There are two issues that need to be addressed.

  1. There is a GC hole in the call to FinishCrossReferenceProcessing. We're passing a void* to the byref contained within a Span<T>. That should really be an ObjectOnStack instance and extracted in a cooperative block.

  2. The GCHandleInternalGetBridgeWait will need a fast/slow split instead of a single FCall that switches to preemptive mode. The FCall should return immediately with "true" if it is free, otherwise it should return "false" and then call a QCall that does the actual work in preemptive mode.

Once the branch is in a PR, I can make a few changes on the VM side. Otherwise, this looks like what we've talked about.

grendello pushed a commit to dotnet/java-interop that referenced this issue May 23, 2025
Context: dotnet/runtime#115506
Context: dotnet/android#10125

We have an *API*, but not (yet) usable *implementation*.

"Import" the `ManagedValueManager` from dotnet/android#10125, renaming
to `JavaBridgedValueManager`, and add the proposed bridge API
from dotnet/runtime#115506 to verify that it all compiles.

It compiles!

Next step: does it *work*?

If I squint right, the proposed API looks very very similar to the
existing MonoVM GC bridge API.  Can I implement the proposed API
in terms of MonoVM, and then have C# code perform the bridge code
instead of native code, when using MonoVM?

Let's find out!
@AaronRobinsonMSFT AaronRobinsonMSFT added blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels May 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-ready-for-review API is ready for review, it is NOT ready for implementation area-System.Runtime.InteropServices blocking Marks issues that we want to fast track in order to unblock other important work
Projects
Status: No status
Development

No branches or pull requests

4 participants