Usage of AsyncGetCallTrace leads to livelock in 'getSendSlotsFromSignature' #20577
Comments
Issue Number: 20577 |
openj9/runtime/util/sendslot.c Lines 27 to 60 in 27971d2
The code assumes a valid signature as input. It's not clear to me how this could loop forever - a crash I could understand. |
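As a rough illustration of the point above, here is a minimal sketch of a slot counter that does not assume a valid signature. It is not the OpenJ9 code from sendslot.c; the function name `count_arg_slots`, the explicit `len` bound and the `-1` error return are all assumptions made for this example. It follows the JVM method descriptor grammar (`B C D F I J S Z`, `L...;`, `[`), counts two slots for a non-array `D` or `J` and one for everything else, and gives up as soon as it sees an unexpected byte or runs out of input, rather than scanning memory until it happens to find a `)`.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not the OpenJ9 function): counts the argument slots
 * of a JVM method descriptor such as "(I[JLjava/lang/String;D)V".
 * Reads at most `len` bytes of `sig`. Returns the slot count, or -1 if the
 * input does not look like a method descriptor.
 */
long count_arg_slots(const char *sig, size_t len)
{
    long slots = 0;
    size_t i = 1; /* index 0 must hold the opening '(' */

    if (len == 0 || sig[0] != '(') {
        return -1;
    }
    while (i < len) {
        int is_array = 0;

        if (sig[i] == ')') {
            return slots; /* end of the argument list */
        }
        /* Skip array dimensions; an array is a single reference slot. */
        while (i < len && sig[i] == '[') {
            is_array = 1;
            i++;
        }
        if (i >= len) {
            return -1; /* ran out of input inside an array type */
        }
        switch (sig[i]) {
        case 'L':
            /* Object type: scan to the terminating ';' within bounds. */
            i++;
            while (i < len && sig[i] != ';') {
                i++;
            }
            if (i >= len) {
                return -1; /* unterminated class name */
            }
            slots += 1;
            break;
        case 'D':
        case 'J':
            slots += is_array ? 1 : 2; /* double and long take two slots */
            break;
        case 'B': case 'C': case 'F': case 'I': case 'S': case 'Z':
            slots += 1;
            break;
        default:
            return -1; /* garbage byte: stop instead of walking on */
        }
        i++;
    }
    return -1; /* no closing ')' within `len` bytes */
}
```

With this shape, a garbage pointer still risks a crash on unreadable memory, but readable garbage leads to an early `-1` rather than a long walk. The sketch does not validate class names or the return type.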
Could this be caused by passing garbage to Unfortunately, this is happening consistently - this function is the last on stack and, as I mentioned, when I attach via |
@a7ehuo, may I ask you to take a look at this problem? |
I can reproduce the hang problem with JDK11.
|
The caller at line 1173 is the normal case, not OSR. |
Can you get a java stack dump at the point of failure? |
Do you mean
|
I added
The assert happens because both [1]
[2]
[3]
|
I doubt asserts work during ASGCT (trace in general is disabled), and it most definitely can not trigger decompilation. |
That might explain why I didn't see the application stop to dump the core even when it detected that the signature's format is incorrect in
    Assert_Util_true(signature[0] == '(');
    if (signature[0] != '(') {
        *(volatile int*)(0) = 1;
        raise(SIGABRT);
        raise(SIGTRAP);
    }
I also ran the test with |
The entirety of the ASGCT call is signal protected (since we're so often attempting to walk in-flight stacks which results in a crash). Your best bet might be to print to the console and capture that output in the test run. |
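A minimal sketch of what such console output could look like from inside a signal-protected path, assuming a POSIX system. The helpers `format_hex` and `debug_print_hex` are hypothetical names invented for this example and are not part of OpenJ9. The sketch avoids `printf`, which is not async-signal-safe, and uses only `write(2)`, which POSIX lists as async-signal-safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: formats `value` as "0x<hex>" into `buf` without a
 * NUL terminator. Returns the number of bytes written, or 0 if `cap` is
 * too small.
 */
size_t format_hex(uintptr_t value, char *buf, size_t cap)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof(uintptr_t)]; /* enough for every hex digit */
    size_t n = 0;
    size_t out = 0;

    /* Produce the digits least significant first. */
    do {
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    if (cap < n + 2) {
        return 0;
    }
    buf[out++] = '0';
    buf[out++] = 'x';
    while (n > 0) {
        buf[out++] = tmp[--n]; /* reverse into the output buffer */
    }
    return out;
}

/*
 * Hypothetical helper: writes "<label>0x<hex>\n" to stderr using only
 * write(2). Write errors are deliberately ignored.
 */
void debug_print_hex(const char *label, size_t label_len, uintptr_t value)
{
    char buf[2 + 2 * sizeof(uintptr_t) + 1];
    size_t n = format_hex(value, buf, sizeof(buf) - 1);
    ssize_t rc;

    buf[n++] = '\n';
    rc = write(STDERR_FILENO, label, label_len);
    (void)rc;
    rc = write(STDERR_FILENO, buf, n);
    (void)rc;
}
```

A call such as `debug_print_hex("sig=", 4, (uintptr_t)signature)` placed just before the suspect loop would leave a trace in the captured test output even when asserts and core dumps are suppressed.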
I added debug code in [1]
[2]
[3]
|
OIC, target 0.51 |
@tajila, per the investigation from @a7ehuo, could someone on the VM team take a look at this? |
Since there's no way |
The loop breaker is in the outer loop of the walker in |
I don't see any obvious loop for resolve frames - a single one is processed then we move on to the JIT frame loop. |
@gacholio Any update on this ? |
As far as I can see, there is no looping condition in the JIT stack walker that doesn't call |
I've been running the JDK8 test provided to me, but I have yet to see a hang. |
I believe I have reproduced the hang:
|
The stack is not walkable:
|
Send slots has been going for a while:
|
Subsequent core shows progress:
|
A much later core looks like the counters have wrapped:
|
|
Prevent potential infinite loop in getSemdSlotsFromSignature. Fixes: eclipse-openj9#20577 Signed-off-by: Graham Chapman <[email protected]>
Prevent potential infinite loop in getSendSlotsFromSignature. Related: eclipse-openj9#20577 Signed-off-by: Graham Chapman <[email protected]>
@jbachorik Please try a build with the supposed fix. Should be in today's nightlies. |
I've added this to the 0.51 milestone as a reminder to see about adding the "supposed fix" to the 0.51 release branch. |
Hello, I can confirm that I can no longer observe the deadlock. |
Let's definitely include this in 0.51 |
Created the backport PRs. |
Problem description
When using a profiler that calls AsyncGetCallTrace in a signal handler, the profiled application locks up sooner or later.
Inspecting the locked-up application in gdb reveals the following stack trace. An attempt to resume execution is unsuccessful, as the thread is stuck in getSendSlotsFromSignature:
This particular stack trace is from Java 11.0.24, but the same behaviour is observed on 8.0.422, 17.0.12 and 21.0.4.
Steps to reproduce
java -javaagent:<pathto>/dd-java-agent-1.42.1.jar -Ddd.profiling.enabled=true -Ddd.profiling.ddprof.enabled=true -Ddd.profiling.upload.period=10 -Ddd.profiling.start-force-first=true -jar <path-to>/renaissance-mit-0.15.0.jar akka-uct -r 50000
- Mind you, the lockup usually does not happen immediately, but in my experiments it reliably occurs within 15-20 minutes.
Note: The current version of the dd-trace-java agent will not start on JDK 21.0.4. This is a known issue, and I tested 21.0.4 with a patched version of the agent.