[FLINK-37637] Avoid dead lock for Configuration's addAll #26426

Open: wants to merge 2 commits into base: master

@beliefer (Contributor) commented Apr 9, 2025

What is the purpose of the change

This PR aims to avoid a deadlock in Configuration's addAll.
When multiple threads call addAll concurrently and each passes the other's Configuration object as the argument, a deadlock can occur:

// Thread 1
configA.addAll(configB);  // First lock configA.confData, then try locking configB.confData

// Thread 2
configB.addAll(configA);  // First lock configB.confData, then try locking configA.confData
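
The opposite acquisition orders can be made visible without actually deadlocking. The following is a minimal sketch, not Flink's code: `LockOrderDemo` and `MiniConfig` are hypothetical names, and `MiniConfig` models only the nested `synchronized` shape described above (own `confData` first, then the other's). Both calls run sequentially on one thread, so nothing can block; the recorded order is what each thread would attempt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LockOrderDemo {

    /** Hypothetical stand-in for Configuration; only the locking shape of addAll is modeled. */
    static class MiniConfig {
        final String name;
        final Map<String, Object> confData = new HashMap<>();

        MiniConfig(String name) {
            this.name = name;
        }

        /** Pre-fix pattern: lock our own map first, then the other's, recording the order. */
        void addAll(MiniConfig other, List<String> lockOrder) {
            synchronized (this.confData) {
                lockOrder.add(this.name);
                synchronized (other.confData) {
                    lockOrder.add(other.name);
                    this.confData.putAll(other.confData);
                }
            }
        }
    }

    public static void main(String[] args) {
        MiniConfig a = new MiniConfig("A");
        MiniConfig b = new MiniConfig("B");

        List<String> thread1Order = new ArrayList<>();
        List<String> thread2Order = new ArrayList<>();

        a.addAll(b, thread1Order); // what Thread 1 would do
        b.addAll(a, thread2Order); // what Thread 2 would do

        System.out.println("a.addAll(b) locks: " + thread1Order); // [A, B]
        System.out.println("b.addAll(a) locks: " + thread2Order); // [B, A]
    }
}
```

If the two calls ran on separate threads, each thread could take its first lock and then wait forever for the lock the other thread already holds.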

Brief change log

Avoid a deadlock in Configuration's addAll.

Verifying this change

This change is already covered by existing tests, such as JobManagerProcessUtilsTest.
Many existing test cases exercise addAll.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

@flinkbot (Collaborator) commented Apr 9, 2025

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build


// Ensure consistent lock sequence
if (System.identityHashCode(lock1) < System.identityHashCode(lock2)) {
    synchronized (lock1) {
Contributor

The locking sequence is the other way round in this instance.

Contributor Author

It always keeps the same order when locking the objects, so it avoids the deadlock.

bld.setLength(pl);
bld.append(entry.getKey());
this.confData.put(bld.toString(), entry.getValue());
Object lock1 = this.confData;
Contributor

I suggest lock1 -> thisConfDataLock
lock2 -> otherConfDataLock

Contributor Author

How about thisLock and otherLock?

@@ -239,9 +239,21 @@ public void addAllToProperties(Properties props) {
}

public void addAll(Configuration other) {
synchronized (this.confData) {
Contributor

I am not totally convinced there are no issues with this locking strategy.

If we have two Configuration objects, A and B, each would always lock the other and then itself. From A's and B's perspective, the lock order is not always the same.

Contributor Author

This change makes sure we always lock one object first and then the other. As you said, if one thread locks the other object first, then all threads will lock the other object first.

Contributor

Yes, I see that is how you have coded it, but objects should always be locked in the same order. So Object A and then Object B should be locked in that order, irrespective of which thread owns them.

If each thread locks its own object first, then thread 1 could lock Object A, then thread 2 could lock Object B, and then thread 1 would be locked out of Object B, and vice versa.

I was thinking that if the objects had unique hashes, we could always lock the lower hash first to ensure the order is the same.

Contributor Author

Yes. Let me give an example. There are two objects, A and B, and the hash value of Object A is smaller than Object B's.
Thread 1 holds Object A and thread 2 holds Object B. Thread 1 will lock Object A first and then Object B. Thread 2 wants to lock Object B first, but it can't: since the hash value of Object A is lower than Object B's, this thread also has to lock Object A first and then Object B.
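
The ordering rule discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `OrderedLockDemo`, `OrderedConfig`, `copyFrom` and `TIE_LOCK` are hypothetical names. Every caller locks the map with the smaller `System.identityHashCode` first, regardless of which object is `this`. Identity hash codes are not guaranteed to be unique, so the sketch also takes a global tie-breaker lock when the two hashes are equal; that branch is a common addition to this pattern and is not something the discussion above covers.

```java
import java.util.HashMap;
import java.util.Map;

public class OrderedLockDemo {

    /** Hypothetical stand-in for Configuration. */
    static class OrderedConfig {
        // Assumption of this sketch: taken first only when identity hash codes collide.
        private static final Object TIE_LOCK = new Object();

        final Map<String, Object> confData = new HashMap<>();

        void addAll(OrderedConfig other) {
            Object thisLock = this.confData;
            Object otherLock = other.confData;
            int thisHash = System.identityHashCode(thisLock);
            int otherHash = System.identityHashCode(otherLock);

            if (thisHash < otherHash) {
                synchronized (thisLock) {
                    synchronized (otherLock) {
                        copyFrom(other);
                    }
                }
            } else if (thisHash > otherHash) {
                synchronized (otherLock) {
                    synchronized (thisLock) {
                        copyFrom(other);
                    }
                }
            } else {
                // Equal hash codes: serialize through one global lock so order cannot matter.
                synchronized (TIE_LOCK) {
                    synchronized (thisLock) {
                        synchronized (otherLock) {
                            copyFrom(other);
                        }
                    }
                }
            }
        }

        private void copyFrom(OrderedConfig other) {
            this.confData.putAll(other.confData);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        OrderedConfig a = new OrderedConfig();
        OrderedConfig b = new OrderedConfig();
        a.confData.put("key.a", 1);
        b.confData.put("key.b", 2);

        // Two threads merge in opposite directions, as in the PR description.
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) a.addAll(b); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) b.addAll(a); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // Both maps end up with both keys.
        System.out.println(a.confData.keySet().equals(b.confData.keySet()));
    }
}
```

Because both `a.addAll(b)` and `b.addAll(a)` take the two monitors in the same global order, neither thread can hold one lock while waiting for a lock the other thread already owns.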

@beliefer (Contributor Author) commented Apr 9, 2025

@flinkbot run azure

@beliefer (Contributor Author)

@davidradl Could you take a look at this PR again?
