[HUDI-9159] S3 implementation of StorageLock for StorageBasedLockProvider #13126

Merged
merged 67 commits into from
Apr 14, 2025

alexr17
Contributor

@alexr17 alexr17 commented Apr 10, 2025

Change Logs

Adds the S3 implementation of the storage-based lock provider for Hudi tables.

Impact

Enables locking based on S3 conditional writes for multi-writer scenarios with Hudi tables stored in S3.

Risk level (write none, low, medium or high below)

None

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

To test this, we set up local Hudi/Spark bundles and ran concurrent Spark SQL queries against tables with this lock provider enabled to confirm that it behaved as expected.

alexr17 and others added 19 commits April 7, 2025 20:32
…lient/transaction/lock/ConditionalWriteLockConfig.java

Co-authored-by: Y Ethan Guo <[email protected]>
…lient/transaction/lock/ConditionalWriteLockConfig.java

Co-authored-by: Y Ethan Guo <[email protected]>
…lient/transaction/lock/ConditionalWriteLockConfig.java

Co-authored-by: Y Ethan Guo <[email protected]>
…lient/transaction/lock/ConditionalWriteLockConfig.java

Co-authored-by: Y Ethan Guo <[email protected]>
…lient/transaction/lock/ConditionalWriteLockProvider.java

Co-authored-by: Y Ethan Guo <[email protected]>
…lient/transaction/lock/ConditionalWriteLockProvider.java

Co-authored-by: Y Ethan Guo <[email protected]>
…lient/transaction/lock/ConditionalWriteLockProvider.java

Co-authored-by: Y Ethan Guo <[email protected]>
@github-actions github-actions bot added the size:XL PR with lines of changes > 1000 label Apr 10, 2025
Member

@vinothchandar vinothchandar left a comment

Reviewed the S3StorageLockClient

* Tests S3-based StorageBasedLockProvider using a LocalStack container
* to emulate S3.
*/
@Disabled("HUDI-9159 The tests do not work. Disabling them to unblock Azure CI")
Member

Is there a plan to make this work?

Member

Should we mock the S3 behavior? It looks like you are trying to run a functional test using LocalStack.

Contributor Author

Yeah, we mock the S3 behavior with TestS3StorageLockClient. LocalStack doesn't seem to work in Azure CI; it's the same issue as with the other tests Davis disabled a while back.

@nsivabalan
Contributor

Can you update the PR description with the tests we have done so far on a cluster with this patch?
Sanity testing, soak testing (to weed out connection leaks, etc.)?

@nsivabalan
Contributor

By the way, let's remove the retries within the renew method. I see that renew() and close() are both synchronized, so there is a chance that close() starves while renew() is sleeping.

So let's remove the retries within the renew method (or tryCreateOrUpdateLockFile) and let the heartbeat manager do the retry after 30 seconds.
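
To make the starvation concern concrete, here is a minimal Java sketch of the suggested shape: `renew()` makes exactly one attempt under the monitor it shares with `close()` and never sleeps, so a failed renewal is simply picked up again by the next heartbeat. The class name `SingleAttemptRenewer` and the `BooleanSupplier` standing in for the conditional write are hypothetical and are not taken from the PR.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch, not the PR's code: renew() holds the shared monitor only
 * for the duration of one attempt, so close() never waits behind a sleeping retry loop.
 */
final class SingleAttemptRenewer implements AutoCloseable {
  // Stand-in for the conditional write that extends the lock (an assumption).
  private final BooleanSupplier renewAttempt;
  private boolean closed = false;

  SingleAttemptRenewer(BooleanSupplier renewAttempt) {
    this.renewAttempt = renewAttempt;
  }

  /** One attempt, no sleep, no internal retry. Returns false on failure or after close(). */
  synchronized boolean renew() {
    if (closed) {
      return false;
    }
    return renewAttempt.getAsBoolean();
  }

  /** Competes with renew() for the same monitor, which is now held only briefly. */
  @Override
  public synchronized void close() {
    closed = true;
  }
}
```

A periodic caller (the heartbeat manager in this discussion) would invoke `renew()` on its own schedule and treat a `false` result as "try again on the next tick".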

@nsivabalan
Contributor

nsivabalan commented Apr 11, 2025

A few more pieces of feedback that need to be addressed:

  1. https://github.com/apache/hudi/pull/13126/files#r2038561912

  2. For the missing-region issue, why can't we just instantiate the S3 client, make the call below to get the bucket's region, and re-instantiate the S3 client if need be?

    GetBucketLocationResponse bucketLocationResponse = s3Client.getBucketLocation(GetBucketLocationRequest.builder().bucket(bucketName).build());
    String bucketRegion = bucketLocationResponse.locationConstraint().toString();

  3. https://github.com/apache/hudi/pull/13126/files#r2038581765

  4. https://github.com/apache/hudi/pull/13126/files#r2038585276
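
One detail relevant to item 2: S3's GetBucketLocation reports an empty location constraint for buckets in us-east-1, and some older buckets in eu-west-1 report the legacy value `EU`, so the returned value generally needs normalizing before it is used as a region. The sketch below shows that normalization on a plain string, which in AWS SDK for Java 2.x could come from `locationConstraintAsString()`. The helper `BucketRegionResolver.toRegionId` is hypothetical and not part of the PR.

```java
/**
 * Hypothetical helper, not part of the PR: turns the raw LocationConstraint
 * string from GetBucketLocation into a region id usable for a new client.
 */
final class BucketRegionResolver {
  private BucketRegionResolver() {
  }

  static String toRegionId(String locationConstraint) {
    // us-east-1 buckets report a null or empty location constraint.
    if (locationConstraint == null || locationConstraint.isEmpty()) {
      return "us-east-1";
    }
    // Legacy alias that S3 may still return for eu-west-1 buckets.
    if ("EU".equals(locationConstraint)) {
      return "eu-west-1";
    }
    // Any other value is already a region id, for example "ap-south-1".
    return locationConstraint;
  }
}
```

The result could then be compared with the region of the client that made the call to decide whether a second client is needed.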

@alexr17
Contributor Author

alexr17 commented Apr 12, 2025

@hudi-bot run azure


Contributor

@nsivabalan nsivabalan left a comment

Thanks for patiently addressing all the feedback, Alex.

@@ -83,13 +83,13 @@ private void checkRequiredProps() {
   throw new IllegalArgumentException(BASE_PATH.key() + notExistsMsg);
 }
 if (lockConfig.getLongOrDefault(VALIDITY_TIMEOUT_SECONDS) < lockConfig.getLongOrDefault(HEARTBEAT_POLL_SECONDS)
-    * 3) {
+    * 10) {
Contributor

@nsivabalan nsivabalan Apr 13, 2025

If you get any more feedback from Danny or someone else before we land this patch, can you also rename the variable on line 35 to LOCK_VALIDITY_TIMEOUT_SECS while addressing it?

Contributor Author

Actually, @yihua suggested VALIDITY_TIMEOUT_SECONDS in an earlier PR; it used to be LOCK_VALIDITY_TIMEOUT_MS.
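
For readers following the `checkRequiredProps()` hunk in this thread, the check can be sketched in isolation as follows. This assumes the reading that `* 10` replaces `* 3`, meaning the validity timeout must cover at least ten heartbeat intervals. The class `LockTimingCheck`, its constant, and the numbers in the usage note are illustrative and are not the PR's names or defaults.

```java
/**
 * Illustrative sketch of the timing validation discussed above; the class,
 * method, and constant names are hypothetical.
 */
final class LockTimingCheck {
  // Assumed from the hunk: the validity window must span at least 10 heartbeats.
  static final long MIN_HEARTBEATS_PER_VALIDITY = 10;

  private LockTimingCheck() {
  }

  static void validate(long validityTimeoutSeconds, long heartbeatPollSeconds) {
    long minimumValiditySeconds = heartbeatPollSeconds * MIN_HEARTBEATS_PER_VALIDITY;
    if (validityTimeoutSeconds < minimumValiditySeconds) {
      throw new IllegalArgumentException(
          "Validity timeout of " + validityTimeoutSeconds + "s is shorter than "
              + MIN_HEARTBEATS_PER_VALIDITY + " heartbeat intervals ("
              + minimumValiditySeconds + "s)");
    }
  }
}
```

With a 30-second heartbeat, `validate(300, 30)` passes while `validate(299, 30)` throws, so a lock holder gets several chances to renew before its lock can be considered expired.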

@nsivabalan
Contributor

@danny0405: Do you want to review this patch?

} catch (S3Exception e) {
int status = e.statusCode();
// Default to unknown error
LockGetResult result = LockGetResult.UNKNOWN_ERROR;
Contributor

Rename handleUpsertS3Exception to handleS3Exception and reuse it here?

// How long to wait before retrying lock acquisition in blocking calls.
private static final long DEFAULT_LOCK_ACQUISITION_BUFFER_MS = 1000;
Contributor

Delete the comment on line 76 too?

Contributor

@danny0405 danny0405 left a comment

+1, fine with it, just some minor comments.

@danny0405 danny0405 merged commit ed1ef93 into apache:master Apr 14, 2025
60 checks passed
voonhous pushed a commit to voonhous/hudi that referenced this pull request Apr 15, 2025
…ider (apache#13126)

* Adds the storage based lock provider implementation for s3 hudi tables.

---------

Co-authored-by: Y Ethan Guo <[email protected]>
Co-authored-by: sivabalan <[email protected]>
(cherry picked from commit ed1ef93)
Labels
release-1.0.2 size:XL PR with lines of changes > 1000
7 participants