feat: primary resource caching for followup reconciliation(s) #2761

csviri · 2025-04-11T08:29:09Z

This PR provides facilities to cache the primary resource for follow-up reconciliations, ensuring that the reconciler always handles the up-to-date primary.

Later, we might extend the utility methods to the rest of the resource, not just the status (the cache already supports that); for now, they focus on the status.

Signed-off-by: Attila Mészáros <[email protected]>

xstefank

LGTM

afalhambra-hivemq

When using the PatchAndCacheStatusWithLock approach, we will need to explicitly use optimistic locking here, which in theory should be fine.
However, the default retry mechanism provided by JOSDK when a 409 conflict happens doesn't pick up the latest resource status, forcing us to implement a custom retry mechanism for this. Otherwise, the default retry mechanism is not useful here.

Anyway, I can raise a separate issue for that default mechanism to see if it can be improved there.

Besides that, great work and thanks a lot for this PR.

afalhambra-hivemq · 2025-04-15T15:52:10Z

...ore/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java

+        (P p, KubernetesClient c) -> c.resource(primary).updateStatus());
+  }
+
+  public static <P extends HasMetadata> P patchAndCacheStatus(


Can we add a small JavaDoc for this exposed method as well?

Still would be nice to have a JavaDoc for this public method.

afalhambra-hivemq · 2025-04-16T06:50:40Z

...rc/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/PeriodicTriggerEventSource.java

+public class PeriodicTriggerEventSource<P extends HasMetadata>
+    extends AbstractEventSource<Void, P> {
+
+  public static final int DEFAULT_PERIOD = 30;


Suggested change

-  public static final int DEFAULT_PERIOD = 30;
+  private static final int DEFAULT_PERIOD = 30;

This is for tests, and for reusability I would stick with the public approach.

afalhambra-hivemq · 2025-04-16T07:19:51Z

.../io/javaoperatorsdk/operator/baseapi/statuscache/primarycache/StatusPatchPrimaryCacheIT.java

+        .untilAsserted(
+            () -> {
+              assertThat(reconciler.errorPresent).isFalse();
+              assertThat(reconciler.latestValue).isGreaterThan(10);


To guarantee that no status update is lost, maybe we could check that a list of values from 1 to 10 is present (all of them, in order).
Otherwise, this `isGreaterThan(10)` assert will eventually be true regardless of the caching.

Hi, there is that error check, which should indicate whether the values were monotonic, but yeah, we can definitely improve the asserts.
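
The stricter check suggested above could be sketched roughly as follows. This is only an illustration under assumptions: `ObservedValues` is a hypothetical helper, not part of the integration test, and it assumes the reconciler records each status value it sees exactly once per reconciliation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: records every status value a reconciler observes. */
final class ObservedValues {
  private final List<Integer> values = new ArrayList<>();

  /** Called once per reconciliation with the value read from the status. */
  synchronized void record(int value) {
    values.add(value);
  }

  /** True only if the values seen so far are exactly start, start + 1, start + 2, ... */
  synchronized boolean isConsecutiveFrom(int start) {
    for (int i = 0; i < values.size(); i++) {
      if (values.get(i) != start + i) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    ObservedValues complete = new ObservedValues();
    for (int i = 1; i <= 10; i++) {
      complete.record(i);
    }
    System.out.println(complete.isConsecutiveFrom(1)); // every value 1..10 seen in order

    ObservedValues stale = new ObservedValues();
    stale.record(1);
    stale.record(2);
    stale.record(2); // a stale read repeats an old value
    System.out.println(stale.isConsecutiveFrom(1));
  }
}
```

Unlike `isGreaterThan(10)`, a check of this kind fails as soon as a value is repeated or skipped, which is the symptom a stale primary would produce.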

csviri · 2025-04-16T08:57:51Z

> When using the PatchAndCacheStatusWithLock approach, we will need to explicitly use optimistic locking here, which in theory should be fine. However, the default retry mechanism provided by JOSDK when a 409 conflict happens doesn't pick up the latest resource status, forcing us to implement a custom retry mechanism for this. Otherwise, the default retry mechanism is not useful here.
>
> Anyway, I can raise a separate issue for that default mechanism to see if it can be improved there.
>
> Besides that, great work and thanks a lot for this PR.

Probably the best would be to not use the approach with the lock, but the lockless alternative.

afalhambra-hivemq · 2025-04-16T09:47:54Z

> Probably the best would be to not use the approach with the lock, but the lockless alternative.

The only drawback here with the lockless alternative is that it will require us to handle the cache in the operator itself

csviri · 2025-04-16T10:09:19Z

> > Probably the best would be to not use the approach with the lock, but the lockless alternative.
>
> The only drawback here with the lockless alternative is that it will require us to handle the cache in the operator itself

Yes, but that is as simple as in the integration test. We might iterate on this in the future, but in the short term I think that is best for you.
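
Handling the cache in the operator itself might look roughly like the sketch below. It is not the SDK's `PrimaryResourceCache`: `PrimaryCacheSketch`, `CounterResource`, `Cache`, and the eviction rule are hypothetical names and assumptions, with a counter in the status standing in for "has the informer caught up yet?".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

final class PrimaryCacheSketch {

  /** Hypothetical stand-in for a primary resource whose status holds a counter. */
  static final class CounterResource {
    final String name;
    final int counter;

    CounterResource(String name, int counter) {
      this.name = name;
      this.counter = counter;
    }
  }

  /** Illustrative operator-side cache; an assumption, not the SDK's implementation. */
  static final class Cache<P> {
    private final Map<String, P> cache = new ConcurrentHashMap<>();
    // (cached, fromInformer) -> true once the informer copy is at least as recent
    private final BiPredicate<P, P> evictWhen;

    Cache(BiPredicate<P, P> evictWhen) {
      this.evictWhen = evictWhen;
    }

    /** Remember the resource returned by our own status update. */
    void remember(String key, P updated) {
      cache.put(key, updated);
    }

    /** Return the cached copy while the informer copy is stale; otherwise evict it. */
    P freshest(String key, P fromInformer) {
      P cached = cache.get(key);
      if (cached == null) {
        return fromInformer;
      }
      if (evictWhen.test(cached, fromInformer)) {
        cache.remove(key);
        return fromInformer;
      }
      return cached;
    }
  }

  public static void main(String[] args) {
    Cache<CounterResource> cache =
        new Cache<CounterResource>((cached, informer) -> informer.counter >= cached.counter);
    cache.remember("demo", new CounterResource("demo", 5));

    // Informer still holds the old status, so the cached copy is used.
    System.out.println(cache.freshest("demo", new CounterResource("demo", 4)).counter);
    // Informer has moved past the cached copy, so the entry is evicted.
    System.out.println(cache.freshest("demo", new CounterResource("demo", 6)).counter);
  }
}
```

The eviction predicate is the part the operator has to own: it decides when the copy delivered by the informer can be trusted again.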

csviri · 2025-04-16T14:52:07Z

Thinking more about this, the locked version in its current form is not very useful, since it fails to fulfill its purpose: making sure that the resource is updated with a generated id.

On the other hand, we could do something like the (now deprecated in the client) `resource(resource).replace()`, which would basically retry the update locally: if it fails, read the resource and try again in place.

Some kind of retry would also be nice for patch without the lock. I will spend some more time on this to improve it.
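
The "if it fails, read the resource and try again in place" idea can be sketched generically as below. This is a minimal illustration under assumptions, not the fabric8 client's or JOSDK's behavior: `LocalRetrySketch`, `InMemoryStore`, `Versioned`, and `ConflictException` are hypothetical stand-ins that only mimic a version check and a 409-style conflict.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

final class LocalRetrySketch {

  /** Stand-in for a resource: a status plus a version used for optimistic locking. */
  static final class Versioned {
    final long version;
    final String status;

    Versioned(long version, String status) {
      this.version = version;
      this.status = status;
    }

    Versioned withStatus(String newStatus) {
      return new Versioned(version, newStatus);
    }
  }

  /** Analogue of an HTTP 409: the update was based on a stale version. */
  static final class ConflictException extends RuntimeException {}

  /** Hypothetical in-memory store; it only mimics version checks. */
  static final class InMemoryStore {
    private final AtomicReference<Versioned> current =
        new AtomicReference<>(new Versioned(1, "initial"));

    Versioned read() {
      return current.get();
    }

    Versioned update(Versioned desired) {
      Versioned existing = current.get();
      Versioned next = new Versioned(existing.version + 1, desired.status);
      if (existing.version != desired.version || !current.compareAndSet(existing, next)) {
        throw new ConflictException();
      }
      return next;
    }
  }

  /** Applies the change; on conflict, re-reads the latest resource and tries again. */
  static Versioned updateWithRetry(
      InMemoryStore store, Versioned base, UnaryOperator<Versioned> change, int maxAttempts) {
    Versioned candidate = base;
    for (int attempt = 1; ; attempt++) {
      try {
        return store.update(change.apply(candidate));
      } catch (ConflictException e) {
        if (attempt >= maxAttempts) {
          throw e;
        }
        candidate = store.read();
      }
    }
  }

  public static void main(String[] args) {
    InMemoryStore store = new InMemoryStore();
    Versioned stale = store.read();
    store.update(stale.withStatus("changed-by-someone-else")); // makes our copy stale
    Versioned result = updateWithRetry(store, stale, r -> r.withStatus("ready"), 3);
    System.out.println(result.version + " " + result.status);
  }
}
```

The key point is that the change is expressed as a function of the latest resource, so it can be re-applied after each re-read instead of resubmitting the stale copy.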

afalhambra-hivemq · 2025-04-16T15:08:04Z

> resource(resource).replace(), that would basically retry locally the update

Curious here, what do you mean by retry locally?

> Some kind of retry would be nice also for patch without the lock. So will spend some more time on this to improve.

That would be awesome! Thanks

csviri · 2025-04-22T12:30:28Z

> Curious here, what do you mean by retry locally?

Basically, that you configure the fabric8 client to retry.

csviri · 2025-04-22T12:30:52Z

@afalhambra-hivemq @metacosm @xstefank added docs and made some improvements.

afalhambra-hivemq · 2025-04-24T09:02:13Z

...ore/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java

+        (P p, KubernetesClient c) -> c.resource(primary).updateStatus());
+  }
+
+  public static <P extends HasMetadata> P patchAndCacheStatus(


Still would be nice to have a JavaDoc for this public method.

afalhambra-hivemq · 2025-04-24T09:02:22Z

...ore/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java

+   * Updates the resource and adds it to the {@link PrimaryResourceCache} provided. Optimistic
+   * locking is not required.
+   *
+   * @param primary resource*


Suggested change

-   * @param primary resource*
+   * @param primary resource

afalhambra-hivemq · 2025-04-24T09:02:27Z

...ore/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java

+   * Patches the resource with JSON Merge patch and adds it to the {@link PrimaryResourceCache}
+   * provided. Optimistic locking is not required.
+   *
+   * @param primary resource*


Suggested change

-   * @param primary resource*
+   * @param primary resource

afalhambra-hivemq · 2025-04-24T09:02:32Z

...ore/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java

+   * Patches the resource with JSON Patch and adds it to the {@link PrimaryResourceCache} provided.
+   * Optimistic locking is not required.
+   *
+   * @param primary resource*


Suggested change

-   * @param primary resource*
+   * @param primary resource

afalhambra-hivemq · 2025-04-24T09:15:41Z

...ore/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java

+  private static final Logger log = LoggerFactory.getLogger(PrimaryUpdateAndCacheUtils.class);
+
+  /**
+   * Makes sure that the up-to-date primary resource will be present during the next reconciliation.


Now that the signature of the method has changed and the checkResourceVersionPresent method has been removed, it would be nice to make it explicit in the JavaDocs which methods require optimistic locking and which don't.

afalhambra-hivemq · 2025-04-24T10:07:50Z

docs/content/en/docs/documentation/reconciler.md

+
+Therefore,
+the framework provides facilities
+to cover these use cases withing [`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java#L16).


Suggested change

-to cover these use cases withing [`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java#L16).
+to cover these use cases with [`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java#L16).

afalhambra-hivemq · 2025-04-24T10:11:32Z

docs/content/en/docs/documentation/reconciler.md

+recent version of the resource. Note that it is not necessarily the version from the update response, it can be newer
+since other parties can do additional updates meanwhile, but if not explicitly modified, it will contain the up-to-date
+status.


> Note that it is not necessarily the version from the update response, it can be newer
> since other parties can do additional updates meanwhile, but if not explicitly modified, it will contain the up-to-date
> status.

It's not so clear what you mean here. Do you mean the updated resource version?

afalhambra-hivemq · 2025-04-24T10:15:27Z

docs/content/en/docs/documentation/reconciler.md

+In other words, when to evict the resource from the cache. Typically, as show in the [integration test](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/primarycache)
+you can have a counter in status to check on that. 
+
+Since all of this happens explicitly, you cannot use it for now with managed dependent resources and workflows.


> Since all of this happens explicitly, you cannot use it for now with managed dependent resources and workflows.

Can you please elaborate? Under what conditions can `PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus(primary, freshCopy, context, cache);` not be used?

afalhambra-hivemq · 2025-04-24T10:16:18Z

docs/content/en/docs/documentation/reconciler.md

+  }
+```
+
+In the background `PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatusWith` puts the result of the update into an internal


Suggested change

-In the background `PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatusWith` puts the result of the update into an internal
+In the background `PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus` puts the result of the update into an internal

afalhambra-hivemq · 2025-04-24T10:25:30Z

docs/content/en/docs/documentation/reconciler.md

+#### Additional remarks
+
+As shown in the integration tests, there is no optimistic locking used when updating the
+[resource](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/internal/StatusPatchCacheReconciler.java#L41)


In this particular IT, optimistic locking is in place, or am I completely wrong here?

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 11, 2025

csviri changed the title ~~support reconciler~~ feat: resource cache Apr 11, 2025

csviri linked an issue Apr 11, 2025 that may be closed by this pull request

Support / showcase state management in status #2746

Open

csviri changed the title ~~feat: resource cache~~ [WIP] feat: resource cache Apr 11, 2025

csviri changed the title ~~[WIP] feat: resource cache~~ [WIP] feat: primary resource caching Apr 14, 2025

csviri force-pushed the support-reconciler branch from 9019eb9 to f8b6dc6 Compare April 14, 2025 10:13

csviri linked an issue Apr 14, 2025 that may be closed by this pull request

Most updated CustomResource status must be passed in each reconciliation loop #2765

Open

csviri marked this pull request as ready for review April 15, 2025 12:11

openshift-ci bot requested review from metacosm and xstefank April 15, 2025 12:11

csviri changed the title ~~[WIP] feat: primary resource caching~~ feat: primary resource caching Apr 15, 2025

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 15, 2025

csviri changed the title ~~feat: primary resource caching~~ feat: primary resource caching for followup reconciliation Apr 15, 2025

csviri added 8 commits April 15, 2025 14:18

feat: resource cache

3aa6d17

Signed-off-by: Attila Mészáros <[email protected]>

wip

0125d66

Signed-off-by: Attila Mészáros <[email protected]>

wip

8870c14

Signed-off-by: Attila Mészáros <[email protected]>

wip

bff907c

Signed-off-by: Attila Mészáros <[email protected]>

wip

32823e0

Signed-off-by: Attila Mészáros <[email protected]>

wip

b016bf0

Signed-off-by: Attila Mészáros <[email protected]>

wip

00fd9e6

Signed-off-by: Attila Mészáros <[email protected]>

Integration tests

3b99f78

Signed-off-by: Attila Mészáros <[email protected]>

csviri force-pushed the support-reconciler branch from 19dd31c to 3b99f78 Compare April 15, 2025 12:18

csviri changed the title ~~feat: primary resource caching for followup reconciliation~~ feat: primary resource caching for followup reconciliation(s) Apr 15, 2025

csviri added 2 commits April 15, 2025 14:21

wip

1812851

Signed-off-by: Attila Mészáros <[email protected]>

fix

e09472a

Signed-off-by: Attila Mészáros <[email protected]>

xstefank reviewed Apr 15, 2025

csviri and others added 2 commits April 15, 2025 17:21

Update operator-framework-core/src/main/java/io/javaoperatorsdk/opera…

608fb09

…tor/api/reconciler/PrimaryUpdateAndCacheUtils.java Co-authored-by: Martin Stefanko <[email protected]>

fix

21b2ef5

Signed-off-by: Attila Mészáros <[email protected]>

csviri requested a review from xstefank April 15, 2025 15:23

additional test

870db57

Signed-off-by: Attila Mészáros <[email protected]>

afalhambra-hivemq reviewed Apr 15, 2025

csviri and others added 2 commits April 15, 2025 17:39

doc

9c58fd4

Signed-off-by: Attila Mészáros <[email protected]>

Update operator-framework-core/src/main/java/io/javaoperatorsdk/opera…

e481342

…tor/api/reconciler/PrimaryUpdateAndCacheUtils.java Co-authored-by: Antonio <[email protected]>

xstefank approved these changes Apr 16, 2025

afalhambra-hivemq reviewed Apr 16, 2025

csviri and others added 2 commits April 16, 2025 10:02

Update operator-framework/src/test/java/io/javaoperatorsdk/operator/b…

51f1ca0

…aseapi/statuscache/primarycache/StatusPatchPrimaryCacheIT.java Co-authored-by: Antonio <[email protected]>

Update operator-framework-core/src/main/java/io/javaoperatorsdk/opera…

3409053

…tor/api/reconciler/PrimaryUpdateAndCacheUtils.java Co-authored-by: Antonio <[email protected]>

csviri and others added 8 commits April 17, 2025 11:34

remove with lock versions

68ca625

Signed-off-by: Attila Mészáros <[email protected]>

remove not used code

84eec7b

Signed-off-by: Attila Mészáros <[email protected]>

Revert "remove not used code"

42b9ead

This reverts commit 84eec7b.

Revert "remove with lock versions"

d51f0e3

This reverts commit 68ca625.

wip

14c63bb

Signed-off-by: Attila Mészáros <[email protected]>

fix: typos and start improving javadoc

e9bcfbe

Signed-off-by: Chris Laprun <[email protected]>

refactor

e8ede1a

Signed-off-by: Attila Mészáros <[email protected]>

docs

a71eafe

Signed-off-by: Attila Mészáros <[email protected]>

csviri requested review from xstefank and afalhambra-hivemq April 23, 2025 10:28

afalhambra-hivemq reviewed Apr 24, 2025
