feat: primary resource caching for followup reconciliation(s) #2761

Open
wants to merge 36 commits into
base: main
36 commits
3aa6d17
feat: resource cache
csviri Apr 11, 2025
0125d66
wip
csviri Apr 11, 2025
8870c14
wip
csviri Apr 14, 2025
bff907c
wip
csviri Apr 14, 2025
32823e0
wip
csviri Apr 14, 2025
b016bf0
wip
csviri Apr 14, 2025
00fd9e6
wip
csviri Apr 14, 2025
3b99f78
Integration tests
csviri Apr 15, 2025
1812851
wip
csviri Apr 15, 2025
e09472a
fix
csviri Apr 15, 2025
608fb09
Update operator-framework-core/src/main/java/io/javaoperatorsdk/opera…
csviri Apr 15, 2025
21b2ef5
fix
csviri Apr 15, 2025
870db57
additional test
csviri Apr 15, 2025
9c58fd4
doc
csviri Apr 15, 2025
e481342
Update operator-framework-core/src/main/java/io/javaoperatorsdk/opera…
csviri Apr 15, 2025
51f1ca0
Update operator-framework/src/test/java/io/javaoperatorsdk/operator/b…
csviri Apr 16, 2025
3409053
Update operator-framework-core/src/main/java/io/javaoperatorsdk/opera…
csviri Apr 16, 2025
68ca625
remove with lock versions
csviri Apr 17, 2025
84eec7b
remove not used code
csviri Apr 17, 2025
42b9ead
Revert "remove not used code"
csviri Apr 17, 2025
d51f0e3
Revert "remove with lock versions"
csviri Apr 17, 2025
14c63bb
wip
csviri Apr 17, 2025
e9bcfbe
fix: typos and start improving javadoc
metacosm Apr 17, 2025
e8ede1a
refactor
csviri Apr 22, 2025
a71eafe
docs
csviri Apr 22, 2025
eff1ccb
Update operator-framework-core/src/main/java/io/javaoperatorsdk/opera…
csviri Apr 25, 2025
217629f
Update operator-framework-core/src/main/java/io/javaoperatorsdk/opera…
csviri Apr 25, 2025
a8e7efc
Update docs/content/en/docs/documentation/reconciler.md
csviri Apr 25, 2025
a1d303d
Update operator-framework-core/src/main/java/io/javaoperatorsdk/opera…
csviri Apr 25, 2025
7b58dca
Update docs/content/en/docs/documentation/reconciler.md
csviri Apr 25, 2025
a656160
improvements
csviri Apr 25, 2025
1f1e1d0
docs
csviri Apr 25, 2025
95a9f2e
improve
csviri Apr 25, 2025
720a4c5
docs
csviri Apr 25, 2025
a510cc7
wip
csviri Apr 25, 2025
0a79dd7
wip
csviri Apr 25, 2025
120 changes: 120 additions & 0 deletions docs/content/en/docs/documentation/reconciler.md
@@ -169,3 +169,123 @@ You can specify the name of the finalizer to use for your `Reconciler` using the
annotation. If you do not specify a finalizer name, one will be automatically generated for you.

From v5, by default, the finalizer is added using Server Side Apply. See also `UpdateControl` in docs.

### Making sure primary is up to date for the next reconciliation

When you implement a reconciler, as a final step (and possibly several times during one reconciliation) you
usually update the status subresource with the information that was available during the reconciliation.
Sometimes this is referred to as the last observed state.
When the resource is updated, the framework does not cache the resource directly from the response of the update.
Instead, the underlying informer eventually receives an event with the updated resource and caches the resource.
Therefore, it can happen that on next reconciliation the primary resource is not up-to-date regarding your updated (note that other event sources

Suggested change
Therefore, it can happen that on next reconciliation the primary resource is not up-to-date regarding your updated (note that other event sources
Therefore, it can happen that on next reconciliation the primary resource is not up-to-date regarding your updated status subresource (note that other event sources

can trigger the reconciliation meanwhile). This is not usually a problem, since the status is not used as an input,

since the status is not used as an input,

I guess this depends on the reconciler implementation. In our case for instance, we check the status of the resource and react based up on it.

the reconciliation runs again, and the status us updated again. The caches are eventually consistent.

Suggested change
the reconciliation runs again, and the status us updated again. The caches are eventually consistent.
the reconciliation runs again, and the status is updated again. The caches are eventually consistent.


However, there are cases when you would like to store some state in the status, typically generated
IDs of external resources.
See related topic in Kubernetes docs: [Representing Allocated Values](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#representing-allocated-values).
In this case, it is reasonable to expect to have the state always available for the next reconciliation,
to avoid generating the resource again and other race conditions.
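
The race described above can be reproduced without a cluster. The following is a minimal sketch in plain Java, not framework code: `StaleReadSketch`, `StaleResource`, `server`, and `informerCache` are hypothetical names, and the informer is modeled as a map that only catches up when `deliverEvent` is called.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a custom resource: a name plus an allocated external ID in status.
record StaleResource(String name, String externalId) {}

class StaleReadSketch {
  // What the API server holds (always current).
  final Map<String, StaleResource> server = new HashMap<>();
  // What the informer has seen so far (may lag behind the server).
  final Map<String, StaleResource> informerCache = new HashMap<>();
  // Counts how many external resources were allocated.
  int allocations = 0;

  void create(String name) {
    StaleResource r = new StaleResource(name, null);
    server.put(name, r);
    informerCache.put(name, r);
  }

  // One reconciliation: reads the primary from the informer cache, allocates an ID
  // if none is recorded, and writes the status to the server only.
  void reconcile(String name) {
    StaleResource primary = informerCache.get(name);
    if (primary.externalId() == null) {
      allocations++;
      server.put(name, new StaleResource(name, "ext-" + allocations));
    }
  }

  // Simulates the watch event arriving: the informer cache catches up with the server.
  void deliverEvent(String name) {
    informerCache.put(name, server.get(name));
  }

  // A second reconciliation is triggered by another event source before the watch event.
  static int allocationsWhenEventIsLate() {
    StaleReadSketch s = new StaleReadSketch();
    s.create("demo");
    s.reconcile("demo");
    s.reconcile("demo");
    s.deliverEvent("demo");
    return s.allocations;
  }

  // The watch event arrives before the second reconciliation.
  static int allocationsWhenEventIsOnTime() {
    StaleReadSketch s = new StaleReadSketch();
    s.create("demo");
    s.reconcile("demo");
    s.deliverEvent("demo");
    s.reconcile("demo");
    return s.allocations;
  }
}
```

In this model the late event leads to a second allocation because the second reconciliation still sees an empty status, which is the situation the utilities described next are meant to prevent.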

Therefore,
the framework provides facilities
to cover these use cases with [`PrimaryUpdateAndCacheUtils`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/PrimaryUpdateAndCacheUtils.java#L16).
These utility methods come in two flavors:

#### Using internal cache

In almost all cases for this purpose, you can use internal caches:

```java
@Override
public UpdateControl<StatusPatchCacheCustomResource> reconcile(
    StatusPatchCacheCustomResource resource, Context<StatusPatchCacheCustomResource> context) {

  // omitted logic

  // update with SSA requires a fresh copy
  var freshCopy = createFreshCopy(resource);
  freshCopy.getStatus().setValue(statusWithState());

  var updatedResource = PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus(resource, freshCopy, context);

  return UpdateControl.noUpdate();
}
```

In the background, `PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus` puts the result of the update into an internal
cache of the primary resource's event source, and makes sure that the next reconciliation receives the most
recent version of the resource. Note that this is not necessarily the version you got as the response to the update;
it can be newer, since other parties can make additional updates in the meantime. Unless the status was explicitly
modified by someone else, it will contain the up-to-date status.
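
To make the idea concrete, here is a simplified sketch of such an overlay cache in plain Java. It is not the framework's implementation: `OverlayCacheSketch`, `VersionedResource`, and all method names are hypothetical, and it assumes that resource versions can be parsed as numbers and compared.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for a primary resource: name, resourceVersion and one status value.
record VersionedResource(String name, String resourceVersion, int statusValue) {}

class OverlayCacheSketch {
  // State as seen through watch events.
  private final Map<String, VersionedResource> informerCache = new HashMap<>();
  // Responses of our own recent updates that the informer may not have seen yet.
  private final Map<String, VersionedResource> recentUpdates = new HashMap<>();

  // Assumption: resource versions are numeric and grow with every write.
  private static long version(VersionedResource r) {
    return Long.parseLong(r.resourceVersion());
  }

  // Called right after a successful status patch with the server's response.
  void handleRecentUpdate(VersionedResource updated) {
    VersionedResource known = informerCache.get(updated.name());
    if (known == null || version(known) < version(updated)) {
      recentUpdates.put(updated.name(), updated);
    }
  }

  // Called when the informer receives a watch event.
  void onInformerEvent(VersionedResource fromEvent) {
    informerCache.put(fromEvent.name(), fromEvent);
    VersionedResource recent = recentUpdates.get(fromEvent.name());
    // Drop the overlay entry once the informer has caught up or moved past it.
    if (recent != null && version(fromEvent) >= version(recent)) {
      recentUpdates.remove(fromEvent.name());
    }
  }

  // What the next reconciliation would receive as its primary resource.
  Optional<VersionedResource> get(String name) {
    VersionedResource recent = recentUpdates.get(name);
    return Optional.ofNullable(recent != null ? recent : informerCache.get(name));
  }
}
```

In this sketch a late watch event with an older version leaves the overlay entry in place, while an event with the same or a newer version replaces it, which mirrors the "can be newer" remark above.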

See related integration test [here](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/internal).

This approach works with the default configuration of the framework and should be sufficient in most cases.
Without going into the details, it does not work reliably if `ConfigurationService.parseResourceVersionsForEventFilteringAndCaching`
is set to `false` (more precisely, there are some edge cases when it won't work). For that case, the framework provides the following solution:

#### Fallback approach: using `PrimaryResourceCache`

As an alternative, for the very rare cases when `ConfigurationService.parseResourceVersionsForEventFilteringAndCaching`
needs to be set to `false`, you can use an explicit caching approach:

```java

// We deliberately don't use the provided predicate, to show what a custom one could look like.
private final PrimaryResourceCache<StatusPatchPrimaryCacheCustomResource> cache =
    new PrimaryResourceCache<>(
        (statusPatchCacheCustomResourcePair, statusPatchCacheCustomResource) ->
            statusPatchCacheCustomResource.getStatus().getValue()
                >= statusPatchCacheCustomResourcePair.afterUpdate().getStatus().getValue());

@Override
public UpdateControl<StatusPatchPrimaryCacheCustomResource> reconcile(
    StatusPatchPrimaryCacheCustomResource primary,
    Context<StatusPatchPrimaryCacheCustomResource> context) {

  // The cache compares the current and the cached resource and returns the more recent one
  // (and evicts the old one).
  primary = cache.getFreshResource(primary);

  // omitted logic

  var freshCopy = createFreshCopy(primary);

  freshCopy.getStatus().setValue(statusWithState());

  var updated =
      PrimaryUpdateAndCacheUtils.ssaPatchAndCacheStatus(primary, freshCopy, context, cache);

  return UpdateControl.noUpdate();
}

@Override
public DeleteControl cleanup(
    StatusPatchPrimaryCacheCustomResource resource,
    Context<StatusPatchPrimaryCacheCustomResource> context)
    throws Exception {
  // clean up the cache on resource deletion
  cache.cleanup(resource);
  return DeleteControl.defaultDelete();
}

```

[`PrimaryResourceCache`](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/api/reconciler/support/PrimaryResourceCache.java)
is designed for this purpose.
As shown in the example above, it is up to you to provide a predicate that determines whether the resource is more recent than the one available,
in other words, when to evict the resource from the cache. Typically, as shown in the [integration test](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/primarycache),
you can keep a counter in the status to check that.
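
The eviction logic can be illustrated with a small stand-alone sketch. This is a simplification under stated assumptions, not the real `PrimaryResourceCache`: `ExplicitCacheSketch` and `CountedResource` are hypothetical names, and the predicate here receives the cached resource directly rather than a before/after pair.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

// Hypothetical stand-in for a primary resource with a counter in its status.
record CountedResource(String name, int statusCounter) {}

class ExplicitCacheSketch {
  private final Map<String, CountedResource> cache = new ConcurrentHashMap<>();
  // Returns true when the incoming resource is at least as recent as the cached one,
  // meaning the cached entry can be evicted.
  private final BiPredicate<CountedResource, CountedResource> evictionPredicate;

  ExplicitCacheSketch(BiPredicate<CountedResource, CountedResource> evictionPredicate) {
    this.evictionPredicate = evictionPredicate;
  }

  // Stores the response of a status update.
  void cacheResource(CountedResource afterUpdate) {
    cache.put(afterUpdate.name(), afterUpdate);
  }

  // Returns the more recent of the incoming and the cached resource.
  CountedResource getFreshResource(CountedResource fromInformer) {
    CountedResource cached = cache.get(fromInformer.name());
    if (cached == null) {
      return fromInformer;
    }
    if (evictionPredicate.test(cached, fromInformer)) {
      cache.remove(fromInformer.name());
      return fromInformer;
    }
    return cached;
  }

  // Removes the entry when the resource is deleted.
  void cleanup(CountedResource resource) {
    cache.remove(resource.name());
  }

  // Counter-based predicate in the spirit of the example above.
  static ExplicitCacheSketch withCounterPredicate() {
    return new ExplicitCacheSketch(
        (cached, incoming) -> incoming.statusCounter() >= cached.statusCounter());
  }
}
```

With the counter predicate, an incoming resource whose counter is lower than the cached one is treated as stale and the cached copy is returned instead.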

Since all of this happens explicitly, you cannot use this approach for managed dependent resources and workflows
(you can still use non-managed ones), because the primary resource passed to a dependent resource always comes directly
from the underlying informer event source cache.

#### Additional remarks

As shown in the integration tests, there is no optimistic locking used when updating the
[resource](https://github.com/operator-framework/java-operator-sdk/blob/main/operator-framework/src/test/java/io/javaoperatorsdk/operator/baseapi/statuscache/internal/StatusPatchCacheReconciler.java#L41)
(in other words, `metadata.resourceVersion` is set to `null`).
This is desired since you don't want the patch to fail on update.
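
Why a set `resourceVersion` can make the patch fail is sketched below with a toy model of the server side. Everything here is hypothetical (`OptimisticLockSketch`, `LockedResource`, and this `createFreshCopy` variant are not framework or fabric8 code); the only assumption taken from Kubernetes is that a write carrying a non-matching `metadata.resourceVersion` is rejected with a conflict.

```java
// Hypothetical stand-in for a resource: identity, resourceVersion and one status value.
record LockedResource(String name, String namespace, String resourceVersion, String statusValue) {}

class OptimisticLockSketch {
  // The single resource held by this toy "server".
  private LockedResource stored;

  OptimisticLockSketch(LockedResource initial) {
    this.stored = initial;
  }

  // Applies a status patch. A non-null resourceVersion acts as an optimistic lock.
  LockedResource patchStatus(LockedResource patch) {
    if (patch.resourceVersion() != null
        && !patch.resourceVersion().equals(stored.resourceVersion())) {
      throw new IllegalStateException("409 Conflict: resourceVersion mismatch");
    }
    long next = Long.parseLong(stored.resourceVersion()) + 1;
    stored =
        new LockedResource(
            stored.name(), stored.namespace(), Long.toString(next), patch.statusValue());
    return stored;
  }

  // Builds a copy that carries only the identity and the desired status,
  // with resourceVersion left null so that no optimistic locking takes place.
  static LockedResource createFreshCopy(LockedResource primary, String statusValue) {
    return new LockedResource(primary.name(), primary.namespace(), null, statusValue);
  }
}
```

In this model a patch built from a stale primary is rejected, while the fresh copy without a `resourceVersion` is accepted regardless of how far the stored resource has moved on.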

In addition, you can configure retry in the fabric8 client.
@@ -0,0 +1,225 @@
package io.javaoperatorsdk.operator.api.reconciler;

import java.util.function.Supplier;
import java.util.function.UnaryOperator;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import io.fabric8.kubernetes.api.model.HasMetadata;
import io.fabric8.kubernetes.client.dsl.base.PatchContext;
import io.fabric8.kubernetes.client.dsl.base.PatchType;
import io.javaoperatorsdk.operator.api.reconciler.support.PrimaryResourceCache;
import io.javaoperatorsdk.operator.processing.event.ResourceID;

/**
* Utility methods to patch the primary resource state and store it in the related cache, to make
* sure that a fresh resource is present for the next reconciliation. The main use case for such
* updates is to store state in the resource status. Use of optimistic locking is not desired for
* such updates, since we don't want the patch to fail and lose the information we want to store.
*/
public class PrimaryUpdateAndCacheUtils {

private PrimaryUpdateAndCacheUtils() {}

private static final Logger log = LoggerFactory.getLogger(PrimaryUpdateAndCacheUtils.class);

/**
* Makes sure that the up-to-date primary resource will be present during the next reconciliation.
* Using update (PUT) method.
*
* @param primary resource
* @param context of reconciliation
* @return updated resource
* @param <P> primary resource type
*/
public static <P extends HasMetadata> P updateAndCacheStatus(P primary, Context<P> context) {
logWarnIfResourceVersionPresent(primary);
return patchAndCacheStatus(
primary, context, () -> context.getClient().resource(primary).updateStatus());
}

/**
* Makes sure that the up-to-date primary resource will be present during the next reconciliation.
* Using JSON Merge patch.
*
* @param primary resource
* @param context of reconciliation
* @return updated resource
* @param <P> primary resource type
*/
public static <P extends HasMetadata> P patchAndCacheStatus(P primary, Context<P> context) {
logWarnIfResourceVersionPresent(primary);
return patchAndCacheStatus(
primary, context, () -> context.getClient().resource(primary).patchStatus());
}

/**
* Makes sure that the up-to-date primary resource will be present during the next reconciliation.
* Using JSON Patch.
*
* @param primary resource
* @param context of reconciliation
* @return updated resource
* @param <P> primary resource type
*/
public static <P extends HasMetadata> P editAndCacheStatus(
P primary, Context<P> context, UnaryOperator<P> operation) {
logWarnIfResourceVersionPresent(primary);
return patchAndCacheStatus(
primary, context, () -> context.getClient().resource(primary).editStatus(operation));
}

/**
* Makes sure that the up-to-date primary resource will be present during the next reconciliation.
*
* @param primary resource
* @param context of reconciliation
* @param patch free-form implementation of the patch operation
* @return the updated resource.
* @param <P> primary resource type
*/
public static <P extends HasMetadata> P patchAndCacheStatus(
P primary, Context<P> context, Supplier<P> patch) {
var updatedResource = patch.get();
context
.eventSourceRetriever()
.getControllerEventSource()
.handleRecentResourceUpdate(ResourceID.fromResource(primary), updatedResource, primary);
return updatedResource;
}

/**
* Makes sure that the up-to-date primary resource will be present during the next reconciliation.
* Using Server Side Apply.
*
* @param primary resource
* @param freshResourceWithStatus - fresh resource with target state
* @param context of reconciliation
* @return the updated resource.
* @param <P> primary resource type
*/
public static <P extends HasMetadata> P ssaPatchAndCacheStatus(
P primary, P freshResourceWithStatus, Context<P> context) {
logWarnIfResourceVersionPresent(freshResourceWithStatus);
var res =
context
.getClient()
.resource(freshResourceWithStatus)
.subresource("status")
.patch(
new PatchContext.Builder()
.withForce(true)
.withFieldManager(context.getControllerConfiguration().fieldManager())
.withPatchType(PatchType.SERVER_SIDE_APPLY)
.build());

context
.eventSourceRetriever()
.getControllerEventSource()
.handleRecentResourceUpdate(ResourceID.fromResource(primary), res, primary);
return res;
}

/**
* Patches the resource and adds it to the {@link PrimaryResourceCache} provided.
*
* @param primary resource
* @param freshResourceWithStatus - fresh resource with target state
* @param context of reconciliation
* @param cache - resource cache managed by user
* @return the updated resource.
* @param <P> primary resource type
*/
public static <P extends HasMetadata> P ssaPatchAndCacheStatus(
P primary, P freshResourceWithStatus, Context<P> context, PrimaryResourceCache<P> cache) {
logWarnIfResourceVersionPresent(freshResourceWithStatus);
return patchAndCacheStatus(
primary,
cache,
() ->
context
.getClient()
.resource(freshResourceWithStatus)
.subresource("status")
.patch(
new PatchContext.Builder()
.withForce(true)
.withFieldManager(context.getControllerConfiguration().fieldManager())
.withPatchType(PatchType.SERVER_SIDE_APPLY)
.build()));
}

/**
* Patches the resource with JSON Patch and adds it to the {@link PrimaryResourceCache} provided.
*
* @param primary resource
* @param context of reconciliation
* @param cache - resource cache managed by user
* @return the updated resource.
* @param <P> primary resource type
*/
public static <P extends HasMetadata> P editAndCacheStatus(
P primary, Context<P> context, PrimaryResourceCache<P> cache, UnaryOperator<P> operation) {
logWarnIfResourceVersionPresent(primary);

If optimistic locking is not required, why do we need to log a warning?

return patchAndCacheStatus(
primary, cache, () -> context.getClient().resource(primary).editStatus(operation));
}

/**
* Patches the resource with JSON Merge patch and adds it to the {@link PrimaryResourceCache}
* provided.
*
* @param primary resource
* @param context of reconciliation
* @param cache - resource cache managed by user
* @return the updated resource.
* @param <P> primary resource type
*/
public static <P extends HasMetadata> P patchAndCacheStatus(
P primary, Context<P> context, PrimaryResourceCache<P> cache) {
logWarnIfResourceVersionPresent(primary);

If optimistic locking is not required, why do we need to log a warning?

return patchAndCacheStatus(
primary, cache, () -> context.getClient().resource(primary).patchStatus());
}

/**
* Updates the resource and adds it to the {@link PrimaryResourceCache} provided.
*
* @param primary resource
* @param context of reconciliation
* @param cache - resource cache managed by user
* @return the updated resource.
* @param <P> primary resource type
*/
public static <P extends HasMetadata> P updateAndCacheStatus(
P primary, Context<P> context, PrimaryResourceCache<P> cache) {
logWarnIfResourceVersionPresent(primary);
return patchAndCacheStatus(
primary, cache, () -> context.getClient().resource(primary).updateStatus());
}

/**
* Updates the resource using the user provided implementation and caches the result.
*
* @param primary resource
* @param cache resource cache managed by user
* @param patch implementation of resource update
* @return the updated resource.
* @param <P> primary resource type
*/
public static <P extends HasMetadata> P patchAndCacheStatus(

Can we add small JavaDoc for this exposed method as well?

Still would be nice to have a JavaDoc for this public method.

P primary, PrimaryResourceCache<P> cache, Supplier<P> patch) {
var updatedResource = patch.get();
cache.cacheResource(primary, updatedResource);
return updatedResource;
}

private static <P extends HasMetadata> void logWarnIfResourceVersionPresent(P primary) {
if (primary.getMetadata().getResourceVersion() != null) {
log.warn(
"The metadata.resourceVersion of primary resource is NOT null, "
+ "using optimistic locking is discouraged for this purpose.");
}
}
}