Add fallback in ILM to run cluster state steps periodically #126073

Open · wants to merge 3 commits into main

Conversation

@nielsbauman (Contributor) commented Apr 1, 2025

ILM sometimes skips a policy/index during a cluster state update if the step is still running or enqueued when the update comes in. On its own that isn't a problem, but in very quiet clusters it means it could take arbitrarily long for the policy step to run - i.e. not until the next cluster state update arrives. We saw this happening in a few tests, but it could potentially happen in production too. A minimal sketch of the fallback idea follows the list of fixed issues below.

Fixes #125683
Fixes #125789
Fixes #125867
Fixes #125911
Fixes #126053
Fixes #126354
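For context, here is a minimal, self-contained sketch of the fallback idea, assuming a simplified model of ILM's triggering. The lastSeenState and processClusterState names mirror the diff discussed below; everything else (the class and method names, the ClusterState record) is hypothetical and not the actual implementation:

import java.util.concurrent.atomic.AtomicReference;

public class PeriodicFallbackSketch {
    // Holds the cluster state currently being processed; null means ILM is idle.
    private final AtomicReference<ClusterState> lastSeenState = new AtomicReference<>();

    // Called on every scheduled ILM poll, in addition to cluster state updates.
    void onPeriodicTrigger(ClusterState currentState) {
        // Install the current state only if nothing is in flight - if processing
        // is already running, its end-of-run re-check covers any newer state.
        ClusterState inFlight = lastSeenState.compareAndExchange(null, currentState);
        if (inFlight == null) {
            processClusterState();
        }
    }

    private void processClusterState() {
        ClusterState state = lastSeenState.get();
        // ... run the cluster state steps against `state` ...
        // Clear the reference if the state is unchanged; if a newer state arrived
        // meanwhile, another processing round would be kicked off (elided here).
        lastSeenState.compareAndSet(state, null);
    }

    record ClusterState(long version) {}
}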

@nielsbauman added labels on Apr 1, 2025: >enhancement, :Data Management/ILM+SLM, Team:Data Management, auto-backport, v8.18.1, v8.19.0, v9.1.0
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine (Collaborator):

Hi @nielsbauman, I've created a changelog YAML for you.

@gmarouli (Contributor) commented Apr 2, 2025

After @nielsbauman walked me through this PR, I see the relative simplicity of using the periodic trigger to also (re)process the current cluster state if there have been no cluster state updates since the latest periodic ILM run.

However, I am a little concerned about the extra complexity we are adding to an already complex system. ILM already has at least two mechanisms that ensure we do not miss any cluster state updates and that we do not execute any steps twice.

When we trigger the policies after a cluster state update, we check at the end of the processing whether the cluster state has changed in the meantime. See:

@Override
public void onAfter() {
// If the last seen state is unchanged, we set it to null to indicate that processing has finished and we return.
if (lastSeenState.compareAndSet(currentState.get(), null)) {
return;
}
// If the last seen cluster state changed while this thread was running, it means a new cluster state came in and we need to
// process it. We do that by kicking off a new thread, which will pick up the new cluster state when the thread gets
// executed.
processClusterState();
}

In order to avoid queueing and executing a step multiple times, we check if a task for this index and step combination has already been submitted:

private void submitUnlessAlreadyQueued(String source, IndexLifecycleClusterStateUpdateTask task) {
if (executingTasks.add(task)) {
final Tuple<Index, StepKey> dedupKey = Tuple.tuple(task.index, task.currentStepKey);
// track the busy index+step-key combination on a best-effort basis to skip checking for more work for an index on CS application
busyIndices.add(dedupKey);
task.addListener(ActionListener.running(() -> {
final boolean removed = executingTasks.remove(task);
busyIndices.remove(dedupKey);
assert removed : "tried to unregister unknown task [" + task + "]";
}));
masterServiceTaskQueue.submitTask(source, task, null);
} else {
logger.trace("skipped redundant execution of [{}]", source);
}
}

Now that we have seen these mechanisms, we can better understand the problem we are facing:

If a cluster state update arrives while a task is still executing, the deduplication will not queue the task again, since it is already in progress, and that step will miss the cluster state update. This issue is mainly visible in quiet clusters, and it is especially prominent in tests.
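To make the race concrete, here is a minimal, runnable sketch (all names hypothetical, heavily simplified from submitUnlessAlreadyQueued above) showing how a second trigger arriving while the first task is still registered gets dropped:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DedupRaceSketch {
    private static final Set<String> busyIndices = ConcurrentHashMap.newKeySet();

    static void submitUnlessAlreadyQueued(String dedupKey, Runnable task) {
        if (busyIndices.add(dedupKey)) {
            try {
                task.run();
            } finally {
                busyIndices.remove(dedupKey);
            }
        } else {
            System.out.println("skipped redundant execution of [" + dedupKey + "]");
        }
    }

    public static void main(String[] args) {
        // First cluster state update: the task is registered and runs.
        submitUnlessAlreadyQueued("my-index/hot/rollover", () -> {
            // A second update for the same index+step arrives while the first
            // task is still registered, so the dedup check drops it...
            submitUnlessAlreadyQueued("my-index/hot/rollover", () -> {});
            // ...and in a quiet cluster nothing retries the step until the next
            // cluster state change (or, with this PR, the next periodic trigger).
        });
    }
}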

Before we move forward with this PR, I would like to see if we can tweak the mechanisms above to account for the issue we are trying to fix.

logger.trace("job triggered: {}, {}, {}", event.jobName(), event.scheduledTime(), event.triggeredTime());
triggerPolicies(clusterService.state(), false);
if (event.jobName().equals(XPackField.INDEX_LIFECYCLE) == false) {
    assert false : "Expected scheduler event to be for ILM";
Review comment (Member):

Suggested change:
- assert false : "Expected scheduler event to be for ILM";
+ assert false : "Expected scheduler event to be for ILM but it was for " + event.jobName();

// Set the last seen state to the current cluster state only if it was null before - to avoid redundant processing.
final var stateCurrentlyBeingProcessed = lastSeenState.compareAndExchange(null, clusterService.state());
if (stateCurrentlyBeingProcessed == null) {
    logger.info("ILM didn't receive a new cluster state for [{}]. Running cluster state steps now", pollInterval);
Review comment (Member):

I think we can make this debug level

    processClusterState();
} else {
    logger.warn(
        "ILM didn't receive a new cluster state for [{}] but it was still processing cluster state version [{}]",
Review comment (Member):

Should we clarify this by saying something like "the poll interval should be increased" in the log message, to give a hint to users running on-prem?
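Purely as an illustration of that suggestion (the exact wording and the setting hint are hypothetical, not part of the PR), the warning could look something like this, reusing the variables from the diff above:

logger.warn(
    "ILM didn't receive a new cluster state for [{}] but it was still processing cluster state version [{}]; "
        + "if this happens frequently, consider increasing the [indices.lifecycle.poll_interval] setting",
    pollInterval,
    stateCurrentlyBeingProcessed.version()
);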

@elasticsearchmachine (Collaborator):

Hi @nielsbauman, I've updated the changelog YAML for you.
