Failure stores cannot be replicated in CCR #126356

jbaiera · 2025-04-05T03:43:59Z

We currently filter failure store indices out of the auto-follow logic when processing data streams (see #126355). This is because a data stream must always have a write index. If a failure index is the only index replicated from a cluster (i.e. no other backing indices are replicated), the resulting follower data stream is not guaranteed to satisfy this invariant. Even if all indices are replicated, the data stream that holds them is constructed as part of the follow operation. Because follow operations can run in arbitrary order and can fail independently of each other, we can't guarantee that a data stream will always have a write index in its backing index set.
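To make the ordering problem concrete, here is a minimal toy model. It is not Elasticsearch code: `FollowerDataStream`, `apply_follow`, and the index names are all hypothetical and exist only for illustration. It shows a follower data stream being created by whichever follow operation arrives first, and the invariant breaking when that operation is for a failure index.

```python
from dataclasses import dataclass, field


@dataclass
class FollowerDataStream:
    """Toy stand-in for a follower data stream (hypothetical, not the real class)."""
    name: str
    backing_indices: list = field(default_factory=list)
    failure_indices: list = field(default_factory=list)

    def has_write_index(self) -> bool:
        # The invariant discussed above: at least one regular backing index exists.
        return len(self.backing_indices) > 0


def apply_follow(streams: dict, stream_name: str, index: str, is_failure: bool) -> FollowerDataStream:
    """Apply one follow operation, creating the data stream if it does not exist yet."""
    ds = streams.setdefault(stream_name, FollowerDataStream(stream_name))
    target = ds.failure_indices if is_failure else ds.backing_indices
    target.append(index)
    return ds


streams = {}

# The failure index happens to be followed first (or the backing index follow failed).
ds = apply_follow(streams, "logs-app", "failure-index-1", is_failure=True)
violated_after_first_follow = not ds.has_write_index()

# Only once a regular backing index is followed does the invariant hold.
ds = apply_follow(streams, "logs-app", "backing-index-1", is_failure=False)
holds_after_second_follow = ds.has_write_index()

print(violated_after_first_follow, holds_after_second_follow)
```

In this sketch the stream sits in an invalid state between the two follow operations, and stays there indefinitely if the second one never succeeds.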

elasticsearchmachine · 2025-04-05T03:44:22Z

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

elasticsearchmachine · 2025-04-05T03:44:23Z

Pinging @elastic/es-data-management (Team:Data Management)

jbaiera · 2025-04-05T03:58:40Z

Some suggestions on a path forward:

When replicating a failure store index, if it is the first follow operation to create the data stream, then it could create an empty write index to satisfy the data stream invariant. This has the downside of potentially producing a follower data stream that is not structurally identical to the leader data stream. The upside is that this would be the simplest solution.

Alternatively, we could refactor data streams to relax the invariant that a write index is always required. Data streams now have a feature that allows them to be lazily rolled over when they receive their first document write operation. We use this with failure stores to ensure that they are always ready to accept a write, while not actually allocating any indices up front. If we added this ability to the regular backing indices on a data stream (i.e. a data stream could have no backing indices as long as it is marked for lazy rollover), then we could work on relaxing the invariant that a data stream always has a write index. Once it is relaxed, we no longer need to worry about replication order, apart from the eventual ordering of the backing indices.
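The two options above can be contrasted with a small toy model. This is only a sketch under stated assumptions: `ToyDataStream`, the `rollover_on_write` flag, the helper functions, and the index names are hypothetical and do not correspond to the actual Elasticsearch classes or metadata fields.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataStream:
    """Hypothetical model of a data stream, for illustration only."""
    name: str
    backing_indices: list = field(default_factory=list)
    failure_indices: list = field(default_factory=list)
    rollover_on_write: bool = False  # assumed marker for lazy rollover

    def is_valid(self) -> bool:
        # Relaxed invariant: either a write index exists already,
        # or one will be created lazily on the first write.
        return bool(self.backing_indices) or self.rollover_on_write


def follow_with_placeholder(name: str, failure_index: str) -> ToyDataStream:
    """Option 1: create an empty write index when the failure index arrives first."""
    return ToyDataStream(
        name,
        backing_indices=[f"{name}-placeholder"],
        failure_indices=[failure_index],
    )


def follow_lazily(name: str, failure_index: str) -> ToyDataStream:
    """Option 2: create no backing index and mark the stream for lazy rollover."""
    return ToyDataStream(name, failure_indices=[failure_index], rollover_on_write=True)


leader = ToyDataStream("logs-app", backing_indices=["backing-1"], failure_indices=["failure-1"])

opt1 = follow_with_placeholder("logs-app", "failure-1")
opt2 = follow_lazily("logs-app", "failure-1")
valid_before = (opt1.is_valid(), opt2.is_valid())

# Later, the leader's real backing index is followed as well.
opt1.backing_indices.append("backing-1")
opt2.backing_indices.append("backing-1")

print(opt1.backing_indices == leader.backing_indices)
print(opt2.backing_indices == leader.backing_indices)
```

Both followers are valid at every step in this model. The placeholder approach leaves an extra index that the leader does not have, while the lazy approach ends up with the same backing index list as the leader, at the cost of having to relax the invariant first.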

jbaiera added the :Data Management/Data streams, :Distributed Indexing/CCR, and >enhancement labels Apr 5, 2025

elasticsearchmachine added the Team:Data Management label Apr 5, 2025

elasticsearchmachine added the Team:Distributed Indexing label Apr 5, 2025

jbaiera mentioned this issue Apr 5, 2025

Restrict failure stores from replicating via CCR #126355

Open
