Failure stores cannot be replicated in CCR #126356

Open
jbaiera opened this issue Apr 5, 2025 · 3 comments
Labels
:Data Management/Data streams Data streams and their lifecycles :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features >enhancement Team:Data Management Meta label for data/management team Team:Distributed Indexing Meta label for Distributed Indexing team

Comments

@jbaiera
Member

jbaiera commented Apr 5, 2025

We currently exclude failure store indices from the auto-follow logic when processing data streams (see #126355). The reason is that a data stream must always have a write index. If a failure index is the only index replicated from a cluster (i.e. no regular backing indices are replicated), the resulting follower data stream is not guaranteed to satisfy this invariant. Even if all indices are replicated, the data stream that holds them is constructed as part of the follow operation. Because follow operations can run in arbitrary order and can fail independently, we can't guarantee that a data stream will always have a write index in its backing index set.
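To make the ordering problem concrete, here is a minimal sketch in Python. It is a toy model, not Elasticsearch code: the `DataStream` class, the `follow` function, and the index names are all hypothetical stand-ins. It enumerates every order in which follow operations could run and reports the orders that leave the follower data stream without a write index at some point.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class DataStream:
    """Toy stand-in for a follower data stream (not the Elasticsearch class)."""
    name: str
    backing_indices: list = field(default_factory=list)
    failure_indices: list = field(default_factory=list)

    def has_write_index(self) -> bool:
        # The invariant from the issue: at least one regular backing index.
        return len(self.backing_indices) > 0


def follow(streams: dict, stream_name: str, index_name: str, is_failure: bool) -> DataStream:
    """Hypothetical follow step: creates the data stream on first use."""
    stream = streams.setdefault(stream_name, DataStream(stream_name))
    target = stream.failure_indices if is_failure else stream.backing_indices
    target.append(index_name)
    return stream


def orders_violating_invariant(leader_indices):
    """Return every follow order whose intermediate state lacks a write index."""
    bad = []
    for order in permutations(leader_indices):
        streams = {}
        for index_name, is_failure in order:
            stream = follow(streams, "logs-app", index_name, is_failure)
            if not stream.has_write_index():
                bad.append(order)
                break
    return bad


# Illustrative leader layout: one regular backing index, one failure index.
leader = [("backing-1", False), ("failure-1", True)]
violations = orders_violating_invariant(leader)
print(len(violations))
```

With one backing index and one failure index there are two possible orders, and the one that follows the failure index first violates the invariant. An independent failure of the backing-index follow would leave the data stream stuck in that state.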

@jbaiera jbaiera added :Data Management/Data streams Data streams and their lifecycles :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features >enhancement labels Apr 5, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label Apr 5, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing Meta label for Distributed Indexing team label Apr 5, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@jbaiera
Member Author

jbaiera commented Apr 5, 2025

Some suggestions on a path forward:

When replicating a failure store index, if that follow operation is the first one to create the data stream, it could also create an empty write index to satisfy the data stream invariant. The downside is that the follower data stream may not be structurally identical to the leader data stream. The upside is that this would be the simplest solution.
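A rough sketch of this first option, again as a toy Python model rather than actual Elasticsearch code. The `DataStream` class, `follow_failure_index`, and the placeholder index name are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class DataStream:
    """Toy stand-in for a data stream (not the Elasticsearch class)."""
    name: str
    backing_indices: list = field(default_factory=list)
    failure_indices: list = field(default_factory=list)


def follow_failure_index(streams: dict, stream_name: str, failure_index: str) -> DataStream:
    """Hypothetical follow step for a failure store index."""
    stream = streams.get(stream_name)
    if stream is None:
        # First follow operation for this stream: add an empty placeholder
        # write index so the "always has a write index" invariant holds.
        placeholder = f"{stream_name}-placeholder-write-index"
        stream = DataStream(stream_name, backing_indices=[placeholder])
        streams[stream_name] = stream
    stream.failure_indices.append(failure_index)
    return stream


leader = DataStream("logs-app", ["backing-1"], ["failure-1"])
follower = follow_failure_index({}, "logs-app", "failure-1")

print(follower.backing_indices)  # invariant holds on the follower
print(follower == leader)        # structure differs from the leader
```

The follower always has a write index here, but its backing index set contains a placeholder that the leader never had, which is the structural mismatch mentioned above.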

Alternatively, we could refactor data streams to relax the invariant that a write index is always required. Data streams now have a feature that allows them to be lazily rolled over when they receive their first document write operation. We use this with failure stores to ensure that they are always ready to accept a write without allocating any indices up front. If we extended this ability to the regular backing indices on a data stream (so that a data stream could have no indices as long as it is marked for lazy rollover), we could then work on relaxing the invariant that a data stream always has a write index. Once it is relaxed, we no longer need to worry about replication order beyond the eventual ordering of the backing indices.
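The relaxed invariant could look roughly like the sketch below. This is a toy Python model under stated assumptions: the `rollover_on_write` field name, the `is_valid` check, and the generated index names are hypothetical and only illustrate the idea of "no indices is fine as long as the stream is marked for lazy rollover".

```python
from dataclasses import dataclass, field


@dataclass
class DataStream:
    """Toy stand-in for a data stream (not the Elasticsearch class)."""
    name: str
    backing_indices: list = field(default_factory=list)
    rollover_on_write: bool = False  # hypothetical lazy rollover marker
    generation: int = 0

    def is_valid(self) -> bool:
        # Relaxed invariant: either a write index exists, or the stream is
        # marked so that the first write will create one.
        return bool(self.backing_indices) or self.rollover_on_write

    def write(self, doc: dict, storage: dict) -> str:
        if not self.is_valid():
            raise ValueError(f"data stream {self.name} cannot accept writes")
        if self.rollover_on_write:
            # Lazy rollover: allocate the write index on the first write.
            self.generation += 1
            new_index = f"{self.name}-{self.generation:06d}"
            self.backing_indices.append(new_index)
            storage[new_index] = []
            self.rollover_on_write = False
        write_index = self.backing_indices[-1]
        storage[write_index].append(doc)
        return write_index


storage = {}
stream = DataStream("logs-app", rollover_on_write=True)
print(stream.is_valid(), stream.backing_indices)
print(stream.write({"message": "first document"}, storage))
```

Under this model a follower data stream could be created empty by whichever follow operation happens to run first, and it would still be in a valid state regardless of the order in which the remaining indices arrive.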


2 participants