Skip to content

Synonyms API - Ensure system index is available when updating synonyms #121441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
carlosdelest opened this issue Jan 31, 2025 · 8 comments · Fixed by #126935
Closed

Synonyms API - Ensure system index is available when updating synonyms #121441

carlosdelest opened this issue Jan 31, 2025 · 8 comments · Fixed by #126935
Labels
>enhancement :Search Relevance/Analysis How text is split into tokens Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@carlosdelest
Copy link
Member

Description

When the synonyms system index is first created via the first update via the Synonyms API, there's the possibility that the operation returns before the index is ready for search.

For example, in serverless it's possible that read shards have not been replicated yet from the index shards. That can cause an error if the Synonyms API is invoked afterwards, as there would be no read shards available and thus the request would fail.

It would be good to check that the synonyms index is readable before returning to the user when an update operation is requested. That way, ES can do some timed wait on the index being available when it receives back a 503.

We could put this as a separate option similar to wait_for_completion, although that could be misleading - the Synonyms API already waits for the analyzers to be reloaded. As this is an API that is not expecting heavy updates, it could be enough with waiting for a default (or configurable) timeout when an update happens to ensure the synonyms system index is searchable.

Related - #121180

@carlosdelest carlosdelest added :Search Relevance/Analysis How text is split into tokens >enhancement Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Jan 31, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@carlosdelest
Copy link
Member Author

Draft PR: #126314

@benwtrent
Copy link
Member

@carlosdelest this does feel like two parameters. I don't think timeout is good here. timeout implies the action will be cancelled if the timeout is exceeded. This is just not true. We just return to the user before its actually finished.

I think we need two parameters:

  • wait_for_completion (as you alluded to), is a key thing here. Either we are synchronously waiting for the task to complete, or asynchronously. Defaulting to false
  • refresh, which indicates "Hey, this API will not return until the synonyms are usable", just like it is handled in the index API.

I think its valid for:

wait_for_completion=true&refresh=false, which is "synchronous call, but don't wait for it to be usable".

Also for wait_for_completion=false&refresh=true here we return a taskID that and they can poll the tasks API to see when its complete (if they don't want to hold the HTTP request open).

Also for wait_for_completion=true&refresh=true, synchronous wait until its all done.

What do you think?

@mayya-sharipova
Copy link
Contributor

I have a couple of points:

  • In ensuring availability of .synonyms index only relevant for the first put synonyms API and on subsequent calls of the put API we don't need this check? (If yes, may be SynonymsManagementAPIService can have a SetOnce flag that sets when index is available)
  • How long it takes to wait for checking index availability? if it is short period, may be we get away without introducing any new params and just always make this check (or only on first update)

@carlosdelest
Copy link
Member Author

@benwtrent @mayya-sharipova thanks for your comments!

I agree that a timeout can be confusing from a user perspective - I was exposing the cluster health API timeout directly and that might not be the best way forward with this.

In my mind, we want that check to happen by default, and users may want to skip it in case there's some availability problem with the synonyms index. That should be a temporary issue, as a new replica should be created (in serverless or stateful).

wait_for_completion doesn't feel right to me as this is not about async processing. We either do this check or not. Not doing the check means synonyms can be inconsistent in the different nodes (some analyzers could have loaded the new synonyms, others not).

refresh looks good! I'd say we can include two things for refresh:

  • Checking the synonyms index availability
  • Reload the analyzers (as we do that by default now)

That way, when refresh=false, it's the user responsibility to do the analyzer reload. Reloading analyzers response will surface any problem with reloading, including synonyms index not being available.

That looks aligned with other users of refresh and also allows users to do their own analyzer reloading logic separate from the Synonyms API, something that was not possible before.

WDYT?

@carlosdelest
Copy link
Member Author

@mayya-sharipova

In ensuring availability of .synonyms index only relevant for the first put synonyms API and on subsequent calls of the put API we don't need this check? (If yes, may be SynonymsManagementAPIService can have a SetOnce flag that sets when index is available)

That's something we can do, although I wonder if there are some other possibilities for the index to become yellow that we would miss that way.

How long it takes to wait for checking index availability? if it is short period, may be we get away without introducing any new params and just always make this check (or only on first update)

It's quite short, but we need to read from the master for this one. We always want to make this check if possible for the reasons mentioned above, and may want to skip that if there is some kind of synonyms index availability problem or if users want to perform reloading analyzers themselves.

@mayya-sharipova
Copy link
Contributor

mayya-sharipova commented Apr 9, 2025

@carlosdelest I thought about it more, and I want to clarify something with you. From your experience, would not wait_for_active_shards=all be a better option? Would this approach be better:

  1. On first synonyms PUT operation, ensure index status is green
  2. On subsequent synonyms PUT operations, do index requests with wait_for_active_shards=all? This ensures that index operation is successful across all shards, and reload analyzers will get new data whatever replicas it hits. It doesn't need any manual intervention from users, no new params.

@carlosdelest
Copy link
Member Author

carlosdelest commented Apr 9, 2025

We discussed this offline, and agreed on:

  • New parameter called reload_analyzers for the synonym API. Defaults to true (the current behavior)
    • This new flag will also await for cluster health for the synonyms index to be searchable (e.g. green).
  • When reload_analyzers is false, the reload analyzers step in the synonym APIs will be bypassed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search Relevance/Analysis How text is split into tokens Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch
Projects
None yet
4 participants