UnsupportedOperation when merging Lucene90BlockTreeTermsWriter
#14429
Comments
Phew, this is a spooky exception! I think it means that the same term was fed to the FST Builder twice in a row. The FST Builder in general can support this case: it means that a single input can have multiple outputs, which get merged. BlockTree is confusing in how it builds up its blocks. It does so one sub-tree at a time, using intermediate FSTs to hold each sub-tree, and then regurgitating the terms from each sub-tree with … So, somehow this regurgitation process added the same term twice in a row. This means either a given …
Do we know any fun details about the use case? Maybe an exotic/old JVM? Massive numbers of terms? Or are the terms some crazy binary gene sequences or something?
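If the hypothesis is that the same term reached the FST Builder twice in a row, one way to test it is to check that the term stream handed to the builder is strictly increasing. The sketch below is a hypothetical standalone helper, not part of Lucene; it assumes the terms are available as raw `byte[]` values and compares them as unsigned bytes, which is how Lucene's `BytesRef` orders terms. A repeated term shows up as a comparison result of 0.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical debugging helper (not part of Lucene): checks that terms are
 * strictly increasing in unsigned byte order before they reach an FST builder.
 */
public final class TermOrderCheck {

  /**
   * Returns the index of the first term that is not strictly greater than the
   * term before it (a duplicate or an out-of-order term), or -1 if there is none.
   */
  public static int firstViolation(List<byte[]> terms) {
    for (int i = 1; i < terms.size(); i++) {
      // Arrays.compareUnsigned (Java 9+) compares bytes as values 0..255.
      int cmp = Arrays.compareUnsigned(terms.get(i - 1), terms.get(i));
      if (cmp >= 0) {
        return i; // cmp == 0: same term twice in a row; cmp > 0: out of order
      }
    }
    return -1;
  }

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println(firstViolation(List.of(utf8("apple"), utf8("banana")))); // -1
    System.out.println(firstViolation(List.of(utf8("apple"), utf8("apple"))));  // 1
    // "z" (0x7A) sorts before U+00E9 (UTF-8: 0xC3 0xA9) when bytes are compared unsigned.
    System.out.println(firstViolation(List.of(utf8("z"), utf8("\u00e9"))));     // -1
  }
}
```

A signed byte comparison would order "z" and "é" the other way, so the unsigned comparison matters when the indexed text contains non-ASCII characters such as Turkish letters.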
Thank you @mikemccand for the details!
I will see what I can find.
@mikemccand OK, I gathered more info:
So the rest of the system doesn't seem very exotic. However, the data being ingested might contain various pieces of Turkish Unicode text. Digging around the analyzers, I didn't find any special handling, so it's all using the StandardAnalyzer with no additional normalization. I wonder if we are just hitting the dreaded Turkish "i" Unicode issue.
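For context, the "Turkish i" issue refers to locale-sensitive case mapping: Turkish treats dotted and dotless i as distinct letters, so lowercasing "I" in a Turkish locale yields dotless "ı" (U+0131) rather than "i". The minimal sketch below uses only the JDK's `String.toLowerCase(Locale)`; the class and helper names are illustrative. It only demonstrates the locale dependence. It does not show that this caused the duplicate term, and it assumes nothing about whether StandardAnalyzer lowercases through this JDK method.

```java
import java.util.Locale;

/** Illustrative demo of the "Turkish i" case-mapping issue using only the JDK. */
public final class TurkishIDemo {

  /** Lowercases with an explicit locale via String.toLowerCase(Locale). */
  static String lower(String s, Locale locale) {
    return s.toLowerCase(locale);
  }

  public static void main(String[] args) {
    Locale turkish = Locale.forLanguageTag("tr");

    // Capital I (U+0049): "i" in the root locale, dotless i (U+0131) in Turkish.
    System.out.println(lower("I", Locale.ROOT).equals("i"));  // true
    System.out.println(lower("I", turkish).equals("\u0131")); // true

    // Capital I with dot above (U+0130) lowercases to plain "i" in Turkish.
    System.out.println(lower("\u0130", turkish).equals("i")); // true

    // The same input string therefore yields different terms depending on locale.
    System.out.println(lower("TITLE", Locale.ROOT).equals(lower("TITLE", turkish))); // false
  }
}
```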
Description
Found this in the wild. I haven't been able to reproduce it :(
I don't even know what it means to hit this `fst.outputs.merge` branch, or under what conditions it is valid/invalid. Any pointers here would be useful.
We ran into a strange postings merge error in production.
The FST compiler reaches the "merge" line when merging some segments:
lucene/lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java, lines 933 to 936 in 4b94d97
However, the `Outputs` implementation used by `Lucene90BlockTreeTermsWriter` is `ByteSequenceOutputs`, which does not override `merge` and thus throws an `UnsupportedOperationException`:
lucene/lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/Lucene90BlockTreeTermsWriter.java, lines 534 to 548 in 4b94d97
Given this, it seems like it should be "impossible" to reach the `Outputs.merge` path when merging with `Lucene90BlockTreeTermsWriter`, but somehow it did. Any ideas on where I should look?
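To make the failure mode concrete, here is a simplified, hypothetical model of the path described above. The classes are stand-ins, not Lucene's real `FSTCompiler`, `Outputs`, or `ByteSequenceOutputs`. The toy compiler accepts sorted inputs and, when the same input arrives twice in a row, combines the two outputs with `outputs.merge`. An outputs type that overrides `merge` handles this; one that inherits the throwing default, as `ByteSequenceOutputs` does per the description above, fails with `UnsupportedOperationException`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model (not Lucene's classes) of the Outputs.merge path. */
public final class MergePathModel {

  /** Stand-in base class: merge throws unless a subclass overrides it. */
  abstract static class Outputs<T> {
    T merge(T first, T second) {
      throw new UnsupportedOperationException();
    }
  }

  /** Stand-in for an outputs type that, like ByteSequenceOutputs, does not override merge. */
  static final class BytesOutputs extends Outputs<String> {}

  /** Stand-in for an outputs type that lets one input map to multiple outputs. */
  static final class ListOutputs extends Outputs<List<String>> {
    @Override
    List<String> merge(List<String> first, List<String> second) {
      List<String> merged = new ArrayList<>(first);
      merged.addAll(second);
      return merged;
    }
  }

  /** Toy compiler. String order is used for brevity; Lucene compares terms as unsigned bytes. */
  static final class ToyCompiler<T> {
    private final Outputs<T> outputs;
    private final List<String> inputs = new ArrayList<>();
    private final List<T> values = new ArrayList<>();

    ToyCompiler(Outputs<T> outputs) {
      this.outputs = outputs;
    }

    void add(String input, T output) {
      int last = inputs.size() - 1;
      if (last >= 0) {
        int cmp = inputs.get(last).compareTo(input);
        if (cmp > 0) {
          // This toy version rejects out-of-order inputs explicitly.
          throw new IllegalArgumentException("inputs out of order: " + input);
        }
        if (cmp == 0) {
          // Same input twice in a row: merge outputs instead of adding a new entry.
          values.set(last, outputs.merge(values.get(last), output));
          return;
        }
      }
      inputs.add(input);
      values.add(output);
    }

    List<T> values() {
      return values;
    }
  }

  public static void main(String[] args) {
    ToyCompiler<List<String>> lists = new ToyCompiler<>(new ListOutputs());
    lists.add("foo", List.of("a"));
    lists.add("foo", List.of("b"));
    System.out.println(lists.values()); // [[a, b]]

    ToyCompiler<String> bytes = new ToyCompiler<>(new BytesOutputs());
    bytes.add("foo", "a");
    try {
      bytes.add("foo", "b");
    } catch (UnsupportedOperationException e) {
      System.out.println("UnsupportedOperationException on repeated input");
    }
  }
}
```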