Skip to content

Equal PersistentOrderedSets are not equal #204 #217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 58 commits into
base: master
Choose a base branch
from

Conversation

DmitryNekrasov
Copy link

@DmitryNekrasov DmitryNekrasov commented Apr 21, 2025

Tree invariant issue

TrieNode-123

To understand why the equals() method did not work correctly, consider the following PersistentHashMap creation:

val map1 = persistentHashMapOf(-1 to "a", 0 to "b", 32 to "c")

map1 has the internal structure shown in pic. 1. Elements with keys 0 and 32 have moved to the second level, since they have a collision in the lower 5 bits (0 and 32 have the lower 5 bits equal to 0b00000).

Next we remove the element with the key 0, but we do it differently, using or not using the builder for mutable change:

val map3 = map1.remove(0)

val builder = map1.builder()
builder.remove(0)
val map4 = builder.build()

In the first, immutable case, after removing the key 0, the node at the lower level contains only one key-value pair, so we promote it to the level above, and get the result shown in pic. 2.
In the mutable case, there is no check whether the inserted node contains only one value (with some caveats, this is true for our example), we do not promote the node and get the structure shown in pic. 3.

Comparison of objects of the PersistentHashMap class includes comparison of the dataMap and nodeMap fields of the root nodes. Since map3 and map4, despite having the same content, have different internal structure due to lack of promotion, their comparison fails.

The fix is ​​to add promotion for the mutable case. After I added promotion to the mutableUpdateNodeAtIndex method, I noticed that this method became very similar to its immutable variant updateNodeAtIndex, except that the mutable variant takes owner: MutabilityOwnership as a parameter, and if the owners of the current and inserted nodes match, then a mutable insertion occurs into the buffer of the current node:

if (ownedBy === owner) {
    buffer[nodeIndex] = newNode
    return this
}

I merged the 2 methods into 1, and pass owner as a Nullable type. Its default value is null, and this behavior is completely consistent with the previous purely immutable implementation, the ownership check now looks like this:

if (owner != null && ownedBy === owner) {
    buffer[nodeIndex] = newNode
    return this
}

For the mutable case, if the owners match, a mutable insertion into the buffer occurs, and promotion will be performed if necessary.

To summarize, the problem was that mutable removing did not preserve the tree invariant - if a node is not a root, it cannot contain only one field with data, since additional nodes appear due to collisions, and there must be something else in this node.

However, this implementation added problems with the behavior of iterators, tests keysIteratorTests, valuesIteratorTests and entriesIteratorTests failed, more on that below.

Iterations issue

TrieNode-45

The problem with iterators appeared after the promotion was added. To understand what exactly is going on, consider the following PersistentHashMap creation:

val map = persistentHashMapOf(1 to "a", 2  to "b", 3 to "c", 0 to "y", 32 to "z")

map has the internal structure shown in pic. 4. As in the example shown earlier, elements with keys 0 and 32 have moved to the second level because they have a collision in the lower 5 bits.

Next, we get the builder from the collection, which has an iterator, and iterate over the collection until we reach key 0:

val builder = map.builder()  
val iterator = builder.entries.iterator()  
  
while (iterator.hasNext()) {  
    val (key, _) = iterator.next()  
    if (key == 0) {  
        iterator.remove()  
    }  
}

When we reach key 0, we remove it. The node on the second level has only one data field (32: "z"), so to preserve the tree invariant, we promote this field to the next level. The field has key 32, the lower 5 bits of the number are 0, so the key is at the very beginning of the buffer and the next call to iterator.next() points to it (pic. 5.). And, as it becomes obvious, subsequent calls to iterator.next() will repeatedly go through keys 1, 2, 3, which is incorrect behavior.

After trying and thinking over several solutions, I came to what I think is the simplest, most elegant and efficient. This solution does not add any time and space complexity.

To simplify a bit, the PersistentHashMapBaseIterator class is an array of TrieNodeBaseIterators - one for each level of the tree (there are up to 7 levels), and the index of the current level. After deleting a value by the iterator, its structure is internally updated, namely, the next key in the tree is searched for and its level index and index inside the buffer are updated.

My solution is to check whether we have promoted the subsequent key, and if so, to interpret it as if this promotion did not happen, i.e. it is on the same level as the deleted key. Then all subsequent transitions of the iterator will remain correct, as it was before the implementation of promotion.

Checking whether a key has been promoted consists of checking two conditions. First, it must be strictly higher than its previous level (and, accordingly, the level of the deleted key). Second, we need to check whether it was in collision at its level with the deleted key. This can be done by checking whether the index of the current key inside the buffer would match the index of the deleted key inside the same buffer if the deleted key were promoted. So, the check looks like this:

val removedKeyPositionMask = removedKeyHash?.let { 1 shl indexSegment(it, shift) }  
  
if (keyPositionMask == removedKeyPositionMask && pathIndex < pathLastIndex) {  
    path[pathLastIndex].reset(arrayOf(node.buffer[keyIndex], node.buffer[keyIndex + 1]), ENTRY_SIZE)  
    return  
}

I added tests to check these edge cases, as well as tests to check the iterator invariants - looking at cases where they throw IllegalStateException and ConcurrentModificationException.

@DmitryNekrasov DmitryNekrasov self-assigned this Apr 21, 2025
@DmitryNekrasov DmitryNekrasov force-pushed the dmitry.nekrasov/bugfix/204 branch from 7584a94 to f082a92 Compare April 22, 2025 07:42
@DmitryNekrasov DmitryNekrasov requested a review from fzhinkin April 25, 2025 15:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants