Skip to content

Fix Issue 16659 - Clarify mutating while iterating rules #1704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

wilzbach
Copy link
Member

I just went with the current behavior - ideas for improvement are very welcome.

Btw ideally all these examples would get verified automatically and are runnable online ...

@dlang-bot
Copy link
Contributor

Fix Bugzilla Description
16659 Clarify mutating while iterating rules

@d-random-contributor
Copy link

I don't think allowing any mutation is a good idea, it's extremely dependent on implementation details. I think there should be at least a special contract for this, e.g. mutating range as @schveiguy proposed.

@PetarKirov
Copy link
Member

don't think allowing any mutation is a good idea, it's extremely dependent on implementation details.

Pinging our AA experts @MartinNowak @quickfur @rainers @schveiguy @yebblies for opinion.

@schveiguy
Copy link
Member

schveiguy commented Jun 15, 2017

No, you can't add or remove keys during iteration. See here: https://github.com/dlang/druntime/blob/master/src/rt/aaA.d#L680

Let's say you add a key, it causes a rehash. Your range is now pointing at an unfilled bucket, breaking the implied invariant of the range (it's either empty or pointing at a valid element).

You can change values of keys, as that does not affect the structure of the AA.

@schveiguy
Copy link
Member

I should add, the text says "value of key", just like I did above, but first time I read this, I thought it meant you can change the key itself while it's still in the AA. It should be reworded.

@d-random-contributor
Copy link

d-random-contributor commented Jun 15, 2017

as that does not affect the structure of the AA

I had an idea that hashtable DoS attack can be prevented by counting hash misses: when the number reaches some threshold value, the AA is rehashed with new hash seed. In this implementation setter can rehash the AA.

@schveiguy
Copy link
Member

In this implementation setter can rehash the AA.

This is dangerous. Lookups should not rehash.

auto v1 = key1 in aa;
auto v2 = key2 in aa; // invalidates v1?

@schveiguy
Copy link
Member

schveiguy commented Jun 15, 2017

Actually in current implementation, if you are pointing at an element, rehashing doesn't affect that. But the range isn't pointing at an element, it's pointing at the impl, and stores an idx of where it is. I still don't like the idea of rehashing for hash misses, at least not in the default AA implementation. DoS attacks are only going to happen in certain circumstances, you can use a specialized type for that.

@d-random-contributor
Copy link

invalidates v1?

Probably shouldn't. Is it documented?

@schveiguy
Copy link
Member

schveiguy commented Jun 15, 2017

The exact behavior is not documented. In general, I would say anything that causes a rehash would certainly invalidate the range. Whether it would invalidate a reference to a value is up for debate. As I said, currently it does not. Changing that aspect might be very bad for some people.

I would also say that changing the value mapped to an existing key should be documented not to cause a rehash. Neither should looking for an existing key in the hashmap. Too much code depends on this kind of thing.

@rainers
Copy link
Member

rainers commented Jun 15, 2017

Apart from random results of the iteration after insert or remove, the current range implementation is not perfectly safe with respect to memory corruption. After rehashing as part of aa.remove, the idx variable might point beyond the end of the buckets (resulting in bad empty-state and RangeErrors most of the time). It might also point to an empty bucket, and _aaRangeFrontValue will return an invalid non-null pointer.

@schveiguy
Copy link
Member

schveiguy commented Jun 15, 2017

_aaRangeFrontValue will return an invalid non-null pointer

Haha, yes, that is probably something we could fix easily, as the key is a null pointer (can check for that before adding the offset). Note that in most cases, accessing the invalid pointer should still be a segfault (which is somewhat safe).

@schveiguy
Copy link
Member

schveiguy commented Jun 15, 2017

_aaRangeFrontValue will return an invalid non-null pointer

Hmm... also noting that the front call in object.d is marked @trusted. That should be fixed, I'll file an issue.

Update: https://issues.dlang.org/show_bug.cgi?id=17507

@rainers
Copy link
Member

rainers commented Jun 15, 2017

Haha, yes, that is probably something we could fix easily,

I'm not sure a null pointer is a reasonable result. Is it ok (safe?) to have an iterator randomly continue, crash with null pointer access or produce a range error?

@MartinNowak
Copy link
Member

The key parameter should not be passed to foreach loops by ref, so your hopefully only mutating your copy of it.

@MartinNowak
Copy link
Member

MartinNowak commented Jun 15, 2017

Yes, deletion can trigger a shrink of the table and rehashing of all remaining keys, which does invalidate the iteration index.

I'm not sure a null pointer is a reasonable result. Is it ok (safe?) to have an iterator randomly continue, crash with null pointer access or produce a range error?

Yes, neither continuing iteration with a random element, nor deterministically crashing with null pointer, nor exiting with a RangeError do corrupt memory, so those would be @safe options.
Not sure that the current implementation does any of those though, but we should make it somehow @safe and guarantee that.

@schveiguy
Copy link
Member

The key parameter should not be passed to foreach loops by ref, so your hopefully only mutating your copy of it.

I'm guessing you misunderstood "value of key" like I did. He means you can change the value associated with the key, not the key itself.

@quickfur
Copy link
Member

This comes up every so often, the idea of mutating a container while iterating over it. The fact of the matter is, this will almost always lead to behaviour that most people would agree is strange, counter-intuitive, or otherwise pathological, unless (1) you only ever delete the current element during the iteration, and (2) the container was written explicitly to allow for this case. This applies to containers in general, but especially to unordered ones like AA's.

Consider an arbitrary AA (not necessary the current D implementation) that you're trying to iterate over. Suppose the iteration algorithm you're using produces the sequence of keys k1, k2, k3, ... kn. Let's say you're currently on k3, and then your code decides to insert a new key kx. The question is: should this new key be visible on this current round of iteration, or should it only be visible the next time you iterate the AA?

No matter which way you answer, there is always a case that will break your expectations, because AA's being unordered means that kx could end up before k3 or after k3, and you can't predict this beforehand (without going under the hood, of course). Furthermore, the act of inserting kx may trigger a rehash of the AA, thereby reordering k1...kn. Potentially, it could reorder it such that the iteration sequence now becomes k4, k2, k3, k1, ... kx, kn. So now, your overall sequence becomes k1, k2, k3, k1, ... kx, kn, because you already iterated over k1 and k2 before reaching k3, but due to the rehash, moving forward from k3 brings you back to k1, and then diverges into a different iteration sequence than it would have gone before. (And note that k4 now will never be visited on this round of iteration, because it had the bad luck of being rehashed before your current element.)

This is pathological behaviour, because when you write foreach (k, v; aa), the expectation is that you'll only visit each element once, in some arbitrary order. But modifying the AA in the interim may break this expectation: you may end up visiting some elements multiple times, and some elements may be skipped over, and the new element may or may not be visited in this round of iteration.

The same argument goes for deleting elements from the AA while iterating over it.

You could say, what if I create a list of keys from the AA first, then iterate over that instead? You could -- but then if you add new elements, they will not be visible until the next round of iteration, and if you delete elements, your list of keys may contain invalid keys. And if you delete and insert keys, you may end up with new keys that will appear in the current iteration because it coincides with the deleted key. This may or may not be your intention. Not to mention the inefficiency of creating a list of keys prior to iteration.

Basically, modifying a container while iterating over it is a Bad Idea. Unless you absolutely know what you're doing and want the exact resulting semantics, I'd highly recommend not doing this. The only exception, as I said, is when you only ever delete an element in the current position, and the container explicitly supports this operation (and guarantees that iteration order won't suddenly randomize on you, or suddenly stop because the current node's next pointer is now null).

@quickfur
Copy link
Member

quickfur commented Jun 15, 2017

Because of this, I recommend not explicating what happens with mutating a container while iterating over it -- rather, the user should be discouraged from doing this. The exact behaviour will depend on the implementation details of the container, so putting this in the docs and having user code rely on the current behaviour will limit the ability to change the container's implementation later. This is leaky abstraction, it's a bad idea.

@schveiguy
Copy link
Member

@quickfur that's all good, but modifying a value that's in an AA is expected to do nothing to the structure. It's perfectly acceptable, and I would say many people rely on it. Inserting/removing keys is what is bad. We should define and document this and call it a day.

@quickfur
Copy link
Member

Changing the value in an AA is perfectly fine, obviously.

What's not OK is changing the key. The proposed change in this PR "officially" sanctions changing the current key, mentioning no worse side effect than "the key may not be part of the current iteration". This is an understatement; adding or deleting keys while iterating over an AA can potentially completely screw up the current iteration in the sense I described above, that you may start seeing keys you've already traversed or skipping over keys that you haven't seen yet in the current iteration.

Furthermore, the current behaviour is arguably specific to the current implementation; user code really should not be relying on such behaviour. I'd even go as far as saying that adding/deleting keys from the AA while iterating over it may result in UB. At least, it will result in an undefined subsequent sequence of keys traversed by the current iteration.

@quickfur
Copy link
Member

Also, note that mutating the key in an AA will cause bugs, because the hash value will not be updated, so the AA becomes internally inconsistent, and may start accepting duplicate keys and not "seeing" an existing key.

Copy link
Member

@quickfur quickfur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree that the docs should explicitly legalize mutating the AA while iterating over it.

Doing so may lead to the current iteration exhibiting strange behaviour, like revisiting keys that have already been visited, skipping over keys not yet visited, or terminating iteration abruptly.

Even if the current implementation does not rehash when deleting keys, this is dependent on implementation details subject to future change, and user code should not rely on it.

Rather, the docs should explicitly discourage users from inserting / deleting keys from the AA while iterating over it. Unless you immediately terminate iteration after the insertion / deletion.

@wilzbach
Copy link
Member Author

I don't agree that the docs should explicitly legalize mutating the AA while iterating over it.

@ all thanks a lot for your very active discussion and feedback. I went with the initial wording, because that's how the bug report described the "problem"

Anyhow I have tried to incorporate your feedback into a rewording of this clarification. Please feel free to change, improve and/or nitpick!

of the ongoing iteration. Similarly $(I inserting) a new key isn't
guaranteed to maintain the same iteration. $(I Modifying) a key needs to
be considered as a removal and insertion and thus can trigger a rehashing
as well.
Copy link
Member

@quickfur quickfur Jun 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: modifying an AA key in-place is illegal, because the current implementation will get into an invalid state, and in all likelihood you will not be able to find the key again when doing AA lookups, though it will still appear when iterating over the AA. This invalid state may also lead to entries with duplicate keys in the AA.

Even outside the considerations of the current implementation, any AA implementation's correctness is keyed on the key being immutable after you add it to the AA; breaking this contract will cause malfunctions. We did try to limit this problem a few years ago by making the compiler automatically convert AA keys to const. Unfortunately, this is not sufficient, because the caller may still hold mutable references to the key, and thus cause havoc when the key is subsequently modified. Ideally, the AA key type should be immutable, but this was not done for various reasons that I can't recall now.

You can see this effect for yourself, in fact:

void main() {
        int[const(char)[]] aa;
        char[] key1 = "abc".dup;
        
        // Trick the compiler into accepting this key into the AA. Seems dmd
        // git HEAD is smart enough to reject adding key1 directly to aa.
        const(char)[] keyref = key1;
        
        aa[keyref] = 123;
        key1[] = "def"[]; // overwrite key referenced by AA entry
        //assert(("abc" in aa) !is null); // fails
        //assert(("def" in aa) !is null); // *also* fails

        aa["abc"] = 456;  // duplicate key
        //assert(aa.length == 1);       // fails(!)

        assert(aa["abc"] == 456); // OK
        aa.remove("abc"); 
        
        //assert(aa.length == 0); // fails
        key1[] = "abc"[]; // mutate original key back to its "correct" value
        assert(aa["abc"] == 123); // it remembers its original value!
}

As you can see, the AA exhibits very strange (and buggy) behaviour when you mutate a key in-place.

TLDR, mutating an AA key is very bad, even worse than inserting/deleting keys while iterating, and should never, ever be done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants