Removing from a list while iterating over it

Posted on

Solving problem is about exposing yourself to as many situations as possible like Removing from a list while iterating over it and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about Removing from a list while iterating over it, which can be followed any time. Take easy to follow this discuss.

Removing from a list while iterating over it

The following code:

a = list(range(10))
remove = False
for b in a:
    if remove:
        a.remove(b)
    remove = not remove
print(a)

Outputs [0, 2, 3, 5, 6, 8, 9], instead of [0, 2, 4, 6, 8] when using Python 3.2.

  1. Why does it output these particular values?
  2. Why is no error given to indicate that underlying iterator is being modified?
  3. Have the mechanics changed from earlier versions of Python with respect to this behaviour?

Note that I am not looking to work around the behaviour, but to understand it.

Answer #1:

I debated answering this for a while, because similar questions have been asked many times here. But it’s just unique enough to be given the benefit of the doubt. (Still, I won’t object if others vote to close.) Here’s a visual explanation of what is happening.

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]       <-  b = 0; remove? no
 ^
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]       <-  b = 1; remove? yes
    ^
[0, 2, 3, 4, 5, 6, 7, 8, 9]          <-  b = 3; remove? no
       ^
[0, 2, 3, 4, 5, 6, 7, 8, 9]          <-  b = 4; remove? yes
          ^
[0, 2, 3, 5, 6, 7, 8, 9]             <-  b = 6; remove? no
             ^
[0, 2, 3, 5, 6, 7, 8, 9]             <-  b = 7; remove? yes
                ^
[0, 2, 3, 5, 6, 8, 9]                <-  b = 9; remove? no
                   ^

Since no one else has, I’ll attempt to answer your other questions:

Why is no error given to indicate that underlying iterator is being modified?

To throw an error without prohibiting many perfectly valid loop constructions, Python would have to know a lot about what’s going on, and it would probably have to get that information at runtime. All that information would take time to process. It would make Python a lot slower, in just the place where speed really counts — a loop.

Have the mechanics changed from earlier versions of Python with respect to this behaviour?

In short, no. Or at least I highly doubt it, and certainly it has behaved this way since I learned Python (2.4). Frankly I would expect any straightforward implementation of a mutable sequence to behave in just this way. Anyone who knows better, please correct me. (Actually, a quick doc lookup confirms that the text that Mikola cited has been in the tutorial since version 1.4!)

Answered By: senderle

Answer #2:

As Mikola explained, the actual result you observe is caused by the fact that deleting an entry from the list shifts the whole list over by one spot causing you to miss elements.

But the more interesting question, to my mind, is why python doesn’t elect to produce an error message when this happens. It does produce such an error message if you try to modify a dictionary. I think there are two reasons for that.

  1. Dict are complex internally, whereas lists are not. Lists are basically just arrays. A dict has to detect when its modified while being iterated over so as to avoid crashing when the internal structure of the dict changes. A list can get away without doing that check because it just makes sure that its current index is still in range.

  2. Historically, (I’m not sure about now), python lists were iterated over by using the [] operator. Python would evaluate list[0], list[1], list[2] until it got an IndexError. In that case, python wasn’t track the size of the list before it began so it had no method of detecting that the size of list had been changed.

Answered By: Winston Ewert

Answer #3:

Of course it is not safe to modify an array as you are iterating over it. The spec says it is a bad idea and the behavior is undefined:

http://docs.python.org/tutorial/controlflow.html#for-statements

So, the next question is what exactly is happening under the hood here? If I had to guess, I would say that it is doing something like this:

for(int i=0; i<len(array); ++i)
{
   do_loop_body(i);
}

If you suppose that this is indeed what is going on, then it explains the observed behavior completely. When you remove an element at or before the current pointer, you shift the whole list by 1 to the left. The first time, you remove a 1 — like usual — but now the list shifts backwards. The next iteration instead of hitting a 2, you hit a 3. Then you remove a 4, and the list shifts backwards. Next iteration 7, and so on.

Answered By: Mikola

Answer #4:

On your first iteration, you’re not removing and everything’s dandy.

Second iteration you’re at position [1] of the sequence, and you remove ‘1’. The iterator then takes you to position [2] in the sequence, which is now ‘3’, so ‘2’ gets skipped over (as ‘2’ is now at position [1] because of the removal). Of course ‘3’ doesn’t get removed, so you go on to position [3] in the sequence, which is now ‘4’. That gets removed, taking you to position [5] which is now ‘6’, and so on.

The fact that you’re removing things means that a position gets skipped over every time you perform a removal.

Answered By: Justin Simon
The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .

Leave a Reply

Your email address will not be published.