When I am writing code in Python, I often need to remove items from a list or other sequence type based on some criteria. I haven’t found a solution that is elegant and efficient, as removing items from a list you are currently iterating through is bad. For example, you can’t do this:
for name in names: if name[-5:] == 'Smith': names.remove(name)
I usually end up doing something like this:
toremove =  for name in names: if name[-5:] == 'Smith': toremove.append(name) for name in toremove: names.remove(name) del toremove
This is innefficient, fairly ugly and possibly buggy (how does it handle multiple ‘John Smith’ entries?). Does anyone have a more elegant solution, or at least a more efficient one?
How about one that works with dictionaries?
Two easy ways to accomplish just the filtering are:
names = filter(lambda name: name[-5:] != "Smith", names)
Using list comprehensions:
names = [name for name in names if name[-5:] != "Smith"]
Note that both cases keep the values for which the predicate function evaluates to
True, so you have to reverse the logic (i.e. you say “keep the people who do not have the last name Smith” instead of “remove the people who have the last name Smith”).
Edit Funny… two people individually posted both of the answers I suggested as I was posting mine.
You can also iterate backwards over the list:
for name in reversed(names): if name[-5:] == 'Smith': names.remove(name)
This has the advantage that it does not create a new list (like
filter or a list comprehension) and uses an iterator instead of a list copy (like
Note that although removing elements while iterating backwards is safe, inserting them is somewhat trickier.
The obvious answer is the one that John and a couple other people gave, namely:
for name in names if name[-5:] != "Smith"] # <-- slowernames = [name
But that has the disadvantage that it creates a new list object, rather than reusing the original object. I did some profiling and experimentation, and the most efficient method I came up with is:
for name in names if name[-5:] != "Smith") # <-- fasternames[:] = (name
Assigning to “names[:]” basically means “replace the contents of the names list with the following value”. It’s different from just assigning to names, in that it doesn’t create a new list object. The right hand side of the assignment is a generator expression (note the use of parentheses rather than square brackets). This will cause Python to iterate across the list.
Some quick profiling suggests that this is about 30% faster than the list comprehension approach, and about 40% faster than the filter approach.
Caveat: while this solution is faster than the obvious solution, it is more obscure, and relies on more advanced Python techniques. If you do use it, I recommend accompanying it with a comment. It’s probably only worth using in cases where you really care about the performance of this particular operation (which is pretty fast no matter what). (In the case where I used this, I was doing A* beam search, and used this to remove search points from the search beam.)
Using a list comprehension
list = [x for x in list if x[-5:] != "smith"]
There are times when filtering (either using filter or a list comprehension) doesn’t work. This happens when some other object is holding a reference to the list you’re modifying and you need to modify the list in place.
for name in names[:]: if name[-5:] == 'Smith': names.remove(name)
The only difference from the original code is the use of
names[:] instead of
names in the for loop. That way the code iterates over a (shallow) copy of the list and the removals work as expected. Since the list copying is shallow, it’s fairly quick.
filter would be awesome for this. Simple example:
names = ['mike', 'dave', 'jim'] filter(lambda x: x != 'mike', names) ['dave', 'jim']
Edit: Corey’s list comprehension is awesome too.
names = filter(lambda x: x[-5:] != "Smith", names);
Both solutions, filter and comprehension requires building a new list. I don’t know enough of the Python internals to be sure, but I think that a more traditional (but less elegant) approach could be more efficient:
names = ['Jones', 'Vai', 'Smith', 'Perez'] item = 0 while item <> len(names): name = names [item] if name=='Smith': names.remove(name) else: item += 1 print names
Anyway, for short lists, I stick with either of the two solutions proposed earlier.