# Remove all the elements that occur in one list from another

Posted on

Solving problem is about exposing yourself to as many situations as possible like Remove all the elements that occur in one list from another and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about Remove all the elements that occur in one list from another, which can be followed any time. Take easy to follow this discuss.

Remove all the elements that occur in one list from another

Let’s say I have two lists, `l1` and `l2`. I want to perform `l1 - l2`, which returns all elements of `l1` not in `l2`.

I can think of a naive loop approach to doing this, but that is going to be really inefficient. What is a pythonic and efficient way of doing this?

As an example, if I have `l1 = [1,2,6,8] and l2 = [2,3,5,8]`, `l1 - l2` should return `[1,6]`

Python has a language feature called List Comprehensions that is perfectly suited to making this sort of thing extremely easy. The following statement does exactly what you want and stores the result in `l3`:

``````l3 = [x for x in l1 if x not in l2]
``````

`l3` will contain `[1, 6]`.

One way is to use sets:

``````>>> set([1,2,6,8]) - set([2,3,5,8])
set([1, 6])
``````

Note, however, that sets do not preserve the order of elements, and cause any duplicated elements to be removed. The elements also need to be hashable. If these restrictions are tolerable, this may often be the simplest and highest performance option.

## Performance Comparisons

Comparing the performance of all the answers mentioned here on Python 3.9.1 and Python 2.7.16.

### Python 3.9.1

Answers are mentioned in order of performance:

1. Arkku’s `set` difference using subtraction “-” operation – (91.3 nsec per loop)

``````mquadri\$ python3 -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2"
5000000 loops, best of 5: 91.3 nsec per loop
``````
2. Moinuddin Quadri’s using `set().difference()`– (133 nsec per loop)

``````mquadri\$ python3 -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1.difference(l2)"
2000000 loops, best of 5: 133 nsec per loop
``````
3. Moinuddin Quadri’s list comprehension with `set` based lookup– (366 nsec per loop)

`````` mquadri\$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]"
1000000 loops, best of 5: 366 nsec per loop
``````
4. Donut’s list comprehension on plain list – (489 nsec per loop)

`````` mquadri\$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]"
500000 loops, best of 5: 489 nsec per loop
``````
5. Daniel Pryden’s generator expression with `set` based lookup and type-casting to `list`(583 nsec per loop) : Explicitly type-casting to list to get the final object as `list`, as requested by OP. If generator expression is replaced with list comprehension, it’ll become same as Moinuddin Quadri’s list comprehension with `set` based lookup.

`````` mquadri\$ mquadri\$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(x for x in l1 if x not in l2)"
500000 loops, best of 5: 583 nsec per loop
``````
6. Moinuddin Quadri’s using `filter()` and explicitly type-casting to `list` (need to explicitly type-cast as in Python 3.x, it returns iterator) – (681 nsec per loop)

`````` mquadri\$ python3 -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(filter(lambda x: x not in l2, l1))"
500000 loops, best of 5: 681 nsec per loop
``````
7. Akshay Hazari’s using combination of `functools.reduce` + `filter` -(3.36 usec per loop) : Explicitly type-casting to `list` as from Python 3.x it started returned returning iterator. Also we need to import `functools` to use `reduce` in Python 3.x

`````` mquadri\$ python3 -m timeit "from functools import reduce; l1 = [1,2,6,8]; l2 = [2,3,5,8];" "list(reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2))"
100000 loops, best of 5: 3.36 usec per loop
``````

### Python 2.7.16

Answers are mentioned in order of performance:

1. Arkku’s `set` difference using subtraction “-” operation – (0.0783 usec per loop)

``````mquadri\$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1 - l2"
10000000 loops, best of 3: 0.0783 usec per loop
``````
2. Moinuddin Quadri’s using `set().difference()`– (0.117 usec per loop)

``````mquadri\$ mquadri\$ python -m timeit -s "l1 = set([1,2,6,8]); l2 = set([2,3,5,8]);" "l1.difference(l2)"
10000000 loops, best of 3: 0.117 usec per loop
``````
3. Moinuddin Quadri’s list comprehension with `set` based lookup– (0.246 usec per loop)

`````` mquadri\$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "[x for x in l1 if x not in l2]"
1000000 loops, best of 3: 0.246 usec per loop
``````
4. Donut’s list comprehension on plain list – (0.372 usec per loop)

`````` mquadri\$ python -m timeit -s "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "[x for x in l1 if x not in l2]"
1000000 loops, best of 3: 0.372 usec per loop
``````
5. Moinuddin Quadri’s using `filter()` – (0.593 usec per loop)

`````` mquadri\$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "filter(lambda x: x not in l2, l1)"
1000000 loops, best of 3: 0.593 usec per loop
``````
6. Daniel Pryden’s generator expression with `set` based lookup and type-casting to `list`(0.964 per loop) : Explicitly type-casting to list to get the final object as `list`, as requested by OP. If generator expression is replaced with list comprehension, it’ll become same as Moinuddin Quadri’s list comprehension with `set` based lookup.

`````` mquadri\$ python -m timeit -s "l1 = [1,2,6,8]; l2 = set([2,3,5,8]);" "list(x for x in l1 if x not in l2)"
1000000 loops, best of 3: 0.964 usec per loop
``````
7. Akshay Hazari’s using combination of `functools.reduce` + `filter` -(2.78 usec per loop)

`````` mquadri\$ python -m timeit "l1 = [1,2,6,8]; l2 = [2,3,5,8];" "reduce(lambda x,y : filter(lambda z: z!=y,x) ,l1,l2)"
100000 loops, best of 3: 2.78 usec per loop
``````

Expanding on Donut’s answer and the other answers here, you can get even better results by using a generator comprehension instead of a list comprehension, and by using a `set` data structure (since the `in` operator is O(n) on a list but O(1) on a set).

So here’s a function that would work for you:

``````def filter_list(full_list, excludes):
s = set(excludes)
return (x for x in full_list if x not in s)
``````

The result will be an iterable that will lazily fetch the filtered list. If you need a real list object (e.g. if you need to do a `len()` on the result), then you can easily build a list like so:

``````filtered_list = list(filter_list(full_list, excludes))
``````

Use the Python set type. That would be the most Pythonic. 🙂

Also, since it’s native, it should be the most optimized method too.

See:

http://docs.python.org/library/stdtypes.html#set

http://docs.python.org/library/sets.htm (for older python)

``````# Using Python 2.7 set literal format.
# Otherwise, use: l1 = set([1,2,6,8])
#
l1 = {1,2,6,8}
l2 = {2,3,5,8}
l3 = l1 - l2
``````

use Set Comprehensions {x for x in l2} or set(l2) to get set, then use List Comprehensions to get list

``````l2set = set(l2)
l3 = [x for x in l1 if x not in l2set]
``````

benchmark test code:

``````import time
l1 = list(range(1000*10 * 3))
l2 = list(range(1000*10 * 2))
l2set = {x for x in l2}
tic = time.time()
l3 = [x for x in l1 if x not in l2set]
toc = time.time()
diffset = toc-tic
print(diffset)
tic = time.time()
l3 = [x for x in l1 if x not in l2]
toc = time.time()
difflist = toc-tic
print(difflist)
print("speedup %fx"%(difflist/diffset))
``````

benchmark test result:

``````0.0015058517456054688
3.968189239501953
speedup 2635.179227x
``````

Alternate Solution :

``````reduce(lambda x,y : filter(lambda z: z!=y,x) ,[2,3,5,8],[1,2,6,8])
``````

# Sets versus list comprehension benchmark on Python 3.8

tldr: Use Arkku’s set solution, it’s even faster than promised in comparison!

## Checking existing files against a list

In my example I found it to be 40 times (!) faster to use Arkku’s set solution than the pythonic list comprehension for a real world application of checking existing filenames against a list.

### List comprehension:

``````%%time
import glob
existing = [int(os.path.basename(x).split(".")[0]) for x in glob.glob("*.txt")]
wanted = list(range(1, 100000))
[i for i in wanted if i not in existing]
``````

Wall time: 28.2 s

### Sets

``````%%time
import glob
existing = [int(os.path.basename(x).split(".")[0]) for x in glob.glob("*.txt")]
wanted = list(range(1, 100000))
set(wanted) - set(existing)
``````

Wall time: 689 ms