Understanding the set() function

Posted on

Question :

Understanding the set() function

In python, `set()` is an unordered collection with no duplicate elements. However, I am not able to understand how it generates the output.

For example, consider the following:

``````>>> x = [1, 1, 2, 2, 2, 2, 2, 3, 3]
>>> set(x)
set([1, 2, 3])

>>> y = [1, 1, 6, 6, 6, 6, 6, 8, 8]
>>> set(y)
set([8, 1, 6])

>>> z = [1, 1, 6, 6, 6, 6, 6, 7, 7]
>>> set(z)
set([1, 6, 7])
``````

Shouldn’t the output of `set(y)` be: `set([1, 6, 8])`? I tried the above two in Python 2.6.

Sets are unordered, as you say. Even though one way to implement sets is using a tree, they can also be implemented using a hash table (meaning getting the keys in sorted order may not be that trivial).

If you’d like to sort them, you can simply perform:

``````sorted(set(y))
``````

which will produce a sorted list containing the set’s elements. (Not a set. Again, sets are unordered.)

Otherwise, the only thing guaranteed by `set` is that it makes the elements unique (nothing will be there more than once).

Hope this helps!

As an unordered collection type, `set([8, 1, 6])` is equivalent to `set([1, 6, 8])`.

While it might be nicer to display the set contents in sorted order, that would make the `repr()` call more expensive.

Internally, the `set` type is implemented using a hash table: a hash function is used to separate items into a number of buckets to reduce the number of equality operations needed to check if an item is part of the set.

To produce the `repr()` output it just outputs the items from each bucket in turn, which is unlikely to be the sorted order.

As +Volatility and yourself pointed out, sets are unordered. If you need the elements to be in order, just call `sorted` on the set:

``````>>> y = [1, 1, 6, 6, 6, 6, 6, 8, 8]
>>> sorted(set(y))
[1, 6, 8]
``````

Python’s sets (and dictionaries) will iterate and print out in some order, but exactly what that order will be is arbitrary, and not guaranteed to remain the same after additions and removals.

Here’s an example of a set changing order after a lot of values are added and then removed:

``````>>> s = set([1,6,8])
>>> print(s)
{8, 1, 6}
>>> s.update(range(10,100000))
>>> for v in range(10, 100000):
s.remove(v)
>>> print(s)
{1, 6, 8}
``````

This is implementation dependent though, and so you should not rely upon it.