Numpy individual element access slower than for lists

Posted on

Question :

Numpy individual element access slower than for lists

I just started using Numpy and noticed that iterating through each element in a Numpy array is ~4x slower than doing the same but with a list of lists. I know now that this defeats the purpose of Numpy and I should vectorize the function if possible. My question is though why is it 4x slower. That seems like quite a large amount.

I ran the tests below using %timeit

import numpy as np
b = np.eye(1000)
a = b.tolist()

%timeit b[100][100] #1000000 loops, best of 3: 692 ns per loop
%timeit a[100][100] #10000000 loops, best of 3: 70.7 ns per loop
%timeit b[100,100] #1000000 loops, best of 3: 343 ns per loop
%timeit b.item(100,100) #1000000 loops, best of 3: 297 ns per loop

I tried to use dis.dis to see what was going on under the hood but got:

TypeError: don't know how to disassemble method-wrapper objects

Then I tried to look at the Numpy source code but couldn’t figure out which file corresponded to array element access. I’m curious what accounts for the extra overhead, and more importantly how to figure this out for myself in the future. It seems like python can’t be easily compiled to C code so that I can see the difference. But is there a way to see what byte code is generated for each line, to get a sense of the differences?

Answer #1:

In summary: getting an item from a NumPy array requires new Python objects to be created, whereas this is not the case for lists. Also, indexing is more slightly more complicated for NumPy arrays than lists which may add some additional overhead.

To recap, the NumPy operations you have listed do the following:

  1. b[100][100] returns row 100 of b as an array, and then gets the value at index 100 of this row, returning the value as an object (e.g. a np.int64 type).
  2. b[100,100] returns the value at row 100 and column 100 directly (no intermediate array is returned first).
  3. b.item(100,100) does the same as above b[100,100] except that the value is converted to a native Python type and returned.

Now of these operation, (1) is slowest because it requires two sequential NumPy indexing operations (I’ll explain why this is slower than list indexing below). (2) is quickest because only a single indexing operation is performed. Operation (3) is possibly slower as it is a method call (these are generally slow in Python).

Why is list access still faster than b[100,100]?

Object creation

Python lists are arrays of pointers to objects in memory. For example, the list [1, 2, 3] does not contain those integers directly, but rather pointers to the memory addresses were each integer object already exists. To get an item from the list, Python just returns a reference to the object.

NumPy arrays are not collections of objects. The array np.array([1, 2, 3]) is just a contiguous block of memory with bits set to represent those integer values. To get an integer from this array, a new Python object must be constructed in memory separate to the array. For instance, an object of np.int64 may be returned by the indexing operation: this object did not exist previously and had to be created.

Indexing complexity

Two other reasons why a[100][100] (getting from the list) is quicker than b[100,100] (getting from the array) are that:

  • The bytecode opcode BINARY_SUBSCR is executed when indexing both lists and arrays, but it is optimised for the case of Python lists.

  • The internal C function handling integer indexing for Python lists is very short and simple. On the other hand, NumPy indexing is much more complicated and a significant amount of code is executed to determine the type of indexing being used so that the correct value can be returned.

Below, the steps for accessing elements in a list and array with a[100][100] and b[100,100] are described in more detail.


The same four bytecode opcodes are triggered for both lists and arrays:

  0 LOAD_NAME                0 (a)           # the list or array
  3 LOAD_CONST               0 (100)         # index number (tuple for b[100,100])
  6 BINARY_SUBSCR                            # find correct "getitem" function
  7 RETURN_VALUE                             # value returned from list or array

Note: if you start chain indexing for multi-dimensional lists, e.g. a[100][100][100], you start to repeat these bytecode instructions. This does not happen for NumPy arrays using the tuple indexing: b[100,100,100] uses just the four instructions. This is why the gap in the timings begins to close as the number of dimensions increases.

Finding the correct “getitem” function

The functions for accessing lists and arrays are different and the correct one needs to be found in each case. This task is handled by the BINARY_SUBSCR opcode:

w = POP();                                            // our index
v = TOP();                                            // our list or NumPy array
if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {    // do we have a list and an int?
    /* INLINE: list[int] */
    Py_ssize_t i = PyInt_AsSsize_t(w);
        if (i < 0)
             i += PyList_GET_SIZE(v);
        if (i >= 0 && i < PyList_GET_SIZE(v)) {
             x = PyList_GET_ITEM(v, i);               // call "getitem" for lists
            goto slow_get;
         x = PyObject_GetItem(v, w);                  // else, call another function
                                                      // to work out what is needed
     if (x != NULL) continue;

This code is optimised for Python lists. If the function sees a list, it will quickly call the function PyList_GET_ITEM. This list can now be accessed at the required index (see next section below).

However, if it doesn’t see a list (e.g. we have a NumPy array), it takes the “slow_get” path. This in turn calls another function PyObject_GetItem to check which “getitem” function the object is mapped to:

PyObject_GetItem(PyObject *o, PyObject *key)
    PyMappingMethods *m;

    if (o == NULL || key == NULL)
        return null_error();

    m = o->ob_type->tp_as_mapping;
    if (m && m->mp_subscript)
        return m->mp_subscript(o, key);

In the case of NumPy arrays, the correct function is located in mp_subscript in the PyMappingMethods structure.

Notice the additional function calls before this correct “get” function can be called. These calls add to the overhead for b[100], although how much will depend on how Python/NumPy was compiled, the system architecture, and so on.

Getting from a Python list

Above it was seen that the function PyList_GET_ITEM is called. This is a short function that essentially looks like this*:

PyList_GetItem(PyObject *op, Py_ssize_t i)
    if (!PyList_Check(op)) {                            // check if list
        return NULL;
    if (i < 0 || i >= Py_SIZE(op)) {                    // check i is in range
        if (indexerr == NULL) {
            indexerr = PyUnicode_FromString(
                "list index out of range");
            if (indexerr == NULL)
                return NULL;
        PyErr_SetObject(PyExc_IndexError, indexerr);
        return NULL;
    return ((PyListObject *)op) -> ob_item[i];           // return reference to object

* PyList_GET_ITEM is actually the macro form of this function which does the same thing, minus error checking.

This means that getting the item at index i of a Python list is relatively simple. Internally, Python checks whether the type of the item being is a list, whether i is in the correct range for the list, and then returns the reference to the object in the list.

Getting from a NumPy array

In contrast, NumPy has to do much more work before the value at the requested index can be returned.

Arrays can be indexed in a variety of different ways and NumPy has to decide which index routine is needed. The various indexing routines are handled largely by code in mapping.c.

Anything used to index NumPy arrays passes through the function prepare_index which begins the parsing of the index and stores the information about broadcasting, number of dimensions, and so on. Here is the call signature for the function:

prepare_index(PyArrayObject *self, PyObject *index,
              npy_index_info *indices,
              int *num, int *ndim, int *out_fancy_ndim, int allow_boolean)

 /* @param the array being indexed
  * @param the index object
  * @param index info struct being filled (size of NPY_MAXDIMS * 2 + 1)
  * @param number of indices found
  * @param dimension of the indexing result
  * @param dimension of the fancy/advanced indices part
  * @param whether to allow the boolean special case 

The function has to do a lot of checks. Even for a relatively simple index such as b[100,100], a lot of information has to be inferred so that NumPy can return a reference (view) to the correct value.

In conclusion, it takes longer for the “getitem” function for NumPy to be found and the functions handling the indexing of arrays are necessarily more complex than the single function for Python lists.

Answered By: Alex Riley

Answer #2:

When numpy returns the item from one position in the array, it must convert the internal C type (float, double, etc.) value into a Python-typed scalar value (int, long, float). It then returns the reference to the Python-typed value. This conversion takes some time.

Interestingly enough, the same inefficiency hurts performance in another way, too. I had a Python list that I was indexing into using index values that came from a numpy array. That same conversion occurs to create the Python integer value that is needed to index into a Python list. I had to rewrite my algorithm with an intermediate array of native Python integers.

Answered By: ColoradoEric

Leave a Reply

Your email address will not be published. Required fields are marked *