Is there a NumPy function to return the first index of something in an array?

Posted on

Solving problem is about exposing yourself to as many situations as possible like Is there a NumPy function to return the first index of something in an array? and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about Is there a NumPy function to return the first index of something in an array?, which can be followed any time. Take easy to follow this discuss.

Is there a NumPy function to return the first index of something in an array?

I know there is a method for a Python list to return the first index of something:

>>> l = [1, 2, 3]
>>> l.index(2)
1

Is there something like that for NumPy arrays?

Asked By: Nope

||

Answer #1:

Yes, given an array, array, and a value, item to search for, you can use np.where as:

itemindex = numpy.where(array==item)

The result is a tuple with first all the row indices, then all the column indices.

For example, if an array is two dimensions and it contained your item at two locations then

array[itemindex[0][0]][itemindex[1][0]]

would be equal to your item and so would be:

array[itemindex[0][1]][itemindex[1][1]]
Answered By: Alex

Answer #2:

If you need the index of the first occurrence of only one value, you can use nonzero (or where, which amounts to the same thing in this case):

>>> t = array([1, 1, 1, 2, 2, 3, 8, 3, 8, 8])
>>> nonzero(t == 8)
(array([6, 8, 9]),)
>>> nonzero(t == 8)[0][0]
6

If you need the first index of each of many values, you could obviously do the same as above repeatedly, but there is a trick that may be faster. The following finds the indices of the first element of each subsequence:

>>> nonzero(r_[1, diff(t)[:-1]])
(array([0, 3, 5, 6, 7, 8]),)

Notice that it finds the beginning of both subsequence of 3s and both subsequences of 8s:

[1, 1, 1, 2, 2, 3, 8, 3, 8, 8]

So it’s slightly different than finding the first occurrence of each value. In your program, you may be able to work with a sorted version of t to get what you want:

>>> st = sorted(t)
>>> nonzero(r_[1, diff(st)[:-1]])
(array([0, 3, 5, 7]),)
Answered By: Vebjorn Ljosa

Answer #3:

You can also convert a NumPy array to list in the air and get its index. For example,

l = [1,2,3,4,5] # Python list
a = numpy.array(l) # NumPy array
i = a.tolist().index(2) # i will return index of 2
print i

It will print 1.

Answered By: Hima

Answer #4:

Just to add a very performant and handy alternative based on np.ndenumerate to find the first index:

from numba import njit
import numpy as np
@njit
def index(array, item):
    for idx, val in np.ndenumerate(array):
        if val == item:
            return idx
    # If no item was found return None, other return types might be a problem due to
    # numbas type inference.

This is pretty fast and deals naturally with multidimensional arrays:

>>> arr1 = np.ones((100, 100, 100))
>>> arr1[2, 2, 2] = 2
>>> index(arr1, 2)
(2, 2, 2)
>>> arr2 = np.ones(20)
>>> arr2[5] = 2
>>> index(arr2, 2)
(5,)

This can be much faster (because it’s short-circuiting the operation) than any approach using np.where or np.nonzero.


However np.argwhere could also deal gracefully with multidimensional arrays (you would need to manually cast it to a tuple and it’s not short-circuited) but it would fail if no match is found:

>>> tuple(np.argwhere(arr1 == 2)[0])
(2, 2, 2)
>>> tuple(np.argwhere(arr2 == 2)[0])
(5,)
Answered By: MSeifert

Answer #5:

If you’re going to use this as an index into something else, you can use boolean indices if the arrays are broadcastable; you don’t need explicit indices. The absolute simplest way to do this is to simply index based on a truth value.

other_array[first_array == item]

Any boolean operation works:

a = numpy.arange(100)
other_array[first_array > 50]

The nonzero method takes booleans, too:

index = numpy.nonzero(first_array == item)[0][0]

The two zeros are for the tuple of indices (assuming first_array is 1D) and then the first item in the array of indices.

Answered By: Matt

Answer #6:

l.index(x) returns the smallest i such that i is the index of the first occurrence of x in the list.

One can safely assume that the index() function in Python is implemented so that it stops after finding the first match, and this results in an optimal average performance.

For finding an element stopping after the first match in a NumPy array use an iterator (ndenumerate).

In [67]: l=range(100)
In [68]: l.index(2)
Out[68]: 2

NumPy array:

In [69]: a = np.arange(100)
In [70]: next((idx for idx, val in np.ndenumerate(a) if val==2))
Out[70]: (2L,)

Note that both methods index() and next return an error if the element is not found. With next, one can use a second argument to return a special value in case the element is not found, e.g.

In [77]: next((idx for idx, val in np.ndenumerate(a) if val==400),None)

There are other functions in NumPy (argmax, where, and nonzero) that can be used to find an element in an array, but they all have the drawback of going through the whole array looking for all occurrences, thus not being optimized for finding the first element. Note also that where and nonzero return arrays, so you need to select the first element to get the index.

In [71]: np.argmax(a==2)
Out[71]: 2
In [72]: np.where(a==2)
Out[72]: (array([2], dtype=int64),)
In [73]: np.nonzero(a==2)
Out[73]: (array([2], dtype=int64),)

Time comparison

Just checking that for large arrays the solution using an iterator is faster when the searched item is at the beginning of the array (using %timeit in the IPython shell):

In [285]: a = np.arange(100000)
In [286]: %timeit next((idx for idx, val in np.ndenumerate(a) if val==0))
100000 loops, best of 3: 17.6 µs per loop
In [287]: %timeit np.argmax(a==0)
1000 loops, best of 3: 254 µs per loop
In [288]: %timeit np.where(a==0)[0][0]
1000 loops, best of 3: 314 µs per loop

This is an open NumPy GitHub issue.

See also: Numpy: find first index of value fast

Answered By: user2314737

Answer #7:

For one-dimensional sorted arrays, it would be much more simpler and efficient O(log(n)) to use numpy.searchsorted which returns a NumPy integer (position). For example,

arr = np.array([1, 1, 1, 2, 3, 3, 4])
i = np.searchsorted(arr, 3)

Just make sure the array is already sorted

Also check if returned index i actually contains the searched element, since searchsorted’s main objective is to find indices where elements should be inserted to maintain order.

if arr[i] == 3:
    print("present")
else:
    print("not present")
Answered By: Alok Nayak

Answer #8:

To index on any criteria, you can so something like the following:

In [1]: from numpy import *
In [2]: x = arange(125).reshape((5,5,5))
In [3]: y = indices(x.shape)
In [4]: locs = y[:,x >= 120] # put whatever you want in place of x >= 120
In [5]: pts = hsplit(locs, len(locs[0]))
In [6]: for pt in pts:
   .....:         print(', '.join(str(p[0]) for p in pt))
4, 4, 0
4, 4, 1
4, 4, 2
4, 4, 3
4, 4, 4

And here’s a quick function to do what list.index() does, except doesn’t raise an exception if it’s not found. Beware — this is probably very slow on large arrays. You can probably monkey patch this on to arrays if you’d rather use it as a method.

def ndindex(ndarray, item):
    if len(ndarray.shape) == 1:
        try:
            return [ndarray.tolist().index(item)]
        except:
            pass
    else:
        for i, subarray in enumerate(ndarray):
            try:
                return [i] + ndindex(subarray, item)
            except:
                pass
In [1]: ndindex(x, 103)
Out[1]: [4, 0, 3]
Answered By: Autoplectic

Leave a Reply

Your email address will not be published.