Find nearest value in numpy array

This post collects several answers to a common question: how to find the value in a numpy array that is nearest to a given number. The answers range from a short general-purpose function to faster approaches for sorted or very large arrays.

Is there a numpy-thonic way, e.g. function, to find the nearest value in an array?

Example:

np.find_nearest( array, value )

Answer #1:

import numpy as np
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]
array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]
value = 0.5
print(find_nearest(array, value))
# 0.568743859261
Answered By: unutbu
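
A minimal usage sketch, assuming you want the index as well as the value. The helper name `find_nearest_with_index` is hypothetical (it is not part of numpy or of the answer above). Note that `argmin` returns the first minimum, so a tie resolves to the earlier element.

```python
import numpy as np

def find_nearest_with_index(array, value):
    """Return (index, element) of the entry closest to `value` (hypothetical helper)."""
    array = np.asarray(array)
    # argmin returns the first index of the minimum, so ties go to the earlier element
    idx = int(np.abs(array - value).argmin())
    return idx, array[idx]

idx, nearest = find_nearest_with_index([1.0, 4.0, 6.0, 9.0], 5.0)
print(idx, nearest)  # 4.0 and 6.0 are equally close; the first one wins: 1 4.0
```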

Answer #2:

If your array is sorted and is very large, this is a much faster solution:

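import math
import numpy as np
def find_nearest(array,value):
    # position where `value` would be inserted to keep `array` sorted
    idx = np.searchsorted(array, value, side="left")
    # step back one position if we ran off the end or the left neighbour is closer
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]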

This scales to very large arrays. You can easily modify the above to sort in the method if you can’t assume that the array is already sorted. It’s overkill for small arrays, but once they get large this is much faster.

Answered By: Demitri
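
The "sort in the method" modification mentioned above could look roughly like this. This is only a sketch: the name `find_nearest_unsorted` is hypothetical, and it sorts a copy on every call, which costs O(n log n). That only pays off if the sorted array is then reused for many lookups; for a single lookup the linear `argmin` approach is cheaper.

```python
import numpy as np

def find_nearest_unsorted(array, value):
    """Sort a copy, then binary-search it (hypothetical helper)."""
    sorted_array = np.sort(np.asarray(array))  # O(n log n); does not modify the input
    idx = np.searchsorted(sorted_array, value, side="left")
    # step back if value is past the end or the left neighbour is closer
    if idx > 0 and (idx == len(sorted_array)
                    or abs(value - sorted_array[idx - 1]) < abs(value - sorted_array[idx])):
        return sorted_array[idx - 1]
    return sorted_array[idx]

print(find_nearest_unsorted([9.0, 1.0, 6.0, 4.0], 6.8))  # 6.0
```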

Answer #3:

With a slight modification, unutbu’s answer (Answer #1) works with arrays of arbitrary dimension (1d, 2d, 3d, …):

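import numpy as np
def find_nearest(a, a0):
    "Element in nd array `a` closest to the scalar value `a0`"
    a = np.asarray(a)  # also accept nested lists
    idx = np.abs(a - a0).argmin()  # argmin works on the flattened array
    return a.flat[idx]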

Or, written as a single line:

a.flat[np.abs(a - a0).argmin()]
Answered By: kwgoodman
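
If you also need the position of that element in the nd array, one option is `np.unravel_index`, which converts the flat index from `argmin` back into a multi-dimensional one. A small sketch with made-up data:

```python
import numpy as np

a = np.array([[0.1, 0.5, 0.9],
              [1.3, 1.7, 2.1]])
a0 = 1.6

flat_idx = np.abs(a - a0).argmin()  # index into the flattened array
# convert to (row, column); plain ints keep the printed output simple
nd_idx = tuple(int(i) for i in np.unravel_index(flat_idx, a.shape))
print(nd_idx, a[nd_idx])  # (1, 1) 1.7
```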

Answer #4:

Summary of answer: If you have a sorted array, the bisection code given below is the fastest: ~100-1000 times faster for large arrays, and ~2-100 times faster for small arrays. It does not require numpy either.
If the array is unsorted and large, consider first sorting it with an O(n log n) sort and then using bisection; if it is small, method 2 seems the fastest.

First, you should clarify what you mean by nearest value. Often one wants the interval on an abscissa, e.g. for array=[0,0.7,2.1] and value=1.95 the answer would be idx=1. This is the case that I suspect you need (otherwise, the following can be modified very easily with a follow-up conditional statement once you find the interval). The optimal way to do this is bisection, which I provide first; note that it does not require numpy at all and is faster than using numpy functions because they perform redundant operations. After that I provide a timing comparison against the methods presented here by other users.

Bisection:

def bisection(array,value):
    '''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
    and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
    to indicate that ``value`` is out of range below and above respectively.'''
    n = len(array)
    if (value < array[0]):
        return -1
    elif (value > array[n-1]):
        return n
    jl = 0  # Initialize lower
    ju = n-1  # and upper limits.
    while (ju-jl > 1):  # If we are not yet done,
        jm = (ju+jl) >> 1  # compute a midpoint with a bitshift
        if (value >= array[jm]):
            jl = jm  # and replace either the lower limit
        else:
            ju = jm  # or the upper limit, as appropriate.
        # Repeat until the test condition is satisfied.
    if (value == array[0]):  # edge cases at bottom
        return 0
    elif (value == array[n-1]):  # and top
        return n-1
    else:
        return jl
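
The "follow-up conditional" mentioned above, for the case where you want the nearest element rather than the interval, might be sketched as follows. This version uses the standard-library `bisect` module instead of the hand-written loop, and the name `nearest_index_sorted` is hypothetical. It assumes a non-empty, increasing sequence.

```python
from bisect import bisect_right

def nearest_index_sorted(array, value):
    """Index of the element of sorted `array` closest to `value` (hypothetical helper)."""
    n = len(array)
    # j is the interval index: array[j] <= value < array[j+1], or -1 if value < array[0]
    j = bisect_right(array, value) - 1
    if j < 0:
        return 0        # below the range: the first element is nearest
    if j >= n - 1:
        return n - 1    # at or above the last element
    # follow-up conditional: pick whichever end of the interval is closer
    return j if value - array[j] <= array[j + 1] - value else j + 1

print(nearest_index_sorted([0, 0.7, 2.1], 1.95))  # interval index is 1, nearest index is 2
```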

Now I’ll define the code from the other answers; each of them returns an index:

import math
import numpy as np
def find_nearest1(array,value):
    idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
    return idx
def find_nearest2(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return indices
def find_nearest3(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
    out = array[indices]
    return indices
def find_nearest4(array,value):
    idx = (np.abs(array-value)).argmin()
    return idx
def find_nearest5(array, value):
    idx_sorted = np.argsort(array)
    sorted_array = np.array(array[idx_sorted])
    idx = np.searchsorted(sorted_array, value, side="left")
    if idx >= len(array):
        idx_nearest = idx_sorted[len(array)-1]
    elif idx == 0:
        idx_nearest = idx_sorted[0]
    else:
        if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
            idx_nearest = idx_sorted[idx-1]
        else:
            idx_nearest = idx_sorted[idx]
    return idx_nearest
def find_nearest6(array,value):
    xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
    return xi

Now I’ll time the code.
Note that methods 1, 2, 4 and 5 don’t give the interval. As written above, they return the index of the nearest point in the array (e.g. >=1.5 -> 2); method 5 does the same lookup with a sort followed by searchsorted. Only methods 3 and 6, and of course bisection, give the interval properly.

array = np.arange(100000)
val = array[50000]+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)

Output:

(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop
[50000]
1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop
[50000]
1000 loops, best of 3: 746 µs per loop

For a large array, bisection takes about 4.4 µs per call, compared with 259 µs for the next-fastest method (method 4) and 180 ms for the slowest (method 1, the pure-Python loop). For smaller arrays it’s ~2-100 times faster.

Answered By: Josh Albert
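
`%timeit` is an IPython magic, so the timing lines above only run in IPython or Jupyter. Outside of it, a comparable measurement can be sketched with the standard-library `timeit` module. The two functions below are simplified stand-ins (a linear scan similar to method 1 and an interval lookup via `bisect`), not the exact code from the answer, and the numbers you get will depend on your machine.

```python
import timeit
from bisect import bisect_right

data = list(range(100000))
val = data[50000] + 0.55

def linear_scan(array, value):
    # pure-Python O(n) scan for the nearest index, comparable to find_nearest1
    return min(range(len(array)), key=lambda i: abs(array[i] - value))

def interval_bisect(array, value):
    # O(log n) lookup of the interval index via the standard library
    return bisect_right(array, value) - 1

for func in (interval_bisect, linear_scan):
    # run 5 calls per measurement, repeat 3 times, and keep the best run
    best = min(timeit.repeat(lambda: func(data, val), number=5, repeat=3)) / 5
    print(f"{func.__name__}: {best * 1e6:.1f} µs per call")
```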

Answer #5:

Here’s an extension to find the nearest vector in an array of vectors.

import numpy as np
def find_nearest_vector(array, value):
    array = np.asarray(array)
    # Euclidean distance from each row of `array` to `value`
    idx = np.linalg.norm(array - value, axis=1).argmin()
    return array[idx]
A = np.random.random((10,2))*100
""" A = array([[ 34.19762933,  43.14534123],
   [ 48.79558706,  47.79243283],
   [ 38.42774411,  84.87155478],
   [ 63.64371943,  50.7722317 ],
   [ 73.56362857,  27.87895698],
   [ 96.67790593,  77.76150486],
   [ 68.86202147,  21.38735169],
   [  5.21796467,  59.17051276],
   [ 82.92389467,  99.90387851],
   [  6.76626539,  30.50661753]])"""
pt = [6, 30]
print(find_nearest_vector(A,pt))
# For the A shown above, the nearest vector is [6.76626539, 30.50661753]
Answered By: Onasafari
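
If you have several query points at once, the same idea can be written with broadcasting. This is a sketch under the assumption that all pairwise distances fit in memory (the intermediate array has shape n_queries x n_points x dim); the name `find_nearest_vectors` is hypothetical.

```python
import numpy as np

def find_nearest_vectors(points, queries):
    """For each row of `queries`, return the closest row of `points` (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    # diff[i, j] is points[j] - queries[i]; shape (n_queries, n_points, dim)
    diff = points[None, :, :] - queries[:, None, :]
    dist = np.linalg.norm(diff, axis=2)  # Euclidean distances, shape (n_queries, n_points)
    return points[dist.argmin(axis=1)]

A = np.array([[34.2, 43.1], [5.2, 59.2], [6.8, 30.5]])
result = find_nearest_vectors(A, [[6, 30], [30, 40]])
print(result.tolist())  # [[6.8, 30.5], [34.2, 43.1]]
```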

Answer #6:

If you don’t want to use numpy this will do it:

def find_nearest(array, value):
    n = [abs(i-value) for i in array]
    idx = n.index(min(n))
    return array[idx]
Answered By: Nick Crawford
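
A shorter pure-Python variant is possible with `min` and a `key` function, which avoids building the intermediate list of distances. A sketch, assuming a non-empty iterable; the name `find_nearest_min` is only illustrative:

```python
def find_nearest_min(values, target):
    """Closest element of a non-empty iterable, without numpy (illustrative name)."""
    # min() compares elements by their distance to `target`; ties go to the first one
    return min(values, key=lambda v: abs(v - target))

print(find_nearest_min([0.2, 0.6, 0.9, 1.4], 1.0))  # 0.9
```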

Answer #7:

Here’s a version that will handle a non-scalar “values” array:

import numpy as np
def find_nearest(array, values):
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    return array[indices]

Or a version that returns a numeric type (e.g. int, float) if the input is scalar:

def find_nearest(array, values):
    values = np.atleast_1d(values)
    indices = np.abs(np.subtract.outer(array, values)).argmin(0)
    out = array[indices]
    return out if len(out) > 1 else out[0]
Answered By: ryggyr
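
To see what `np.subtract.outer` is doing here, a small example with made-up data may help. It builds a full table of differences with one row per array element and one column per query value, so memory use grows as len(array) * len(values).

```python
import numpy as np

array = np.array([0.0, 2.5, 5.0, 7.5, 10.0])
values = np.array([1.0, 6.0, 9.9])

# diffs[i, j] = array[i] - values[j], shape (len(array), len(values))
diffs = np.subtract.outer(array, values)
indices = np.abs(diffs).argmin(axis=0)  # nearest array index for each value
print(diffs.shape)              # (5, 3)
print(array[indices].tolist())  # [0.0, 5.0, 10.0]
```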

Answer #8:

Here is a fast vectorized version of @Demitri’s solution if you have many values to search for (`values` can be a multi-dimensional array):

import numpy as np
# `array` should be sorted (np.searchsorted requires it); `values` can be in any order
def get_closest(array, values):
    # make sure array is a numpy array
    array = np.array(array)
    # get insert positions
    idxs = np.searchsorted(array, values, side="left")
    # find indexes where previous index is closer
    prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
    idxs[prev_idx_is_less] -= 1
    return array[idxs]

Benchmarks

> 100 times faster than using a for loop with @Demitri’s solution

>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds
Answered By: anthonybell
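
An alternative way to handle the edge cases is to clip the insert positions so that both neighbours always exist, then compare the two distances. This is a sketch of that variant, not the code benchmarked above; the name `get_closest_clipped` is hypothetical, and it assumes a sorted 1-d `array` with at least two elements.

```python
import numpy as np

def get_closest_clipped(array, values):
    """Nearest element of sorted 1-d `array` for each entry of `values` (hypothetical variant)."""
    array = np.asarray(array)
    values = np.asarray(values)
    # clip so that both idx-1 and idx are always valid positions
    idx = np.clip(np.searchsorted(array, values, side="left"), 1, len(array) - 1)
    left, right = array[idx - 1], array[idx]
    # move to the left neighbour where it is strictly closer
    idx = idx - (values - left < right - values)
    return array[idx]

grid = np.linspace(0.0, 10.0, 5)  # 0, 2.5, 5, 7.5, 10
closest = get_closest_clipped(grid, [[-3.0, 1.0], [6.0, 42.0]])
print(closest.tolist())  # [[0.0, 0.0], [5.0, 10.0]]
```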
