# Find nearest value in numpy array

Posted on

Solving problem is about exposing yourself to as many situations as possible like Find nearest value in numpy array and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about Find nearest value in numpy array, which can be followed any time. Take easy to follow this discuss.

Find nearest value in numpy array

Is there a numpy-thonic way, e.g. function, to find the nearest value in an array?

Example:

``````np.find_nearest( array, value )
``````

``````import numpy as np
def find_nearest(array, value):
array = np.asarray(array)
idx = (np.abs(array - value)).argmin()
return array[idx]
array = np.random.random(10)
print(array)
# [ 0.21069679  0.61290182  0.63425412  0.84635244  0.91599191  0.00213826
#   0.17104965  0.56874386  0.57319379  0.28719469]
value = 0.5
print(find_nearest(array, value))
# 0.568743859261
``````

IF your array is sorted and is very large, this is a much faster solution:

``````def find_nearest(array,value):
idx = np.searchsorted(array, value, side="left")
if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) < math.fabs(value - array[idx])):
return array[idx-1]
else:
return array[idx]
``````

This scales to very large arrays. You can easily modify the above to sort in the method if you can’t assume that the array is already sorted. It’s overkill for small arrays, but once they get large this is much faster.

With slight modification, the answer above works with arrays of arbitrary dimension (1d, 2d, 3d, …):

``````def find_nearest(a, a0):
"Element in nd array `a` closest to the scalar value `a0`"
idx = np.abs(a - a0).argmin()
return a.flat[idx]
``````

Or, written as a single line:

``````a.flat[np.abs(a - a0).argmin()]
``````

Summary of answer: If one has a sorted `array` then the bisection code (given below) performs the fastest. ~100-1000 times faster for large arrays, and ~2-100 times faster for small arrays. It does not require numpy either.
If you have an unsorted `array` then if `array` is large, one should consider first using an O(n logn) sort and then bisection, and if `array` is small then method 2 seems the fastest.

First you should clarify what you mean by nearest value. Often one wants the interval in an abscissa, e.g. array=[0,0.7,2.1], value=1.95, answer would be idx=1. This is the case that I suspect you need (otherwise the following can be modified very easily with a followup conditional statement once you find the interval). I will note that the optimal way to perform this is with bisection (which I will provide first – note it does not require numpy at all and is faster than using numpy functions because they perform redundant operations). Then I will provide a timing comparison against the others presented here by other users.

Bisection:

``````def bisection(array,value):
'''Given an ``array`` , and given a ``value`` , returns an index j such that ``value`` is between array[j]
and array[j+1]. ``array`` must be monotonic increasing. j=-1 or j=len(array) is returned
to indicate that ``value`` is out of range below and above respectively.'''
n = len(array)
if (value < array):
return -1
elif (value > array[n-1]):
return n
jl = 0# Initialize lower
ju = n-1# and upper limits.
while (ju-jl > 1):# If we are not yet done,
jm=(ju+jl) >> 1# compute a midpoint with a bitshift
if (value >= array[jm]):
jl=jm# and replace either the lower limit
else:
ju=jm# or the upper limit, as appropriate.
# Repeat until the test condition is satisfied.
if (value == array):# edge cases at bottom
return 0
elif (value == array[n-1]):# and top
return n-1
else:
return jl
``````

Now I’ll define the code from the other answers, they each return an index:

``````import math
import numpy as np
def find_nearest1(array,value):
idx,val = min(enumerate(array), key=lambda x: abs(x-value))
return idx
def find_nearest2(array, values):
indices = np.abs(np.subtract.outer(array, values)).argmin(0)
return indices
def find_nearest3(array, values):
values = np.atleast_1d(values)
indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
out = array[indices]
return indices
def find_nearest4(array,value):
idx = (np.abs(array-value)).argmin()
return idx
def find_nearest5(array, value):
idx_sorted = np.argsort(array)
sorted_array = np.array(array[idx_sorted])
idx = np.searchsorted(sorted_array, value, side="left")
if idx >= len(array):
idx_nearest = idx_sorted[len(array)-1]
elif idx == 0:
idx_nearest = idx_sorted
else:
if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
idx_nearest = idx_sorted[idx-1]
else:
idx_nearest = idx_sorted[idx]
return idx_nearest
def find_nearest6(array,value):
xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
return xi
``````

Now I’ll time the codes:
Note methods 1,2,4,5 don’t correctly give the interval. Methods 1,2,4 round to nearest point in array (e.g. >=1.5 -> 2), and method 5 always rounds up (e.g. 1.45 -> 2). Only methods 3, and 6, and of course bisection give the interval properly.

``````array = np.arange(100000)
val = array+0.55
print( bisection(array,val))
%timeit bisection(array,val)
print( find_nearest1(array,val))
%timeit find_nearest1(array,val)
print( find_nearest2(array,val))
%timeit find_nearest2(array,val)
print( find_nearest3(array,val))
%timeit find_nearest3(array,val)
print( find_nearest4(array,val))
%timeit find_nearest4(array,val)
print( find_nearest5(array,val))
%timeit find_nearest5(array,val)
print( find_nearest6(array,val))
%timeit find_nearest6(array,val)
(50000, 50000)
100000 loops, best of 3: 4.4 µs per loop
50001
1 loop, best of 3: 180 ms per loop
50001
1000 loops, best of 3: 267 µs per loop

1000 loops, best of 3: 390 µs per loop
50001
1000 loops, best of 3: 259 µs per loop
50001
1000 loops, best of 3: 1.21 ms per loop

1000 loops, best of 3: 746 µs per loop
``````

For a large array bisection gives 4us compared to next best 180us and longest 1.21ms (~100 – 1000 times faster). For smaller arrays it’s ~2-100 times faster.

Here’s an extension to find the nearest vector in an array of vectors.

``````import numpy as np
def find_nearest_vector(array, value):
idx = np.array([np.linalg.norm(x+y) for (x,y) in array-value]).argmin()
return array[idx]
A = np.random.random((10,2))*100
""" A = array([[ 34.19762933,  43.14534123],
[ 48.79558706,  47.79243283],
[ 38.42774411,  84.87155478],
[ 63.64371943,  50.7722317 ],
[ 73.56362857,  27.87895698],
[ 96.67790593,  77.76150486],
[ 68.86202147,  21.38735169],
[  5.21796467,  59.17051276],
[ 82.92389467,  99.90387851],
[  6.76626539,  30.50661753]])"""
pt = [6, 30]
print find_nearest_vector(A,pt)
# array([  6.76626539,  30.50661753])
``````

If you don’t want to use numpy this will do it:

``````def find_nearest(array, value):
n = [abs(i-value) for i in array]
idx = n.index(min(n))
return array[idx]
``````

Here’s a version that will handle a non-scalar “values” array:

``````import numpy as np
def find_nearest(array, values):
indices = np.abs(np.subtract.outer(array, values)).argmin(0)
return array[indices]
``````

Or a version that returns a numeric type (e.g. int, float) if the input is scalar:

``````def find_nearest(array, values):
values = np.atleast_1d(values)
indices = np.abs(np.subtract.outer(array, values)).argmin(0)
out = array[indices]
return out if len(out) > 1 else out
``````

Here is a fast vectorized version of @Dimitri’s solution if you have many `values` to search for (`values` can be multi-dimensional array):

``````#`values` should be sorted
def get_closest(array, values):
#make sure array is a numpy array
array = np.array(array)
# get insert positions
idxs = np.searchsorted(array, values, side="left")
# find indexes where previous index is closer
prev_idx_is_less = ((idxs == len(array))|(np.fabs(values - array[np.maximum(idxs-1, 0)]) < np.fabs(values - array[np.minimum(idxs, len(array)-1)])))
idxs[prev_idx_is_less] -= 1
return array[idxs]
``````

Benchmarks

> 100 times faster than using a `for` loop with @Demitri’s solution`

``````>>> %timeit ar=get_closest(np.linspace(1, 1000, 100), np.random.randint(0, 1050, (1000, 1000)))
139 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit ar=[find_nearest(np.linspace(1, 1000, 100), value) for value in np.random.randint(0, 1050, 1000*1000)]
took 21.4 seconds
``````