# Removing nan values from an array

### Question:

I want to figure out how to remove nan values from my array. My array looks something like this:

```python
x = [1400, 1500, 1600, nan, nan, nan, 1700]  # not in this exact configuration
```

How can I remove the `nan` values from `x`?

If you’re using numpy for your arrays, you can also use

```python
x = x[numpy.logical_not(numpy.isnan(x))]
```

Equivalently

```python
x = x[~numpy.isnan(x)]
```

[Thanks to chbrown for the added shorthand]

Explanation

The inner function, `numpy.isnan`, returns a boolean array that is `True` wherever `x` is not-a-number. Since we want the opposite, we apply the `~` operator (bitwise NOT, which acts as an element-wise logical NOT on boolean arrays) to get an array that is `True` wherever `x` is a valid number.

Lastly, we use this boolean array to index into the original array `x`, which retrieves just the non-NaN values.
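To make the explanation concrete, here is a minimal sketch that splits the one-liner into its three steps. The names `nan_mask`, `keep_mask` and `cleaned` are illustrative and not part of the original answer.

```python
import numpy as np

x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan, 1700])

nan_mask = np.isnan(x)   # True where x is NaN
keep_mask = ~nan_mask    # True where x is a valid number
cleaned = x[keep_mask]   # boolean indexing keeps only the True positions

print(nan_mask)
print(cleaned)
```

Note that `cleaned` is a new array; `x` itself is left unchanged until you rebind the name.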

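```python
list(filter(lambda v: v == v, x))
```

This works for both lists and numpy arrays, since `v != v` is true only for NaN. In Python 3, `filter` returns an iterator, hence the `list(...)` wrapper.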
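The self-comparison trick relies on NaN being the only float value that compares unequal to itself. A small sketch in pure Python, with an illustrative list standing in for the asker's data:

```python
import math

nan = float("nan")
x = [1400, 1500, 1600, nan, nan, nan, 1700]

# NaN is the only float value that compares unequal to itself.
assert nan != nan
assert math.isnan(nan)

# Keep only the values that are equal to themselves.
cleaned = list(filter(lambda v: v == v, x))
print(cleaned)  # [1400, 1500, 1600, 1700]
```

The result is a plain list even if the input was a numpy array, so convert it back with `numpy.array(...)` if you need an array.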

Try this:

```python
import math

print([value for value in x if not math.isnan(value)])
```

For more, read up on list comprehensions.

For me, the answer by @jmetz didn’t work; however, using pandas `isnull()` did.

```python
import pandas as pd

x = x[~pd.isnull(x)]
```

When I did the above:

```python
x = x[~numpy.isnan(x)]
```

or

```python
x = x[numpy.logical_not(numpy.isnan(x))]
```

I found that assigning the result back to the same variable (`x`) did not remove the NaN values, and I had to use a different variable. Assigning to a different variable removed the NaNs, e.g.

```python
y = x[~numpy.isnan(x)]
```
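One possible explanation for this observation (an assumption, since the answer does not show the surrounding code) is that boolean indexing returns a new array instead of modifying the original in place. Rebinding `x` works, but any other reference to the original array still holds the NaNs. The name `alias` below is hypothetical.

```python
import numpy as np

x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan, 1700])
alias = x                     # hypothetical second reference to the same array

x = x[~np.isnan(x)]           # rebinds the name x to a new, filtered array

print(x.size)                 # 4
print(alias.size)             # 7
print(np.isnan(alias).any())  # True
```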

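If you’re using `numpy`:

```python
import numpy as np

# first, build a boolean mask that is True where the values are finite
# (this excludes inf and -inf as well as NaN)
ii = np.isfinite(x)

# then, use the mask to select those values
x = x[ii]
```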
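`np.isfinite` is stricter than `~np.isnan`: it also rejects positive and negative infinity. A short sketch of the difference, using made-up sample values:

```python
import numpy as np

x = np.array([1400, np.nan, np.inf, -np.inf, 1700])

only_nan_removed = x[~np.isnan(x)]  # keeps inf and -inf
only_finite = x[np.isfinite(x)]     # drops NaN, inf and -inf

print(only_nan_removed)
print(only_finite)
```

Which one to use depends on whether infinities count as valid data in your case.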

As shown by others,

```python
x[~numpy.isnan(x)]
```

works. However, it raises a `TypeError` if the numpy dtype is not a native numeric type, for example if it is `object`. In that case you can use pandas:

```python
x[~pandas.isna(x)]
# or, equivalently (isnull is an alias of isna)
x[~pandas.isnull(x)]
```
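A rough sketch of the object-dtype case, assuming both numpy and pandas are installed. The mixed sample array is made up for illustration; note that `pandas.isna` treats `None` as missing too.

```python
import numpy as np
import pandas as pd

x = np.array([1400, "n/a", None, np.nan, 1700], dtype=object)

raised = False
try:
    np.isnan(x)               # isnan is not defined for object arrays
except TypeError as exc:
    raised = True
    print("np.isnan failed:", type(exc).__name__)

cleaned = x[~pd.isna(x)]      # pandas handles object arrays element by element
print(cleaned)
```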

The accepted answer changes the shape of 2D arrays: boolean indexing with a 2D mask returns a flattened 1D result.
I present a solution here that uses the pandas `dropna()` functionality.
It works for 1D and 2D arrays. In the 2D case you can choose whether to drop the row or the column containing `np.nan`.

```python
import pandas as pd
import numpy as np

def dropna(arr, *args, **kwarg):
    assert isinstance(arr, np.ndarray)
    dropped = pd.DataFrame(arr).dropna(*args, **kwarg).values
    if arr.ndim == 1:
        dropped = dropped.flatten()
    return dropped

x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan, 1700])
y = np.array([[1400, 1500, 1600], [np.nan, 0, np.nan], [1700, 1800, np.nan]])

print('='*20 + ' 1D Case: ' + '='*20 + '\nInput:\n', x, sep='')
print('\ndropna:\n', dropna(x), sep='')

print('\n\n' + '='*20 + ' 2D Case: ' + '='*20 + '\nInput:\n', y, sep='')
print('\ndropna (rows):\n', dropna(y), sep='')
print('\ndropna (columns):\n', dropna(y, axis=1), sep='')

# apply the accepted answer's expression to the 2D array y
print('\n\n' + '='*20 + ' x[np.logical_not(np.isnan(x))] for 2D: ' + '='*20 + '\nInput:\n', y, sep='')
print('\ndropna:\n', y[np.logical_not(np.isnan(y))], sep='')
```

Result:

```
==================== 1D Case: ====================
Input:
[1400. 1500. 1600.   nan   nan   nan 1700.]

dropna:
[1400. 1500. 1600. 1700.]


==================== 2D Case: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

dropna (rows):
[[1400. 1500. 1600.]]

dropna (columns):
[[1500.]
 [   0.]
 [1800.]]


==================== x[np.logical_not(np.isnan(x))] for 2D: ====================
Input:
[[1400. 1500. 1600.]
 [  nan    0.   nan]
 [1700. 1800.   nan]]

dropna:
[1400. 1500. 1600.    0. 1700. 1800.]
```
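If you would rather avoid the pandas dependency, the same row-wise or column-wise dropping can be sketched with numpy alone by reducing the NaN mask along one axis. This is an alternative to the `dropna` helper above, not part of the original answer; the names `rows_kept` and `cols_kept` are illustrative.

```python
import numpy as np

y = np.array([[1400, 1500, 1600],
              [np.nan, 0, np.nan],
              [1700, 1800, np.nan]])

nan_mask = np.isnan(y)

rows_kept = y[~nan_mask.any(axis=1)]     # drop every row that contains a NaN
cols_kept = y[:, ~nan_mask.any(axis=0)]  # drop every column that contains a NaN

print(rows_kept)
print(cols_kept)
```

Both results stay two-dimensional, with shapes `(1, 3)` and `(3, 1)` for this input.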