# Create numpy matrix filled with NaNs

Posted on

### Question :

Create numpy matrix filled with NaNs

I have the following code:

``````r = numpy.zeros(shape = (width, height, 9))
``````

It creates a `width x height x 9` matrix filled with zeros. Instead, I’d like to know if there’s a function or way to initialize them instead to `NaN`s in an easy way.

You rarely need loops for vector operations in numpy.
You can create an uninitialized array and assign to all entries at once:

``````>>> a = numpy.empty((3,3,))
>>> a[:] = numpy.nan
>>> a
array([[ NaN,  NaN,  NaN],
[ NaN,  NaN,  NaN],
[ NaN,  NaN,  NaN]])
``````

I have timed the alternatives `a[:] = numpy.nan` here and `a.fill(numpy.nan)` as posted by Blaenk:

``````\$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a.fill(np.nan)"
10000 loops, best of 3: 54.3 usec per loop
\$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a[:] = np.nan"
10000 loops, best of 3: 88.8 usec per loop
``````

The timings show a preference for `ndarray.fill(..)` as the faster alternative. OTOH, I like numpy’s convenience implementation where you can assign values to whole slices at the time, the code’s intention is very clear.

Note that `ndarray.fill` performs its operation in-place, so `numpy.empty((3,3,)).fill(numpy.nan)` will instead return `None`.

Another option is to use `numpy.full`, an option available in NumPy 1.8+

``````a = np.full([height, width, 9], np.nan)
``````

This is pretty flexible and you can fill it with any other number that you want.

I compared the suggested alternatives for speed and found that, for large enough vectors/matrices to fill, all alternatives except `val * ones` and `array(n * [val])` are equally fast.

Code to reproduce the plot:

``````import numpy
import perfplot

val = 42.0

def fill(n):
a = numpy.empty(n)
a.fill(val)
return a

def colon(n):
a = numpy.empty(n)
a[:] = val
return a

def full(n):
return numpy.full(n, val)

def ones_times(n):
return val * numpy.ones(n)

def list(n):
return numpy.array(n * [val])

perfplot.show(
setup=lambda n: n,
kernels=[fill, colon, full, ones_times, list],
n_range=[2 ** k for k in range(20)],
logx=True,
logy=True,
xlabel="len(a)",
)
``````

Are you familiar with `numpy.nan`?

You can create your own method such as:

``````def nans(shape, dtype=float):
a = numpy.empty(shape, dtype)
a.fill(numpy.nan)
return a
``````

Then

``````nans([3,4])
``````

would output

``````array([[ NaN,  NaN,  NaN,  NaN],
[ NaN,  NaN,  NaN,  NaN],
[ NaN,  NaN,  NaN,  NaN]])
``````

I found this code in a mailing list thread.

You can always use multiplication if you don’t immediately recall the `.empty` or `.full` methods:

``````>>> np.nan * np.ones(shape=(3,2))
array([[ nan,  nan],
[ nan,  nan],
[ nan,  nan]])
``````

Of course it works with any other numerical value as well:

``````>>> 42 * np.ones(shape=(3,2))
array([[ 42,  42],
[ 42,  42],
[ 42, 42]])
``````

But the @u0b34a0f6ae’s accepted answer is 3x faster (CPU cycles, not brain cycles to remember numpy syntax ;):

``````\$ python -mtimeit "import numpy as np; X = np.empty((100,100));" "X[:] = np.nan;"
100000 loops, best of 3: 8.9 usec per loop
(predict)laneh@predict:~/src/predict/predict/webapp\$ master
\$ python -mtimeit "import numpy as np; X = np.ones((100,100));" "X *= np.nan;"
10000 loops, best of 3: 24.9 usec per loop
``````

As said, numpy.empty() is the way to go. However, for objects, fill() might not do exactly what you think it does:

``````In[36]: a = numpy.empty(5,dtype=object)
In[37]: a.fill([])
In[38]: a
Out[38]: array([[], [], [], [], []], dtype=object)
In[39]: a[0].append(4)
In[40]: a
Out[40]: array([[4], [4], [4], [4], [4]], dtype=object)
``````

One way around can be e.g.:

``````In[41]: a = numpy.empty(5,dtype=object)
In[42]: a[:]= [ [] for x in range(5)]
In[43]: a[0].append(4)
In[44]: a
Out[44]: array([[4], [], [], [], []], dtype=object)
``````

Another alternative is `numpy.broadcast_to(val,n)` which returns in constant time regardless of the size and is also the most memory efficient (it returns a view of the repeated element). The caveat is that the returned value is read-only.

Below is a comparison of the performances of all the other methods that have been proposed using the same benchmark as in Nico Schlömer’s answer.

Yet another possibility not yet mentioned here is to use NumPy tile:

``````a = numpy.tile(numpy.nan, (3, 3))
``````

Also gives

``````array([[ NaN,  NaN,  NaN],
[ NaN,  NaN,  NaN],
[ NaN,  NaN,  NaN]])
``````

I don’t know about speed comparison.