I have the following code:
r = numpy.zeros(shape = (width, height, 9))
It creates a
width x height x 9 matrix filled with zeros. Instead, I’d like to know if there’s a function or way to initialize them instead to
NaNs in an easy way.
You rarely need loops for vector operations in numpy.
You can create an uninitialized array and assign to all entries at once:
3,3,)) a[:] = numpy.nan a array([[ NaN, NaN, NaN], [ NaN, NaN, NaN], [ NaN, NaN, NaN]])a = numpy.empty((
I have timed the alternatives
a[:] = numpy.nan here and
a.fill(numpy.nan) as posted by Blaenk:
$ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a.fill(np.nan)" 10000 loops, best of 3: 54.3 usec per loop $ python -mtimeit "import numpy as np; a = np.empty((100,100));" "a[:] = np.nan" 10000 loops, best of 3: 88.8 usec per loop
The timings show a preference for
ndarray.fill(..) as the faster alternative. OTOH, I like numpy’s convenience implementation where you can assign values to whole slices at the time, the code’s intention is very clear.
ndarray.fill performs its operation in-place, so
numpy.empty((3,3,)).fill(numpy.nan) will instead return
Another option is to use
numpy.full, an option available in NumPy 1.8+
a = np.full([height, width, 9], np.nan)
This is pretty flexible and you can fill it with any other number that you want.
I compared the suggested alternatives for speed and found that, for large enough vectors/matrices to fill, all alternatives except
val * ones and
array(n * [val]) are equally fast.
Code to reproduce the plot:
import numpy import perfplot val = 42.0 def fill(n): a = numpy.empty(n) a.fill(val) return a def colon(n): a = numpy.empty(n) a[:] = val return a def full(n): return numpy.full(n, val) def ones_times(n): return val * numpy.ones(n) def list(n): return numpy.array(n * [val]) perfplot.show( setup=lambda n: n, kernels=[fill, colon, full, ones_times, list], n_range=[2 ** k for k in range(20)], logx=True, logy=True, xlabel="len(a)", )
Are you familiar with
You can create your own method such as:
def nans(shape, dtype=float): a = numpy.empty(shape, dtype) a.fill(numpy.nan) return a
array([[ NaN, NaN, NaN, NaN], [ NaN, NaN, NaN, NaN], [ NaN, NaN, NaN, NaN]])
I found this code in a mailing list thread.
You can always use multiplication if you don’t immediately recall the
3,2)) array([[ nan, nan], [ nan, nan], [ nan, nan]])np.nan * np.ones(shape=(
Of course it works with any other numerical value as well:
42 * np.ones(shape=(3,2)) array([[ 42, 42], [ 42, 42], [ 42, 42]])
But the @u0b34a0f6ae’s accepted answer is 3x faster (CPU cycles, not brain cycles to remember numpy syntax ;):
$ python -mtimeit "import numpy as np; X = np.empty((100,100));" "X[:] = np.nan;" 100000 loops, best of 3: 8.9 usec per loop (predict)laneh@predict:~/src/predict/predict/webapp$ master $ python -mtimeit "import numpy as np; X = np.ones((100,100));" "X *= np.nan;" 10000 loops, best of 3: 24.9 usec per loop
As said, numpy.empty() is the way to go. However, for objects, fill() might not do exactly what you think it does:
In: a = numpy.empty(5,dtype=object) In: a.fill() In: a Out: array([, , , , ], dtype=object) In: a.append(4) In: a Out: array([, , , , ], dtype=object)
One way around can be e.g.:
In: a = numpy.empty(5,dtype=object) In: a[:]= [  for x in range(5)] In: a.append(4) In: a Out: array([, , , , ], dtype=object)
Another alternative is
numpy.broadcast_to(val,n) which returns in constant time regardless of the size and is also the most memory efficient (it returns a view of the repeated element). The caveat is that the returned value is read-only.
Below is a comparison of the performances of all the other methods that have been proposed using the same benchmark as in Nico Schlömer’s answer.
Yet another possibility not yet mentioned here is to use NumPy tile:
a = numpy.tile(numpy.nan, (3, 3))
array([[ NaN, NaN, NaN], [ NaN, NaN, NaN], [ NaN, NaN, NaN]])
I don’t know about speed comparison.