# Numpy Pure Functions for performance, caching

Posted on

### Question :

Numpy Pure Functions for performance, caching

I’m writing some moderately performance critical code in numpy.
This code will be in the inner most loop, of a computation that’s run time is measured in hours.
A quick calculation suggest that this code will be executed up something like 10^12 times, in some variations of the calculation.

So the function is to calculate sigmoid(X) and another to calculate its derivative (gradient).
Sigmoid has the property that for
y=sigmoid(x), dy/dx= y(1-y)
In python for numpy this looks like:

``````sigmoid = vectorize(lambda(x): 1.0/(1.0+exp(-x)))
``````

As can be seen, both functions are pure (without side effects),
so they are ideal candidates for memoization,
at least for the short term, I have some worries about caching every single call to sigmoid ever made: Storing 10^12 floats which would take several terabytes of RAM.

Is there a good way to optimise this?
Will python pick up that these are pure functions and cache them for me, as appropriate?
Am I worrying over nothing?

These functions already exist in scipy. The sigmoid function is available as `scipy.special.expit`.

``````In [36]: from scipy.special import expit
``````

Compare `expit` to the vectorized sigmoid function:

``````In [38]: x = np.linspace(-6, 6, 1001)

In [39]: %timeit y = sigmoid(x)
100 loops, best of 3: 2.4 ms per loop

In [40]: %timeit y = expit(x)
10000 loops, best of 3: 20.6 µs per loop
``````

`expit` is also faster than implementing the formula yourself:

``````In [41]: %timeit y = 1.0 / (1.0 + np.exp(-x))
10000 loops, best of 3: 27 µs per loop
``````

The CDF of the logistic distribution is the sigmoid function. It is available as the `cdf` method of `scipy.stats.logistic`, but `cdf` eventually calls `expit`, so there is no point in using that method. You can use the `pdf` method to compute the derivative of the sigmoid function, or the `_pdf` method which has less overhead, but “rolling your own” is faster:

``````In [44]: def sigmoid_grad(x):
....:     ex = np.exp(-x)
....:     y = ex / (1 + ex)**2
....:     return y
``````

Timing (x has length 1001):

``````In [45]: from scipy.stats import logistic

In [46]: %timeit y = logistic._pdf(x)
10000 loops, best of 3: 73.8 µs per loop

In [47]: %timeit y = sigmoid_grad(x)
10000 loops, best of 3: 29.7 µs per loop
``````

Be careful with your implementation if you are going to use values that are far into the tails. The exponential function can overflow pretty easily. `logistic._cdf` is a bit more robust than my quick implementation of `sigmoid_grad`:

``````In [60]: sigmoid_grad(-500)
/home/warren/anaconda/bin/ipython:3: RuntimeWarning: overflow encountered in double_scalars
import sys
Out[60]: 0.0

In [61]: logistic._pdf(-500)
Out[61]: 7.1245764067412855e-218
``````

An implementation using `sech**2` (`1/cosh**2`) is a bit slower than the above `sigmoid_grad`:

``````In [101]: def sigmoid_grad_sech2(x):
.....:     y = (0.5 / np.cosh(0.5*x))**2
.....:     return y
.....:

In [102]: %timeit y = sigmoid_grad_sech2(x)
10000 loops, best of 3: 34 µs per loop
``````

But it handles the tails better:

``````In [103]: sigmoid_grad_sech2(-500)
Out[103]: 7.1245764067412855e-218

Out[104]: 7.1245764067412855e-218
``````

Just expanding on my comment, here is a comparison between your sigmoid through `vectorize` and using numpy directly:

``````In [1]: x = np.random.normal(size=10000)

In [2]: sigmoid = np.vectorize(lambda x: 1.0 / (1.0 + np.exp(-x)))

In [3]: %timeit sigmoid(x)
10 loops, best of 3: 63.3 ms per loop

In [4]: %timeit 1.0 / (1.0 + np.exp(-x))
1000 loops, best of 3: 250 us per loop
``````

As you can see, not only does `vectorize` make it much slower, the fact is that you can calculate 10000 sigmoids in 250 microseconds (that is, 25 nanoseconds for each). A single dictionary look-up in Python is slower than that, let alone all the other code to get the memoization in place.

The only way to optimize this that I can think of is writing a sigmoid ufunc for numpy, which basically will implement the operation in C. That way, you won’t have to do each operation in the sigmoid to the entire array, even though numpy does this really fast.

If you are looking to memoize this process, I’d wrap that code in a function and decorate with `functools.lru_cache(maxsize=n)`. Experiment with the `maxsize` value to find the appropriate size for your application. For best results, use a `maxsize` argument that is a power of two.

``````from functools import lru_cache

lru_cache(maxsize=8096)
def sigmoids(x):
sigmoid = vectorize(lambda(x): 1.0/(1.0+exp(-x)))
``````

If you’re on 2.7 (which I expect you are since you’re using numpy), you can take a look at https://pypi.python.org/pypi/repoze.lru/ for a memoization library with identical syntax.

You can install it via pip: `pip install repoze.lru`

``````from repoze.lru import lru_cache

lru_cache(maxsize=8096)
def sigmoids(x):
sigmoid = vectorize(lambda(x): 1.0/(1.0+exp(-x)))
``````

Mostly I agree with Warren Weckesser and his answer above.
But for derivative of sigmoid the following can be used:

``````In [002]: def sg(x):
...: s = scipy.special.expit(x)
...: return s * (1.0 - s)
``````

Timings:

``````In [003]: %timeit y = logistic._pdf(x)
10000 loops, best of 3: 45 µs per loop

In [004]: %timeit y = sg(x)
10000 loops, best of 3: 20.4 µs per loop
``````

The only problem is accuracy:

``````In [005]: sg(37)
Out[005]: 0.0

In [006]: logistic._pdf(37)
Out[006]: 8.5330476257440658e-17
``````