# numpy: efficiently summing with index arrays

Posted on

### Question :

numpy: efficiently summing with index arrays

Suppose I have 2 matrices M and N (both have > 1 columns). I also have an index matrix I with 2 columns — 1 for M and one for N. The indices for N are unique, but the indices for M may appear more than once. The operation I would like to perform is,

``````for i,j in w:
M[i] += N[j]
``````

Is there a more efficient way to do this other than a for loop?

For completeness, in numpy >= 1.8 you can also use `np.add`‘s `at` method:

``````In : m, n = np.random.rand(2, 10)

In : m_idx, n_idx = np.random.randint(10, size=(2, 20))

In : m0 = m.copy()

In : m0 += np.bincount(m_idx, weights=n[n_idx], minlength=len(m))

In : np.allclose(m, m0)
Out: True

In : %timeit np.add.at(m, m_idx, n[n_idx])
100000 loops, best of 3: 9.49 us per loop

In : %timeit np.bincount(m_idx, weights=n[n_idx], minlength=len(m))
1000000 loops, best of 3: 1.54 us per loop
``````

Aside of the obvious performance disadvantage, it has a couple of advantages:

1. `np.bincount` converts its weights to double precision floats, `.at` will operate with you array’s native type. This makes it the simplest option for dealing e.g. with complex numbers.
2. `np.bincount` only adds weights together, you have an `at` method for all ufuncs, so you can repeatedly `multiply`, or `logical_and`, or whatever you feel like.

But for your use case, `np.bincount` is probably the way to go.

Using also `m_ind, n_ind = w.T`, just do `M += np.bincount(m_ind, weights=N[n_ind], minlength=len(M))`

For clarity, let’s define

``````>>> m_ind, n_ind = w.T
``````

Then the `for` loop

``````for i, j in zip(m_ind, n_ind):
M[i] += N[j]
``````

updates the entries `M[np.unique(m_ind)]`. The values that get written to it are `N[n_ind]`, which must be grouped by `m_ind`. (The fact that there’s an `n_ind` in addition to `m_ind` is actually tangential to the question; you could just set `N = N[n_ind]`.) There happens to be a SciPy class that does exactly this: `scipy.sparse.csr_matrix`.

Example data:

``````>>> m_ind, n_ind = array([[0, 0, 1, 1], [2, 3, 0, 1]])
>>> M = np.arange(2, 6)
>>> N = np.logspace(2, 5, 4)
``````

The result of the `for` loop is that `M` becomes `[110002 1103 4 5]`. We get the same result with a `csr_matrix` as follows. As I said earlier, `n_ind` isn’t relevant, so we get rid of that first.

``````>>> N = N[n_ind]
>>> from scipy.sparse import csr_matrix
>>> update = csr_matrix((N, m_ind, [0, len(N)])).toarray()
``````

The CSR constructor builds a matrix with the required values at the required indices; the third part of its argument is a compressed column index, meaning that the values `N[0:len(N)]` have the indices `m_ind[0:len(N)]`. Duplicates are summed:

``````>>> update
array([[ 110000.,    1100.]])
``````

This has shape `(1, len(np.unique(m_ind)))` and can be added in directly:

``````>>> M[np.unique(m_ind)] += update.ravel()
>>> M
array([110002,   1103,      4,      5])
``````