### Question :

Is there a function in NumPy to group the array below by its first column?

I couldn’t find a good answer anywhere online.

```
>>> a
array([[ 1, 275],
[ 1, 441],
[ 1, 494],
[ 1, 593],
[ 2, 679],
[ 2, 533],
[ 2, 686],
[ 3, 559],
[ 3, 219],
[ 3, 455],
[ 4, 605],
[ 4, 468],
[ 4, 692],
[ 4, 613]])
```

Wanted output:

```
array([[[275, 441, 494, 593]],
       [[679, 533, 686]],
       [[559, 219, 455]],
       [[605, 468, 692, 613]]], dtype=object)
```

## Answer #1:

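Inspired by Eelco Hoogendoorn’s library, but without using it, and relying on the fact that the first column of your array is always increasing. If it is not, sort the rows by the first column first, for example with `a = a[a[:, 0].argsort(kind="stable")]`. Note that an in-place `a.sort(axis=0)` sorts each column independently and would break the pairing between keys and values.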

```
>>> np.split(a[:,1], np.unique(a[:, 0], return_index=True)[1][1:])
[array([275, 441, 494, 593]),
array([679, 533, 686]),
array([559, 219, 455]),
array([605, 468, 692, 613])]
```

I didn’t `timeit` this, but it is probably the fastest way to answer the question:

- No python native loop
- Result lists are numpy arrays, in case you need to make other numpy operations on them, no new conversion will be needed
- Complexity is roughly O(n log n), dominated by the sort inside `np.unique`

[EDIT] I improved the answer thanks to ns63sr.
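For the case where the first column is not already sorted, the idea above can be sketched as a small helper. This is only a sketch under assumptions: the name `group_by_first_column` is made up for illustration, and it assumes a 2-D array with keys in column 0 and values in column 1.

```python
import numpy as np

def group_by_first_column(a):
    """Split column 1 of `a` into one array per distinct value in column 0.

    Hypothetical helper name, not part of NumPy.
    """
    a = np.asarray(a)
    # A stable argsort reorders whole rows, so each value stays with its key
    # and rows sharing a key keep their original relative order.
    order = np.argsort(a[:, 0], kind="stable")
    keys, values = a[order, 0], a[order, 1]
    # Index of the first occurrence of each distinct key in the sorted column.
    uniq, starts = np.unique(keys, return_index=True)
    # Skip the leading 0 so np.split does not emit an empty first chunk.
    return uniq, np.split(values, starts[1:])

a = np.array([[2, 679], [1, 275], [3, 559], [1, 441], [2, 533]])
uniq, groups = group_by_first_column(a)
for k, g in zip(uniq, groups):
    print(k, g)
```

Returning the unique keys alongside the groups makes it easy to tell which group belongs to which key.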

## Answer #2:

The numpy_indexed package (disclaimer: I am its author) aims to fill this gap in numpy. All operations in numpy-indexed are fully vectorized, and no O(n^2) algorithms were harmed during the making of this library.

```
import numpy_indexed as npi
npi.group_by(a[:, 0]).split(a[:, 1])
```

Note that it is usually more efficient to compute the relevant properties directly over such groups (i.e., `group_by(keys).mean(values)`), rather than first splitting into a list / jagged array.
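To illustrate the point about computing per-group properties without splitting, here is a rough sketch in plain NumPy. It is not how numpy_indexed does it internally; it is just one way to get group means, and it assumes the key column is already sorted as in the question.

```python
import numpy as np

a = np.array([[1, 275], [1, 441], [1, 494], [1, 593],
              [2, 679], [2, 533], [2, 686],
              [3, 559], [3, 219], [3, 455],
              [4, 605], [4, 468], [4, 692], [4, 613]])

# Assumption: a[:, 0] is sorted, so every group is one contiguous run of rows.
keys, starts, counts = np.unique(a[:, 0], return_index=True, return_counts=True)

# np.add.reduceat sums each run a[starts[i]:starts[i+1], 1] in a single call,
# so no intermediate list of per-group arrays is built.
sums = np.add.reduceat(a[:, 1], starts)
means = sums / counts

for k, m in zip(keys, means):
    print(k, m)
```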

## Answer #3:

Numpy is not very handy here because the desired output is not an array of integers (it is an array of list objects).

I suggest either the pure Python way…

```
from collections import defaultdict

d = defaultdict(list)
for key, val in a:
    d[key].append(val)

# Timed with %%timeit in IPython:
# 10.7 µs ± 156 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# result:
# defaultdict(list,
#             {1: [275, 441, 494, 593],
#              2: [679, 533, 686],
#              3: [559, 219, 455],
#              4: [605, 468, 692, 613]})
```

…or the pandas way:

```
import pandas as pd

df = pd.DataFrame(a, columns=["key", "val"])
df.groupby("key").val.apply(pd.Series.tolist)

# Timed with %%timeit in IPython:
# 979 µs ± 3.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# result:
# key
# 1    [275, 441, 494, 593]
# 2         [679, 533, 686]
# 3         [559, 219, 455]
# 4    [605, 468, 692, 613]
# Name: val, dtype: object
```

## Answer #4:

```
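import numpy as np

n = np.unique(a[:, 0])
# dtype=object is required on recent NumPy versions because the groups have different lengths
np.array([list(a[a[:, 0] == i, 1]) for i in n], dtype=object)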
```

outputs:

```
array([[275, 441, 494, 593], [679, 533, 686], [559, 219, 455],
[605, 468, 692, 613]], dtype=object)
```
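One caveat with the `np.array(...)` call above: if every group happened to have the same length, NumPy would build a regular 2-D array instead of a 1-D array of lists. A minimal sketch that always yields one list per key, assuming that is the shape you want, fills a pre-allocated object array (the name `group_lists` is made up for illustration):

```python
import numpy as np

def group_lists(a):
    """Return a 1-D object array holding one Python list of values per key.

    Hypothetical helper name, not part of NumPy.
    """
    a = np.asarray(a)
    keys = np.unique(a[:, 0])
    # Pre-allocating with dtype=object fixes the result shape to (n_keys,),
    # whether or not the groups happen to have equal lengths.
    out = np.empty(len(keys), dtype=object)
    for i, k in enumerate(keys):
        out[i] = a[a[:, 0] == k, 1].tolist()
    return out

# Equal-length groups still give a 1-D array of lists.
b = np.array([[1, 10], [1, 20], [2, 30], [2, 40]])
print(group_lists(b).shape)
```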

## Answer #5:

Simplifying the answer of Vincent J and considering the comment of HS-nebula, one can use `return_index=True` instead of `return_counts=True` and get rid of the `cumsum`:

```
np.split(a[:,1], np.unique(a[:,0], return_index = True)[1])[1:]
```

Output

```
[array([275, 441, 494, 593]),
array([679, 533, 686]),
array([559, 219, 455]),
array([605, 468, 692, 613])]
```

## Answer #6:

I used `np.unique()` followed by `np.extract()`:

```
unique = np.unique(a[:, 0:1])
answer = []
for element in unique:
    present = a[:, 0] == element
    answer.append(np.extract(present, a[:, -1]))
print(answer)
```

`[array([275, 441, 494, 593]), array([679, 533, 686]), array([559, 219, 455]), array([605, 468, 692, 613])]`

## Answer #7:

Given `X` as an array of the items you want grouped and `y` (a 1D array) as the corresponding group labels, the following function does the grouping with *numpy*:

```
def groupby(X, y):
    y = np.asarray(y)
    X = np.asarray(X)
    y_uniques = np.unique(y)
    return [X[y == yi] for yi in y_uniques]
```

So, `groupby(a[:,1], a[:,0])` returns

`[array([275, 441, 494, 593]), array([679, 533, 686]), array([559, 219, 455]), array([605, 468, 692, 613])]`

## Answer #8:

We might also find it useful to generate a `dict`:

```
def groupby(X):
    X = np.asarray(X)
    x_uniques = np.unique(X)
    return {xi: X[X == xi] for xi in x_uniques}
```

Let’s try it out:

```
X=[1,1,2,2,3,3,3,3,4,5,6,7,7,8,9,9,1,1,1]
groupby(X)
Out[9]:
{1: array([1, 1, 1, 1, 1]),
2: array([2, 2]),
3: array([3, 3, 3, 3]),
4: array([4]),
5: array([5]),
6: array([6]),
7: array([7, 7]),
8: array([8]),
9: array([9, 9])}
```

Note this by itself is not super compelling, but if we make `X` an `object` or `namedtuple` and then provide a `groupby` function it becomes more interesting. Will put that in later.