### Question :

Given a data frame that looks like this

```
GROUP VALUE
1 5
2 2
1 10
2 20
1 7
```

I would like to compute the difference between the largest and smallest value within each group. That is, the result should be

```
GROUP DIFF
1 5
2 18
```

What is an easy way to do this in Pandas?

What is a fast way to do this in Pandas for a data frame with about 2 million rows and 1 million groups?

##
Answer #1:

Using @unutbu ‘s `df`

*per timing*

unutbu’s solution is best over large data sets

```
import pandas as pd
import numpy as np
df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})
df.groupby('GROUP')['VALUE'].agg(np.ptp)
GROUP
1 5
2 18
Name: VALUE, dtype: int64
```

`np.ptp`

docs returns the range of an array

*timing*

**small df**

**large df**

`df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 100, VALUE=np.random.rand(1000000)))`

*large df*

*many groups*`df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 10000, VALUE=np.random.rand(1000000)))`

##
Answer #2:

`groupby/agg`

generally performs best when you take advantage of the built-in aggregators such as `'max'`

and `'min'`

. So to obtain the difference, first compute the `max`

and `min`

and then subtract:

```
import pandas as pd
df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})
result = df.groupby('GROUP')['VALUE'].agg(['max','min'])
result['diff'] = result['max']-result['min']
print(result[['diff']])
```

yields

```
diff
GROUP
1 5
2 18
```

##
Answer #3:

You can use `groupby()`

, `min()`

, and `max()`

:

```
df.groupby('GROUP')['VALUE'].apply(lambda g: g.max() - g.min())
```