# Pandas: Difference between largest and smallest value within group

Posted on

### Question :

Pandas: Difference between largest and smallest value within group

Given a data frame that looks like this

``````GROUP VALUE
1     5
2     2
1     10
2     20
1     7
``````

I would like to compute the difference between the largest and smallest value within each group. That is, the result should be

``````GROUP   DIFF
1      5
2      18
``````

What is an easy way to do this in Pandas?

What is a fast way to do this in Pandas for a data frame with about 2 million rows and 1 million groups?

Using @unutbu ‘s `df`

per timing
unutbu’s solution is best over large data sets

``````import pandas as pd
import numpy as np

df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})

df.groupby('GROUP')['VALUE'].agg(np.ptp)

GROUP
1     5
2    18
Name: VALUE, dtype: int64
``````

`np.ptp` docs returns the range of an array

timing
small `df`

large `df`
`df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 100, VALUE=np.random.rand(1000000)))`

large `df`
many groups
`df = pd.DataFrame(dict(GROUP=np.arange(1000000) % 10000, VALUE=np.random.rand(1000000)))`

`groupby/agg` generally performs best when you take advantage of the built-in aggregators such as `'max'` and `'min'`. So to obtain the difference, first compute the `max` and `min` and then subtract:

``````import pandas as pd
df = pd.DataFrame({'GROUP': [1, 2, 1, 2, 1], 'VALUE': [5, 2, 10, 20, 7]})
result = df.groupby('GROUP')['VALUE'].agg(['max','min'])
result['diff'] = result['max']-result['min']
print(result[['diff']])
``````

yields

``````       diff
GROUP
1         5
2        18
``````

You can use `groupby()`, `min()`, and `max()`:
``````df.groupby('GROUP')['VALUE'].apply(lambda g: g.max() - g.min())