# Pandas groupby cumulative sum

Posted on

### Question :

Pandas groupby cumulative sum

I would like to add a cumulative sum column to my Pandas dataframe so that:

``````name | day       | no
-----|-----------|----
Jack | Monday    | 10
Jack | Tuesday   | 20
Jack | Tuesday   | 10
Jack | Wednesday | 50
Jill | Monday    | 40
Jill | Wednesday | 110
``````

becomes:

``````Jack | Monday     | 10  | 10
Jack | Tuesday    | 30  | 40
Jack | Wednesday  | 50  | 90
Jill | Monday     | 40  | 40
Jill | Wednesday  | 110 | 150
``````

I tried various combos of `df.groupby` and `df.agg(lambda x: cumsum(x))` to no avail.

This should do it, need `groupby()` twice:

``````df.groupby(['name', 'day']).sum()
.groupby(level=0).cumsum().reset_index()
``````

Explanation:

``````print(df)
name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110

# sum per name/day
print( df.groupby(['name', 'day']).sum() )
no
name day
Jack Monday      10
Tuesday     30
Wednesday   50
Jill Monday      40
Wednesday  110

# cumulative sum per name/day
print( df.groupby(['name', 'day']).sum()
.groupby(level=0).cumsum() )
no
name day
Jack Monday      10
Tuesday     40
Wednesday   90
Jill Monday      40
Wednesday  150
``````

The dataframe resulting from the first sum is indexed by `'name'` and by `'day'`. You can see it by printing

``````df.groupby(['name', 'day']).sum().index
``````

When computing the cumulative sum, you want to do so by `'name'`, corresponding to the first index (level 0).

Finally, use `reset_index` to have the names repeated.

``````df.groupby(['name', 'day']).sum().groupby(level=0).cumsum().reset_index()

name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   40
2  Jack  Wednesday   90
3  Jill     Monday   40
4  Jill  Wednesday  150
``````

This works in pandas 0.16.2

``````In[23]: print df
name          day   no
0      Jack       Monday    10
1      Jack      Tuesday    20
2      Jack      Tuesday    10
3      Jack    Wednesday    50
4      Jill       Monday    40
5      Jill    Wednesday   110
In[24]: df['no_cumulative'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())
In[25]: print df
name          day   no  no_cumulative
0      Jack       Monday    10             10
1      Jack      Tuesday    20             30
2      Jack      Tuesday    10             40
3      Jack    Wednesday    50             90
4      Jill       Monday    40             40
5      Jill    Wednesday   110            150
``````

Modification to @Dmitry’s answer. This is simpler and works in pandas 0.19.0:

``````print(df)

name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110

df['no_csum'] = df.groupby(['name'])['no'].cumsum()

print(df)
name        day   no  no_csum
0  Jack     Monday   10       10
1  Jack    Tuesday   20       30
2  Jack    Tuesday   10       40
3  Jack  Wednesday   50       90
4  Jill     Monday   40       40
5  Jill  Wednesday  110      150
``````

Instead of `df.groupby(by=['name','day']).sum().groupby(level=[0]).cumsum()`
(see above) you could also do a `df.set_index(['name', 'day']).groupby(level=0, as_index=False).cumsum()`

• `df.groupby(by=['name','day']).sum()` is actually just moving both columns to a MultiIndex
• `as_index=False` means you do not need to call reset_index afterwards

you should use

``````df['cum_no'] = df.no.cumsum()
``````

http://pandas.pydata.org/pandas-docs/version/0.19.2/generated/pandas.DataFrame.cumsum.html

Another way of doing it

``````import pandas as pd
df = pd.DataFrame({'C1' : ['a','a','a','b','b'],
'C2' : [1,2,3,4,5]})
df['cumsum'] = df.groupby(by=['C1'])['C2'].transform(lambda x: x.cumsum())
df
``````

data.csv:

``````name,day,no
Jack,Monday,10
Jack,Tuesday,20
Jack,Tuesday,10
Jack,Wednesday,50
Jill,Monday,40
Jill,Wednesday,110
``````

Code:

``````import numpy as np
import pandas as pd

print(df)
df = df.groupby(['name', 'day'])['no'].sum().reset_index()
print(df)
df['cumsum'] = df.groupby(['name'])['no'].apply(lambda x: x.cumsum())
print(df)
``````

Output:

``````   name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   20
2  Jack    Tuesday   10
3  Jack  Wednesday   50
4  Jill     Monday   40
5  Jill  Wednesday  110
name        day   no
0  Jack     Monday   10
1  Jack    Tuesday   30
2  Jack  Wednesday   50
3  Jill     Monday   40
4  Jill  Wednesday  110
name        day   no  cumsum
0  Jack     Monday   10      10
1  Jack    Tuesday   30      40
2  Jack  Wednesday   50      90
3  Jill     Monday   40      40
4  Jill  Wednesday  110     150
``````