### Question :

I have the following data frame in IPython, where each row is a single stock:

```
In [261]: bdata
Out[261]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 21210 entries, 0 to 21209
Data columns:
BloombergTicker 21206 non-null values
Company 21210 non-null values
Country 21210 non-null values
MarketCap 21210 non-null values
PriceReturn 21210 non-null values
SEDOL 21210 non-null values
yearmonth 21210 non-null values
dtypes: float64(2), int64(1), object(4)
```

I want to apply a groupby operation that computes cap-weighted average return across everything, per each date in the “yearmonth” column.

This works as expected:

```
In [262]: bdata.groupby("yearmonth").apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
Out[262]:
yearmonth
201204 -0.109444
201205 -0.290546
```

But then I want to sort of “broadcast” these values back to the indices in the original data frame, and save them as constant columns where the dates match.

```
In [263]: dateGrps = bdata.groupby("yearmonth")
In [264]: dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/mnt/bos-devrnd04/usr6/home/espears/ws/Research/Projects/python-util/src/util/<ipython-input-264-4a68c8782426> in <module>()
----> 1 dateGrps["MarketReturn"] = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
TypeError: 'DataFrameGroupBy' object does not support item assignment
```

I realize this naive assignment should not work. But what is the “right” Pandas idiom for assigning the result of a groupby operation into a new column on the parent dataframe?

In the end, I want a column called “MarketReturn” than will be a repeated constant value for all indices that have matching date with the output of the groupby operation.

One hack to achieve this would be the following:

```
marketRetsByDate = dateGrps.apply(lambda x: (x["PriceReturn"]*x["MarketCap"]/x["MarketCap"].sum()).sum())
bdata["MarketReturn"] = np.repeat(np.NaN, len(bdata))
for elem in marketRetsByDate.index.values:
bdata["MarketReturn"][bdata["yearmonth"]==elem] = marketRetsByDate.ix[elem]
```

But this is slow, bad, and unPythonic.

##
Answer #1:

```
In [97]: df = pandas.DataFrame({'month': np.random.randint(0,11, 100), 'A': np.random.randn(100), 'B': np.random.randn(100)})
In [98]: df.join(df.groupby('month')['A'].sum(), on='month', rsuffix='_r')
Out[98]:
A B month A_r
0 -0.040710 0.182269 0 -0.331816
1 -0.004867 0.642243 1 2.448232
2 -0.162191 0.442338 4 2.045909
3 -0.979875 1.367018 5 -2.736399
4 -1.126198 0.338946 5 -2.736399
5 -0.992209 -1.343258 1 2.448232
6 -1.450310 0.021290 0 -0.331816
7 -0.675345 -1.359915 9 2.722156
```

##
Answer #2:

While I’m still exploring all of the incredibly smart ways that `apply`

concatenates the pieces it’s given, here’s another way to add a new column in the parent after a groupby operation.

```
In [236]: df
Out[236]:
yearmonth return
0 201202 0.922132
1 201202 0.220270
2 201202 0.228856
3 201203 0.277170
4 201203 0.747347
In [237]: def add_mkt_return(grp):
.....: grp['mkt_return'] = grp['return'].sum()
.....: return grp
.....:
In [238]: df.groupby('yearmonth').apply(add_mkt_return)
Out[238]:
yearmonth return mkt_return
0 201202 0.922132 1.371258
1 201202 0.220270 1.371258
2 201202 0.228856 1.371258
3 201203 0.277170 1.024516
4 201203 0.747347 1.024516
```

##
Answer #3:

As a general rule when using groupby(), if you use the .transform() function pandas will return a table with the same length as your original. When you use other functions like .sum() or .first() then pandas will return a table where each row is a group.

I’m not sure how this works with apply but implementing elaborate lambda functions with transform can be fairly tricky so the strategy that I find most helpful is to create the variables I need, place them in the original dataset and then do my operations there.

If I understand what you’re trying to do correctly first you can calculate the total market cap for each group:

```
bdata['group_MarketCap'] = bdata.groupby('yearmonth')['MarketCap'].transform('sum')
```

This will add a column called “group_MarketCap” to your original data which would contain the sum of market caps for each group. Then you can calculate the weighted values directly:

```
bdata['weighted_P'] = bdata['PriceReturn'] * (bdata['MarketCap']/bdata['group_MarketCap'])
```

And finally you would calculate the weighted average for each group using the same transform function:

```
bdata['MarketReturn'] = bdata.groupby('yearmonth')['weighted_P'].transform('sum')
```

I tend to build my variables this way. Sometimes you can pull off putting it all in a single command but that doesn’t always work with groupby() because most of the time pandas needs to instantiate the new object to operate on it at the full dataset scale (i.e. you can’t add two columns together if one doesn’t exist yet).

Hope this helps đź™‚

##
Answer #4:

May I suggest the `transform`

method (instead of aggregate)? If you use it in your original example it should do what you want (the broadcasting).

##
Answer #5:

I did not find a way to make assignment to the original dataframe. So I just store the results from the groups and concatenate them. Then we sort the concatenated dataframe by index to get the original order as the input dataframe. Here is a sample code:

```
In [10]: df = pd.DataFrame({'month': np.random.randint(0,11, 100), 'A': np.random.randn(100), 'B': np.random.randn(100)})
In [11]: df.head()
Out[11]:
month A B
0 4 -0.029106 -0.904648
1 2 -2.724073 0.492751
2 7 0.732403 0.689530
3 2 0.487685 -1.017337
4 1 1.160858 -0.025232
In [12]: res = []
In [13]: for month, group in df.groupby('month'):
...: new_df = pd.DataFrame({
...: 'A^2+B': group.A ** 2 + group.B,
...: 'A+B^2': group.A + group.B**2
...: })
...: res.append(new_df)
...:
In [14]: res = pd.concat(res).sort_index()
In [15]: res.head()
Out[15]:
A^2+B A+B^2
0 -0.903801 0.789282
1 7.913327 -2.481270
2 1.225944 1.207855
3 -0.779501 1.522660
4 1.322360 1.161495
```

This method is pretty fast and extensible. You can derive any feature here.