### Question :

I have a dataframe df, with two columns, I want to groupby one column and join the lists belongs to same group, example:

```
column_a, column_b
1, [1,2,3]
1, [2,5]
2, [5,6]
```

after the process:

```
column_a, column_b
1, [1,2,3,2,5]
2, [5,6]
```

I want to keep all the duplicates. I have the following questions:

- The dtypes of the dataframe are object(s). convert_objects() doesn’t convert column_b to list automatically. How can I do this?
- what does the function in df.groupby(…).apply(lambda x: …) apply to ? what is the form of x ? list?
- the solution to my main problem?

Thanks in advance.

##
Answer #1:

`object`

dtype is a catch-all dtype that basically means not int, float, bool, datetime, or timedelta. So it is storing them as a list. `convert_objects`

tries to convert a column to one of those dtypes.

You want

```
In [63]: df
Out[63]:
a b c
0 1 [1, 2, 3] foo
1 1 [2, 5] bar
2 2 [5, 6] baz
In [64]: df.groupby('a').agg({'b': 'sum', 'c': lambda x: ' '.join(x)})
Out[64]:
c b
a
1 foo bar [1, 2, 3, 2, 5]
2 baz [5, 6]
```

This groups the data frame by the values in column `a`

. Read more about [groupby].(http://pandas.pydata.org/pandas-docs/stable/groupby.html).

This is doing a regular list `sum`

(concatenation) just like `[1, 2, 3] + [2, 5]`

##
Answer #2:

```
df.groupby('column_a').agg(sum)
```

This works because of operator overloading `sum`

concatenates the lists together. The index of the resulting df will be the values from `column_a`

:

##
Answer #3:

Use numpy and simple “for” or “map”:

`import numpy as np`

```
u_clm = np.unique(df.column_a.values)
all_lists = []
for clm in u_clm:
df_process = df.query('column_a == @clm')
list_ = np.concatenate(df.column_b.values)
all_lists.append((clm, list_.tolist()))
```

`df_sum_lists = pd.DataFrame(all_lists)`

It’s faster in 350 times than a simple “groupby-agg-sum” approach for huge datasets.