Pandas concat failing

Question:

I am trying to concatenate dataframes built from the following two CSV files:



Both of these have the same number and names of columns. However, when I do this:

pandas.concat([df_a, df_b])

I get the error:

AssertionError: Number of manager items must equal union of block items
# manager items: 20, # tot_items: 21

How can I fix this?

Answer #1:

I believe that this error occurs if both of the following conditions are met:

  1. The data frames have different columns, i.e. df1.columns.equals(df2.columns) is False.
  2. The columns of at least one data frame contain a repeated name.

Basically, if you concat dataframes with columns [A,B,C] and [B,C,D], pandas can work out how to make one series for each distinct column name. But if I then try to join a third dataframe with columns [B,B,C], it does not know which of the two B columns to append to and ends up with fewer distinct columns than it thinks it needs.

If the columns of your dataframes are identical and in the same order, it will work anyway. So you can join [B,B,C] to [B,B,C], but not to [C,B,B]; when the columns are identical it probably just matches them up by integer position or something.
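
A quick way to check for both conditions before calling concat is sketched below. The helper name column_report and the sample frames are made up for illustration; the helper only reports the two conditions and does not fix anything.

```python
import pandas as pd


def column_report(frames):
    """Return (duplicated column names per frame, whether all frames share identical columns)."""
    # Index.duplicated() flags the second and later occurrences of each name.
    duplicates = [list(df.columns[df.columns.duplicated()].unique()) for df in frames]
    # Index.equals() is True only for the same labels in the same order.
    first = frames[0].columns
    same_columns = all(df.columns.equals(first) for df in frames[1:])
    return duplicates, same_columns


df1 = pd.DataFrame([[1, 2, 3]], columns=["A", "B", "C"])
df2 = pd.DataFrame([[4, 5, 6]], columns=["B", "C", "D"])
df3 = pd.DataFrame([[7, 8, 9]], columns=["B", "B", "C"])

duplicates, same_columns = column_report([df1, df2, df3])
print(duplicates)    # duplicated names found in each frame
print(same_columns)  # whether every frame has exactly the same columns
```

If any frame reports duplicates and same_columns is False, you are in the situation described above.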

Answered By: phil_20686

Answer #2:

You can get around this issue with a ‘manual’ concatenation. In this case your list of dataframes is

list_of_dfs = [df_a, df_b]

and instead of running

giant_concat_df = pd.concat(list_of_dfs, axis=0)

you can turn each dataframe into a list of dictionaries and then make a new dataframe from these lists (merged with chain). Note that a dictionary can hold only one value per key, so repeated column names are collapsed into a single column.

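from itertools import chain
import pandas as pd

# One dict per row for each frame; chain flattens them into a single list of rows.
list_of_dicts = [cur_df.to_dict(orient="records") for cur_df in list_of_dfs]
giant_concat_df = pd.DataFrame(list(chain.from_iterable(list_of_dicts)))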
Answered By: kmader

Answer #3:

The answers here did not solve my issue, but this answer did.

The issue was duplicated columns in one or both DataFrames.

Here’s a fix for duplicated columns (as per the answer above):

df = df.loc[:,~df.columns.duplicated()]
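
To see what that line keeps and what it throws away, here is a small sketch on a made-up frame (the column names and values are only for illustration):

```python
import pandas as pd

# Hypothetical frame with the column name "id" repeated.
df = pd.DataFrame([["a", "x", 1], ["b", "y", 2]], columns=["A", "id", "id"])

# columns.duplicated() marks the second and later occurrences of a name,
# so negating the mask keeps only the first column for each name.
deduped = df.loc[:, ~df.columns.duplicated()]
print(list(deduped.columns))
```

The later columns with the same name are discarded, so if they hold different data that you still need, rename them rather than dropping them.
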
Answered By: Ukrainian-serge

Answer #4:

Unfortunately, the source files are no longer available, so I can’t check my solution against your case. In my case the error occurred when:

  1. The data frames have two columns with the same name (I had ID and id columns, which I then converted to lower case, so they became the same)
  2. The value types of the same-named columns are different

Here is an example which gives me the error in question:

import pandas as pd

df1 = pd.DataFrame(data=[
    ['a', 'b', 'id', 1],
    ['a', 'b', 'id', 2]
], columns=['A', 'B', 'id', 'id'])

df2 = pd.DataFrame(data=[
    ['b', 'c', 'id', 1],
    ['b', 'c', 'id', 2]
], columns=['B', 'C', 'id', 'id'])
pd.concat([df1, df2])
>>> AssertionError: Number of manager items must equal union of block items
 # manager items: 4, # tot_items: 5

Removing or renaming one of the duplicated id columns makes this code work.
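
If you want to keep both columns, one way to do the renaming is sketched below. The helper make_columns_unique is hypothetical (not part of pandas), and it assumes the suffixed names such as id_1 are not already in use.

```python
import pandas as pd


def make_columns_unique(df):
    """Return a copy of df where repeated column names get a numeric suffix (id, id_1, ...)."""
    seen = {}
    new_names = []
    for name in df.columns:
        count = seen.get(name, 0)
        # The first occurrence keeps its name; later ones become name_1, name_2, ...
        new_names.append(name if count == 0 else f"{name}_{count}")
        seen[name] = count + 1
    result = df.copy()
    result.columns = new_names
    return result


df1 = pd.DataFrame([["a", "b", "id", 1], ["a", "b", "id", 2]],
                   columns=["A", "B", "id", "id"])
df2 = pd.DataFrame([["b", "c", "id", 1], ["b", "c", "id", 2]],
                   columns=["B", "C", "id", "id"])

# With unique names in every frame, concat can align the columns by label.
combined = pd.concat([make_columns_unique(df1), make_columns_unique(df2)],
                     ignore_index=True)
print(combined)
```

Columns that exist in only one of the frames (A and C here) are filled with NaN for the rows that come from the other frame.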

Answered By: Karatheodory
