# Diff of two Dataframes

### Question :

I need to compare two DataFrames of different sizes row-wise and print out the non-matching rows. Let's take the following two:

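``````from pandas import DataFrame

df1 = DataFrame({
    'Buyer': ['Carl', 'Carl', 'Carl'],
    'Quantity': [18, 3, 5]})

df2 = DataFrame({
    'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'],
    'Quantity': [2, 1, 18, 5]})
``````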

What is the most efficient way to iterate row-wise over `df2` and print out the rows that are not in `df1`? For example:

``````Buyer     Quantity
Carl         2
Mark         1
``````

Important: I do not want the following row to be included in the diff:

``````Buyer     Quantity
Carl         3
``````

The related questions I have found do not match my problem.

`merge` the two DataFrames with `how='outer'` and pass `indicator=True`. The resulting `_merge` column tells you whether each row is present in both frames, in the left one only, or in the right one only. You can then filter the merged DataFrame:

``````In:
merged = df1.merge(df2, indicator=True, how='outer')
merged[merged['_merge'] == 'right_only']

Out:
  Buyer  Quantity      _merge
3  Carl         2  right_only
4  Mark         1  right_only
``````
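The merge-and-filter approach can be sketched as a self-contained script. The `Buyer` values below are an assumption, chosen to be consistent with the expected output in the question, and the helper name `rows_only_in_right` is hypothetical.

```python
import pandas as pd

# Sample frames; the Buyer values are assumed, chosen to match
# the expected output shown in the question.
df1 = pd.DataFrame({'Buyer': ['Carl', 'Carl', 'Carl'],
                    'Quantity': [18, 3, 5]})
df2 = pd.DataFrame({'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'],
                    'Quantity': [2, 1, 18, 5]})


def rows_only_in_right(left, right):
    """Return the rows of `right` with no exact match in `left` (hypothetical helper)."""
    # With no `on=` argument, merge joins on all columns the frames share.
    merged = left.merge(right, how='outer', indicator=True)
    only_right = merged[merged['_merge'] == 'right_only']
    # Drop the helper column and renumber the rows.
    return only_right.drop(columns='_merge').reset_index(drop=True)


diff = rows_only_in_right(df1, df2)
print(diff)
```

The row order of an outer merge may differ between pandas versions, so it is probably safer to sort the result explicitly if the order matters.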

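You may find this to be the best option:

``````df2[ ~df2.isin(df1)].dropna()
``````

Note that `isin` with a DataFrame argument matches on both index and column labels, so rows are compared position by position rather than as an unordered collection of rows.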
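Because `DataFrame.isin` aligns on the index when given another DataFrame, it need not return the rows the question asks for when the two frames are ordered differently. The following is a minimal sketch of a row-wise membership test that ignores the index, assuming the `Buyer` values shown (they are chosen to fit the question's expected output) and using `MultiIndex.from_frame` as one possible way to treat each row as a tuple.

```python
import pandas as pd

# Sample frames; the Buyer values are assumed, chosen to match
# the expected output shown in the question.
df1 = pd.DataFrame({'Buyer': ['Carl', 'Carl', 'Carl'],
                    'Quantity': [18, 3, 5]})
df2 = pd.DataFrame({'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'],
                    'Quantity': [2, 1, 18, 5]})

# Index-aligned comparison: row 0 of df2 is checked against row 0 of df1,
# row 1 against row 1, and so on.
positional = df2[~df2.isin(df1)].dropna()

# Row-wise comparison: each row becomes a (Buyer, Quantity) tuple and is
# tested for membership among the rows of df1, regardless of position.
df1_rows = pd.MultiIndex.from_frame(df1)
df2_rows = pd.MultiIndex.from_frame(df2)
row_wise = df2[~df2_rows.isin(df1_rows)]

print(positional)
print(row_wise)
```

Printing both results side by side makes it easy to check whether the positional comparison is adequate for a given pair of frames.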

``````diff = set(zip(df2.Buyer, df2.Quantity)) - set(zip(df1.Buyer, df1.Quantity))
``````

This is the first solution that came to mind. You can then put the `diff` set back into a DataFrame for presentation.
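Putting the set back into a DataFrame might look like the sketch below. The `Buyer` values are assumed (chosen to fit the expected output in the question), and the name `diff_df` is only illustrative.

```python
import pandas as pd

# Sample frames; the Buyer values are assumed, chosen to match
# the expected output shown in the question.
df1 = pd.DataFrame({'Buyer': ['Carl', 'Carl', 'Carl'],
                    'Quantity': [18, 3, 5]})
df2 = pd.DataFrame({'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'],
                    'Quantity': [2, 1, 18, 5]})

# Rows of df2, as (Buyer, Quantity) tuples, that do not occur in df1.
diff = set(zip(df2.Buyer, df2.Quantity)) - set(zip(df1.Buyer, df1.Quantity))

# Sets are unordered, so sort the tuples to get a stable row order.
diff_df = pd.DataFrame(sorted(diff), columns=['Buyer', 'Quantity'])
print(diff_df)
```

One design consequence of going through a set is that duplicate rows in `df2` collapse into a single row and the original index is lost.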

@EdChum's answer is self-explanatory. However, filtering on `_merge != 'both'` makes more sense: you do not need to care about the order of comparison, and this is what a real diff is supposed to be. For the sake of answering your question:

``````merged = df1.merge(df2, indicator=True, how='outer')
merged.loc[merged['_merge'] != 'both']
``````
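As a rough illustration of this symmetric diff, the sketch below keeps every row that is missing from one side and labels where it came from. The `Buyer` values are assumed (chosen to fit the question's expected output), and the `source` column with its `df1`/`df2` labels is a hypothetical addition for readability.

```python
import pandas as pd

# Sample frames; the Buyer values are assumed, chosen to match
# the expected output shown in the question.
df1 = pd.DataFrame({'Buyer': ['Carl', 'Carl', 'Carl'],
                    'Quantity': [18, 3, 5]})
df2 = pd.DataFrame({'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'],
                    'Quantity': [2, 1, 18, 5]})

merged = df1.merge(df2, how='outer', indicator=True)

# Keep every row that is missing from one frame or the other.
sym_diff = merged.loc[merged['_merge'] != 'both'].copy()

# Translate merge's generic labels into frame names (illustrative labels).
labels = {'left_only': 'df1', 'right_only': 'df2'}
sym_diff['source'] = sym_diff['_merge'].astype(str).map(labels)
sym_diff = sym_diff.drop(columns='_merge').reset_index(drop=True)

print(sym_diff)
```

Swapping `df1` and `df2` in the merge only swaps which rows are tagged as left or right; the set of reported rows stays the same.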

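``````df_delta = df2[df2['Buyer'].apply(lambda x: x not in df1['Buyer'].values)]
``````

Note that this compares the `Buyer` column only, so a row such as `Carl 2` is not reported, because Carl also appears in `df1`.

``````df1.compare(df2)
``````

Note that `DataFrame.compare` requires both frames to have identical row and column labels, so it raises a `ValueError` for frames of different lengths such as the two in the question.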