# How to drop unique rows in a pandas dataframe?

### Question :

I am stuck with a seemingly easy problem: dropping unique rows in a pandas dataframe. Basically, the opposite of `drop_duplicates()`.

Let’s say this is my data:

```
    A       B   C
0   foo     0   A
1   foo     1   A
2   foo     1   B
3   bar     1   A
```

I would like to drop the rows whose combination of A and B is unique, i.e. keep only rows 1 and 2.

I tried the following:

```
# Load DataFrame
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"], "B": [0, 1, 1, 1], "C": ["A", "A", "B", "A"]})

uniques = df[['A', 'B']].drop_duplicates()
duplicates = df[~df.index.isin(uniques.index)]
```

But I only get row 2, since `drop_duplicates()` keeps the first occurrence of each pair, so rows 0, 1, and 3 all end up in `uniques`!

### Answer:

Solutions to select all duplicated rows:

You can use `duplicated` with a `subset` and the parameter `keep=False` to select all duplicates:

```
df = df[df.duplicated(subset=['A','B'], keep=False)]
print (df)
     A  B  C
1  foo  1  A
2  foo  1  B
```
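If it helps to see what `keep=False` actually does, the boolean mask can be inspected on its own before filtering (a quick sketch using the example frame from the question):

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]})

# keep=False marks every member of a duplicated (A, B) group as True;
# the default keep='first' would leave the first occurrence as False.
mask = df.duplicated(subset=['A', 'B'], keep=False)
print(mask.tolist())  # [False, True, True, False]
```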

Solution with `transform`:

```
df = df[df.groupby(['A', 'B'])['A'].transform('size') > 1]
print (df)
     A  B  C
1  foo  1  A
2  foo  1  B
```
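The reason `transform('size')` works here is that it broadcasts each group's row count back onto every row, so the resulting Series aligns with `df`'s original index. A small sketch of the intermediate values:

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]})

# Each row receives the size of its (A, B) group, index-aligned with df
sizes = df.groupby(['A', 'B'])['A'].transform('size')
print(sizes.tolist())  # [1, 2, 2, 1]
```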

Slightly modified solutions to select all unique rows:

```
# invert the boolean mask with ~
df = df[~df.duplicated(subset=['A','B'], keep=False)]
print (df)
     A  B  C
0  foo  0  A
3  bar  1  A

df = df[df.groupby(['A', 'B'])['A'].transform('size') == 1]
print (df)
     A  B  C
0  foo  0  A
3  bar  1  A
```

I came up with a solution using `groupby`. Note that my first try filtered `df` by the index of the grouped result, which is wrong: after `reset_index()`, `grouped` carries a fresh 0-based index that has nothing to do with `df`'s index, so the `(A, B)` values have to be matched instead:

```
grouped = df.groupby(['A', 'B']).size().reset_index().rename(columns={0: 'count'})
dupes = grouped[grouped['count'] > 1]
# Match on the (A, B) values themselves, not on grouped's index
duplicates = df.merge(dupes[['A', 'B']], on=['A', 'B'])
```

`duplicates` now has the proper result (note that `merge` resets the index):

```
     A  B  C
0  foo  1  A
1  foo  1  B
```

Also, my original attempt in the question can be fixed by simply adding `keep=False` in the `drop_duplicates` method:

```
# Load DataFrame
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"], "B": [0, 1, 1, 1], "C": ["A", "A", "B", "A"]})

uniques = df[['A', 'B']].drop_duplicates(keep=False)
duplicates = df[~df.index.isin(uniques.index)]
```
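Running the fixed attempt end to end confirms it keeps exactly rows 1 and 2 (a self-contained sketch of the snippet above):

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "bar"],
                   "B": [0, 1, 1, 1],
                   "C": ["A", "A", "B", "A"]})

# keep=False drops all members of duplicated (A, B) pairs, so `uniques`
# holds only the truly unique pairs together with their original df indexes
uniques = df[['A', 'B']].drop_duplicates(keep=False)
duplicates = df[~df.index.isin(uniques.index)]
print(duplicates.index.tolist())  # [1, 2]
```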

Still, I accepted @jezrael's answer, as I think it is the safest(?), since it works directly with the pandas indexes here.