AnsWiki | How to remove duplicate rows in a DataFrame with Python/pandas?

Question

How to remove duplicate rows in a DataFrame with Python/pandas?

AnsWiki · Accepted Answer

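The standard way to remove duplicate rows from a pandas DataFrame is the drop_duplicates() method. By default, two rows count as duplicates when all of their column values are equal, and keep='first' (the default) keeps the first occurrence of each duplicated row: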

import pandas as pd

df = pd.DataFrame([[0, 1], [2, 3], [2, 3], [2, 4]], columns=['Col 1', 'Col 2'])
print(df)  # Before

# Remove duplicate rows, keeping the first occurrence of each
df = df.drop_duplicates(keep='first')
print(df)  # After

Before:

   Col 1  Col 2
0      0      1
1      2      3
2      2      3
3      2      4

After:

   Col 1  Col 2
0      0      1
1      2      3
3      2      4
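In the output above, the original index labels are kept (label 2 is simply missing). The sketch below reuses the same example data to show a few common variations of drop_duplicates(): keeping the last occurrence, dropping every duplicated row, comparing only some columns, and renumbering the result. It assumes pandas 1.0 or later (the ignore_index parameter was added in 1.0) and also uses duplicated(), which returns True for each row that drop_duplicates() would remove with the same arguments.

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame([[0, 1], [2, 3], [2, 3], [2, 4]], columns=['Col 1', 'Col 2'])

# duplicated() flags each row that repeats an earlier row (default keep='first')
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False]

# keep='last': keep the last occurrence of each duplicated row instead
last = df.drop_duplicates(keep='last')
print(last.index.tolist())  # [0, 2, 3]

# keep=False: drop every row that has a duplicate anywhere
none_kept = df.drop_duplicates(keep=False)
print(none_kept.index.tolist())  # [0, 3]

# subset: compare only the listed columns when looking for duplicates
by_col1 = df.drop_duplicates(subset=['Col 1'])
print(by_col1.index.tolist())  # [0, 1]

# ignore_index=True: renumber the result 0..n-1 instead of keeping the old labels
renumbered = df.drop_duplicates(ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```

On older pandas versions without ignore_index, calling .reset_index(drop=True) on the result gives the same renumbering.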