janitor.get_dupes

janitor.get_dupes(df: pandas.core.frame.DataFrame, column_names: Optional[Union[str, Iterable[str], Hashable]] = None) → pandas.core.frame.DataFrame

Return all duplicate rows.

This method does not mutate the original DataFrame.

Functional usage syntax:

import pandas as pd
from janitor import get_dupes
df = pd.DataFrame(...)
df = get_dupes(df)

Method chaining syntax:

import pandas as pd
import janitor
df = pd.DataFrame(...).get_dupes()

Parameters
  • df – The pandas DataFrame object.

  • column_names – (optional) A column name or an iterable (list or tuple) of column names. As in the pandas API, only these columns are compared when identifying duplicates. Defaults to all columns.

Returns

The duplicate rows, as a pandas DataFrame.
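
Example

The following is a minimal sketch of the behaviour described above, not pyjanitor's own source. It uses a hypothetical stand-in, get_dupes_sketch, written with plain pandas so that it runs without pyjanitor installed. It assumes that every member of a duplicate group is returned, which is what pandas' DataFrame.duplicated does with keep=False. The sample data is made up for illustration.

```python
import pandas as pd


# Hypothetical stand-in for get_dupes (an assumption, not the library's code).
def get_dupes_sketch(df, column_names=None):
    # keep=False flags every row in a duplicate group, not only the repeats.
    mask = df.duplicated(subset=column_names, keep=False)
    # Boolean indexing returns a new DataFrame; df itself is left untouched.
    return df[mask]


df = pd.DataFrame(
    {
        "item": ["shoe", "shoe", "bag", "shoe"],
        "quantity": [100, 100, 75, 200],
    }
)

# Compare on all columns: only rows 0 and 1 are identical throughout.
all_columns = get_dupes_sketch(df)

# Compare on "item" only: rows 0, 1 and 3 share the same item.
by_item = get_dupes_sketch(df, column_names="item")

print(all_columns)
print(by_item)
```

With pyjanitor imported, the corresponding calls would be df.get_dupes() and df.get_dupes(column_names="item"), following the signature shown at the top of this page.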