janitor.count_cumulative_unique

janitor.count_cumulative_unique(df: pandas.core.frame.DataFrame, column_name: Hashable, dest_column_name: str, case_sensitive: bool = True) → pandas.core.frame.DataFrame

Generates a running total of cumulative unique values in a given column.

Functional usage example:

import pandas as pd
import janitor as jn

df = pd.DataFrame(...)

df = jn.functions.count_cumulative_unique(
    df=df,
    column_name='animals',
    dest_column_name='animals_unique_count',
    case_sensitive=True
)

Method chaining usage example:

import pandas as pd
import janitor

df = pd.DataFrame(...)

df = df.count_cumulative_unique(
    column_name='animals',
    dest_column_name='animals_unique_count',
    case_sensitive=True
)

A new column is created containing a running count of the unique values seen so far in the specified column. If case_sensitive is True, letter case matters (i.e., ‘a’ != ‘A’); otherwise, values that differ only in letter case are counted as the same value.

This method mutates the original DataFrame.
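
To make the running count concrete, the following is a minimal sketch in plain pandas that reproduces the behavior described above. It is not pyjanitor's implementation: the helper name cumulative_unique_sketch and the sample data are hypothetical, the case-insensitive branch assumes string values, and unlike the documented method the sketch works on a copy rather than mutating its input.

```python
import pandas as pd


def cumulative_unique_sketch(df, column_name, dest_column_name, case_sensitive=True):
    """Hypothetical stand-in that mimics the documented behavior."""
    values = df[column_name]
    if not case_sensitive:
        # Assumption: fold case so that 'Cat' and 'cat' compare equal.
        values = values.astype(str).str.lower()
    # duplicated() is False on the first occurrence of each value,
    # so the cumulative sum of its negation counts unique values seen so far.
    out = df.copy()  # the sketch copies; the documented method mutates df
    out[dest_column_name] = (~values.duplicated()).cumsum()
    return out


df = pd.DataFrame({"animals": ["cat", "Cat", "dog", "cat", "bird"]})

sensitive = cumulative_unique_sketch(df, "animals", "animals_unique_count")
insensitive = cumulative_unique_sketch(
    df, "animals", "animals_unique_count", case_sensitive=False
)

print(sensitive["animals_unique_count"].tolist())    # [1, 2, 3, 3, 4]
print(insensitive["animals_unique_count"].tolist())  # [1, 1, 2, 2, 3]
```

With case_sensitive=True, ‘cat’ and ‘Cat’ are counted as two distinct values; with case_sensitive=False they are counted once, so the running total grows more slowly.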

Parameters
  • df – A pandas dataframe.

  • column_name – Name of the column containing values from which a running count of unique values will be created.

  • dest_column_name – The name of the new column containing the cumulative count of unique values that will be created.

  • case_sensitive – Whether uppercase and lowercase letters are treated as distinct. If True, ‘A’ != ‘a’; if False, values that differ only in letter case are counted as equal.

Returns

A pandas DataFrame with a new column containing a cumulative count of unique values from another column.