janitor.coalesce

janitor.coalesce(df: pandas.core.frame.DataFrame, column_names: Iterable[Hashable], new_column_name: Optional[str] = None, delete_columns: bool = True) → pandas.core.frame.DataFrame[source]

Coalesce two or more columns of data in order of column names provided.

This method does not mutate the original DataFrame.

Functional usage syntax:

from janitor import coalesce
df = coalesce(df, column_names=['col1', 'col2'], new_column_name='col3')

Method chaining syntax:

import pandas as pd
import janitor
df = pd.DataFrame(...).coalesce(['col1', 'col2'])

The first example will create a new column called ‘col3’ with values from ‘col2’ inserted where values from ‘col1’ are NaN, then delete the original columns. The second example will keep the name ‘col1’ in the new column.

This is more syntactic diabetes! For R users, this should look similar to dplyr’s coalesce function; for Python users, the interface should be more intuitive than the pandas.Series.combine_first() method (which we’re just using internally anyway).
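To make the fill order concrete, here is a minimal sketch of the behaviour described above, written with plain pandas. The helper name coalesce_sketch and the sample data are hypothetical and exist only for illustration; this is not pyjanitor's actual source, only an approximation built on pandas.Series.combine_first().

```python
import numpy as np
import pandas as pd


def coalesce_sketch(df, column_names, new_column_name=None, delete_columns=True):
    """Illustrative stand-in for coalesce (hypothetical, not pyjanitor's code)."""
    column_names = list(column_names)
    # With no explicit name, reuse the name of the first listed column.
    if new_column_name is None:
        new_column_name = column_names[0]

    # Start from the first column, then fill its missing values
    # from each later column, in the order given.
    combined = df[column_names[0]]
    for name in column_names[1:]:
        combined = combined.combine_first(df[name])

    # Work on a copy so the caller's DataFrame is left untouched.
    result = df.copy()
    if delete_columns:
        result = result.drop(columns=column_names)
    result[new_column_name] = combined
    return result


# Made-up sample data: col1 is missing in rows 1 and 2, col2 in row 2.
df = pd.DataFrame(
    {
        "col1": [1.0, np.nan, np.nan],
        "col2": [10.0, 20.0, np.nan],
        "other": ["a", "b", "c"],
    }
)

out = coalesce_sketch(df, ["col1", "col2"], "col3")
# out has columns ['other', 'col3']; col3 holds 1.0, 20.0, NaN
```

Row 0 keeps the value from col1 because it comes first in column_names, row 1 falls back to col2, and row 2 stays NaN because both inputs are missing there.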

Parameters
  • df – A pandas DataFrame.

  • column_names – A list of column names.

  • new_column_name – The new column name after combining. If None, the name of the first column in column_names is kept.

  • delete_columns – Whether to delete the columns being coalesced. Defaults to True.

Returns

A pandas DataFrame with coalesced columns.