janitor.impute

janitor.impute(df: pandas.core.frame.DataFrame, column_name: Hashable, value: Optional[Any] = None, statistic_column_name: Optional[str] = None) → pandas.core.frame.DataFrame

Method-chainable imputation of values in a column.

This method mutates the original DataFrame.

Under the hood, this function calls the .fillna() method available on every pandas.Series object.
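As a rough illustration of what that means, the plain-pandas sketch below fills the nulls in one column with .fillna() and assigns the result back to the same DataFrame. It is an assumed equivalent of the value-based case, not the library's actual source; the column names are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "score": [np.nan, 3.0, 2.0]})

# Fill nulls in the column and write the result back into df itself,
# so the original DataFrame object is changed.
df["score"] = df["score"].fillna(0.0)

filled = df["score"].tolist()
```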

Method-chaining example:

import numpy as np
import pandas as pd
import janitor

data = {
    "a": [1, 2, 3],
    "sales": np.nan,
    "score": [np.nan, 3, 2]}
df = (
    pd.DataFrame(data)
    # Impute null values with 0
    .impute(column_name='sales', value=0.0)
    # Impute null values with median
    .impute(column_name='score', statistic_column_name='median')
)

Provide either value or statistic_column_name, but not both.

If value is provided, then all null values in the selected column will take on the value provided.

If statistic_column_name is provided, then all null values in the selected column will take on the chosen summary statistic, computed from the column's non-null values.

Currently supported statistics include:

  • mean (also aliased by average)

  • median

  • mode

  • minimum (also aliased by min)

  • maximum (also aliased by max)
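One way to picture the statistic names and their aliases is as a lookup table from name to a pandas.Series reduction. The sketch below is only an assumption about how such a mapping could look; STATISTICS and impute_statistic are hypothetical names, and taking the first value returned by .mode() is a choice made here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical lookup table: statistic name (or alias) -> Series reduction.
# The pandas reductions used here skip null values by default.
STATISTICS = {
    "mean": pd.Series.mean,
    "average": pd.Series.mean,
    "median": pd.Series.median,
    "mode": lambda s: s.mode().iloc[0],  # assumption: use the first mode
    "minimum": pd.Series.min,
    "min": pd.Series.min,
    "maximum": pd.Series.max,
    "max": pd.Series.max,
}


def impute_statistic(series: pd.Series, statistic: str) -> pd.Series:
    """Fill nulls with the named statistic of the non-null values."""
    fill_value = STATISTICS[statistic](series)  # unknown names raise KeyError
    return series.fillna(fill_value)


score = pd.Series([np.nan, 3.0, 2.0])
imputed = impute_statistic(score, "median")
```

Here the median of the non-null values 3.0 and 2.0 is 2.5, so the missing entry becomes 2.5.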

Parameters
  • df – A pandas DataFrame

  • column_name – The name of the column on which to impute values.

  • value – (optional) The value to impute.

  • statistic_column_name – (optional) The name of the summary statistic used to compute the imputed value, for example 'median'.

Returns

An imputed pandas DataFrame.

Raises
  • ValueError – if both value and statistic_column_name are provided.

  • KeyError – if statistic_column_name is not one of mean, average, median, mode, minimum, min, maximum, or max.
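The two error conditions can be sketched as a small argument check. This is a hypothetical helper written for illustration (check_impute_args and SUPPORTED are not part of the library), and the error messages are invented.

```python
from typing import Any, Optional

# Statistic names listed in this page, including aliases.
SUPPORTED = {"mean", "average", "median", "mode", "minimum", "min", "maximum", "max"}


def check_impute_args(
    value: Optional[Any] = None,
    statistic_column_name: Optional[str] = None,
) -> None:
    """Hypothetical validation mirroring the documented Raises section."""
    if value is not None and statistic_column_name is not None:
        raise ValueError(
            "Only one of `value` or `statistic_column_name` should be provided."
        )
    if statistic_column_name is not None and statistic_column_name not in SUPPORTED:
        raise KeyError(
            f"`statistic_column_name` must be one of {sorted(SUPPORTED)}."
        )


# Valid combinations pass silently.
check_impute_args(value=0.0)
check_impute_args(statistic_column_name="median")
```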