janitor.min_max_scale

janitor.min_max_scale(df: pandas.core.frame.DataFrame, old_min=None, old_max=None, column_name=None, new_min=0, new_max=1) → pandas.core.frame.DataFrame

Scales data to between a minimum and maximum value.

This method mutates the original DataFrame.

If old_min and old_max are provided, the true min/max of the DataFrame or column is ignored in the scaling process and replaced with these values.

One can optionally set a new target minimum and maximum value using the new_min and new_max keyword arguments. This will result in the transformed data being bounded between new_min and new_max.

If a particular column name is specified, then only that column of data is scaled. Otherwise, the entire dataframe is scaled.
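The behaviour described above can be sketched in plain Python. This is a minimal illustration that assumes the usual linear min-max formula; the helper name rescale is hypothetical and is not part of the janitor API.

```python
def rescale(values, old_min=None, old_max=None, new_min=0, new_max=1):
    """Linearly map values from [old_min, old_max] onto [new_min, new_max].

    Hypothetical helper for illustration; assumes the standard
    min-max formula rather than quoting the library's implementation.
    """
    # Fall back to the observed extremes when no override is given.
    lo = min(values) if old_min is None else old_min
    hi = max(values) if old_max is None else old_max
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


data = [2, 4, 6, 10]

# Default target range: results are bounded by 0 and 1.
default_scaled = rescale(data)

# Custom target range: results are bounded by new_min and new_max.
custom_scaled = rescale(data, new_min=2, new_max=10)
```

With no overrides, the smallest input maps to new_min and the largest to new_max; passing old_min or old_max shifts those anchor points away from the observed data.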

Method chaining syntax:

df = pd.DataFrame(...).min_max_scale(column_name="a")

Setting custom minimum and maximum:

df = (
    pd.DataFrame(...)
    .min_max_scale(
        column_name="a",
        new_min=2,
        new_max=10
    )
)

Setting a min and max that are not based on the data, while applying the scaling to the entire dataframe:

df = (
    pd.DataFrame(...)
    .min_max_scale(
        old_min=0,
        old_max=14,
        new_min=0,
        new_max=1,
    )
)

The example above might be applied to something like scaling the isoelectric points of amino acids. While they technically range from approximately 3 to 10, we can also think of them on the pH scale, which ranges from 0 to 14. Hence, 3 gets scaled not to 0 but to approximately 0.21, while 10 gets scaled not to 1 but to approximately 0.71.
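The arithmetic behind this example can be checked by hand. The sketch below assumes the usual linear min-max formula (an assumption, not taken from the library source) and uses a hypothetical helper, scale_one. It evaluates the isoelectric-point bounds 3 and 10 against a fixed upper bound of 14, once with a lower bound of 0, as in the code example above, and once with a lower bound of 1, to show how much the choice of old_min shifts the result.

```python
def scale_one(x, old_min, old_max, new_min=0, new_max=1):
    """Scale a single value with fixed bounds (hypothetical helper)."""
    return (x - old_min) / (old_max - old_min) * (new_max - new_min) + new_min


# Bounds 0 to 14, matching old_min=0 and old_max=14 in the example call.
low_0 = scale_one(3, old_min=0, old_max=14)    # 3 / 14, about 0.21
high_0 = scale_one(10, old_min=0, old_max=14)  # 10 / 14, about 0.71

# Bounds 1 to 14, for comparison.
low_1 = scale_one(3, old_min=1, old_max=14)    # 2 / 13, about 0.15
high_1 = scale_one(10, old_min=1, old_max=14)  # 9 / 13, about 0.69

print(round(low_0, 2), round(high_0, 2), round(low_1, 2), round(high_1, 2))
```

In both cases the scaled values stay strictly inside the target range, because the supplied bounds are wider than the range of the data.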

Parameters
  • df – A pandas DataFrame.

  • old_min – (optional) Override for the current minimum value of the data to be transformed. If not provided, the minimum of the data is used.

  • old_max – (optional) Override for the current maximum value of the data to be transformed. If not provided, the maximum of the data is used.

  • new_min – (optional) The minimum value of the data after it has been scaled. Defaults to 0.

  • new_max – (optional) The maximum value of the data after it has been scaled. Defaults to 1.

  • column_name – (optional) The column on which to perform scaling. If not provided, the entire DataFrame is scaled.

Returns

A pandas DataFrame with scaled data.

Raises
  • ValueError – if old_max is not greater than old_min.

  • ValueError – if new_max is not greater than new_min.
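The two conditions above can be sketched as a standalone check. The helper check_bounds and its error messages are hypothetical and only illustrate the documented rule. Note that "not greater than" means equal bounds are rejected as well.

```python
def check_bounds(old_min, old_max, new_min=0, new_max=1):
    """Raise ValueError when either range is empty or inverted.

    Hypothetical helper mirroring the documented conditions; it is
    not the library's own validation code.
    """
    if old_max <= old_min:
        raise ValueError("old_max must be greater than old_min")
    if new_max <= new_min:
        raise ValueError("new_max must be greater than new_min")


# Valid bounds pass silently.
check_bounds(old_min=0, old_max=14)

# Equal bounds are rejected, since old_max is not greater than old_min.
try:
    check_bounds(old_min=5, old_max=5)
except ValueError as err:
    equal_bounds_error = str(err)
```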