janitor.jitter

janitor.jitter(df: pandas.core.frame.DataFrame, column_name: Hashable, dest_column_name: str, scale: numpy.number, clip: Optional[Iterable[numpy.number]] = None, random_state: Optional[numpy.number] = None) → pandas.core.frame.DataFrame

Adds Gaussian noise (jitter) to the values of a column.

Functional usage syntax:

import pandas as pd
import janitor as jn

df = pd.DataFrame(...)

df = jn.functions.jitter(
    df=df,
    column_name='values',
    dest_column_name='values_jitter',
    scale=1.0,
    clip=None,
    random_state=None,
)

Method chaining usage example:

import pandas as pd
import janitor

df = pd.DataFrame(...)

df = df.jitter(
    column_name='values',
    dest_column_name='values_jitter',
    scale=1.0,
    clip=None,
    random_state=None,
)

A new column is created containing the values of the original column with Gaussian noise added. For each value in the column, a Gaussian distribution is constructed with a location (mean) equal to the value and a scale (standard deviation) equal to scale; the jittered value is a random sample drawn from that distribution. If a tuple is supplied for clip, values of the new column less than clip[0] are set to clip[0], and values greater than clip[1] are set to clip[1]. If a numeric value is supplied for random_state, it is used to set the random seed for sampling. NaN values are ignored in this method.

This method mutates the original DataFrame.
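The sampling and clipping steps described above can be sketched with plain NumPy and pandas. This is a minimal illustration, not pyjanitor's implementation: the name jitter_sketch is hypothetical, the use of a local numpy.random.RandomState for seeding is an assumption, and NaN inputs are assumed to simply stay NaN in the new column.

```python
import numpy as np
import pandas as pd


def jitter_sketch(df, column_name, dest_column_name, scale,
                  clip=None, random_state=None):
    """Hypothetical sketch of the documented behaviour (not pyjanitor's code)."""
    # Assumption: seed a local generator instead of the global NumPy state.
    rng = np.random.RandomState(random_state)
    values = df[column_name].to_numpy(dtype=float)

    # Sampling from N(mean=value, std=scale) is equivalent to adding
    # zero-mean noise with standard deviation `scale` to each value.
    # NaN + noise is still NaN, so missing values pass through unchanged.
    jittered = values + rng.normal(loc=0.0, scale=scale, size=len(values))

    if clip is not None:
        low, high = clip
        # Values below `low` become `low`; values above `high` become `high`.
        jittered = np.clip(jittered, low, high)

    # Mutates the DataFrame that was passed in, as the documentation states.
    df[dest_column_name] = jittered
    return df


df = pd.DataFrame({"values": [1.0, 2.0, np.nan, 4.0]})
df = jitter_sketch(df, "values", "values_jitter", scale=1.0,
                   clip=(0.0, 3.0), random_state=0)
```

Passing the same random_state twice should give identical jittered columns in this sketch, which is convenient for reproducible plots.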

Parameters
  • df – A pandas dataframe.

  • column_name – Name of the column containing values to add Gaussian jitter to.

  • dest_column_name – The name of the new column containing the jittered values that will be created.

  • scale – A positive number giving the scale (standard deviation) of the Gaussian distribution to sample from. Must be greater than 0.

  • clip – An iterable of two values (minimum and maximum) to clip the jittered values to. Defaults to None.

  • random_state – An integer or 1-d array value used to set the random seed. Defaults to None.

Returns

A pandas DataFrame with a new column containing Gaussian-jittered values from another column.

Raises
  • TypeError – if the column referred to by column_name is not numeric.

  • ValueError – if scale is not a numerical value greater than 0.

  • ValueError – if clip is not an iterable of length 2.

  • ValueError – if clip[0] is not less than clip[1].
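
The error conditions listed above could be checked along the following lines. This is only a sketch under stated assumptions: check_jitter_args is a hypothetical helper, not part of pyjanitor, and the error messages are illustrative.

```python
from numbers import Number

import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_jitter_args(df, column_name, scale, clip=None):
    """Hypothetical helper mirroring the documented Raises section."""
    # TypeError: the source column must hold numeric data.
    if not is_numeric_dtype(df[column_name]):
        raise TypeError(f"column {column_name!r} must be numeric")

    # ValueError: scale must be a number strictly greater than 0.
    if not isinstance(scale, Number) or not scale > 0:
        raise ValueError("scale must be a numerical value greater than 0")

    if clip is not None:
        bounds = list(clip)
        # ValueError: clip must contain exactly two values.
        if len(bounds) != 2:
            raise ValueError("clip must be an iterable of length 2")
        # ValueError: the lower bound must be below the upper bound.
        if not bounds[0] < bounds[1]:
            raise ValueError("clip[0] must be less than clip[1]")


df = pd.DataFrame({"values": [1.0, 2.0], "label": ["a", "b"]})
check_jitter_args(df, "values", scale=1.0, clip=(0, 5))  # passes silently
```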