janitor.transform_column

janitor.transform_column(df: pandas.core.frame.DataFrame, column_name: Hashable, function: Callable, dest_column_name: Optional[str] = None, elementwise: bool = True) → pandas.core.frame.DataFrame

Transform the given column using the provided function. By default, the transformed values overwrite the original column (see dest_column_name).

Functions can be applied one of two ways:

  • Element-wise (default; elementwise=True)

  • Column-wise (alternative; elementwise=False)

If the function is applied "elementwise", its first argument should be a single element of the column. This is the default behaviour of transform_column because it is easy to reason about. For example:

def elementwise_func(x):
    modified_x = ...  # do stuff here with a single value
    return modified_x

df.transform_column(column_name="my_column", function=elementwise_func)
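As a rough plain-pandas analogy (an illustration of the element-wise contract, not pyjanitor's source), element-wise mode behaves like Series.apply with a function that receives one value at a time. The column contents and the strip-and-upper-case transformation below are made up for the example.

```python
import pandas as pd


def elementwise_func(x):
    # Receives a single value (here a string) and returns the transformed value.
    return x.strip().upper()


df = pd.DataFrame({"my_column": ["  apple", "banana  ", " cherry "]})

# Element-wise application: the function sees one scalar at a time.
result = df["my_column"].apply(elementwise_func)
print(result.tolist())  # ['APPLE', 'BANANA', 'CHERRY']
```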

On the other hand, column-wise application passes the whole column to the function as a pandas Series, and the function must return a sequence of the same length as the original. One place where this is desirable is to gain access to pandas' native string methods (the .str accessor), which operate on the whole column in a single call and are often faster than calling a Python function on each element.

def columnwise_func(s: pd.Series) -> pd.Series:
    # Keep the first five characters of every string in the column.
    return s.str[0:5]

df.transform_column(
    column_name="my_column",
    function=columnwise_func,
    elementwise=False,
)
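To make the difference between the two modes concrete, the plain-pandas sketch below (no pyjanitor involved; the sample strings are invented) calls columnwise_func once on the whole Series and compares it with an element-wise lambda that slices each string individually. Both produce the same values and the same length as the input column.

```python
import pandas as pd


def columnwise_func(s: pd.Series) -> pd.Series:
    # Receives the whole column; .str[0:5] slices every string in one call.
    return s.str[0:5]


df = pd.DataFrame({"my_column": ["pyjanitor", "pandas", "numpy"]})

# Column-wise: one call with the entire Series.
columnwise = columnwise_func(df["my_column"])

# Element-wise equivalent: one call per string.
elementwise = df["my_column"].apply(lambda x: x[0:5])

print(columnwise.tolist())  # ['pyjan', 'panda', 'numpy']
```

The column-wise function must preserve the length of the column; returning a shorter or longer sequence would not line up with the existing rows.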

This method does not mutate the original DataFrame.
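Because the original DataFrame is left untouched, the result has to be assigned back (as in df = ... in the examples below). The plain-pandas sketch that follows illustrates that non-mutating contract with DataFrame.assign, which returns a new DataFrame. It is an illustration only, not a claim about how pyjanitor implements transform_column.

```python
import pandas as pd

df = pd.DataFrame({"my_column": ["a", "b", "c"]})

# DataFrame.assign builds and returns a new DataFrame; `df` itself is not modified.
transformed = df.assign(my_column=df["my_column"].str.upper())

print(df["my_column"].tolist())           # ['a', 'b', 'c']
print(transformed["my_column"].tolist())  # ['A', 'B', 'C']
```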

Let's say we want to apply a log10 transform to a column of data.

Originally one would write code like this:

# YOU NO LONGER NEED TO WRITE THIS!
df[column_name] = df[column_name].apply(np.log10)

With the method chaining syntax, we can do the following instead:

df = (
    pd.DataFrame(...)
    .transform_column(column_name, np.log10)
)

With the functional syntax:

from janitor import transform_column

df = pd.DataFrame(...)
df = transform_column(df, column_name, np.log10)
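np.log10 is a NumPy ufunc, so it accepts both a single number and a whole Series; that is why the log10 example fits either application mode. The plain-pandas sketch below (sample values invented, pyjanitor not used) contrasts it with math.log10, which only accepts scalars and therefore has to be applied one value at a time.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({"my_column": [1.0, 10.0, 100.0, 1000.0]})

# Element-wise style: math.log10 only accepts scalars, so it is called per value.
per_element = df["my_column"].apply(math.log10)

# Column-wise style: np.log10 accepts the whole Series and returns a Series
# of the same length in a single vectorised call.
whole_column = np.log10(df["my_column"])

print(np.allclose(per_element, whole_column))  # True
```

A scalar-only function such as math.log10 would need the default elementwise=True, whereas a ufunc like np.log10 should also work with elementwise=False.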

Parameters
  • df – A pandas DataFrame.

  • column_name – The column to transform.

  • function – A function to apply on the column.

  • dest_column_name – The column name to store the transformation result in. Defaults to None, in which case the original column is overwritten with the transformed values. If a name is provided here, a new column with the transformed values is created.

  • elementwise – Whether to apply the function element-wise. If elementwise is True, the function should accept a single datum from the column as its first argument and return the transformed datum. If elementwise is False, the function should accept the whole column as a pandas Series and return a pandas Series of the same length.
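The parameters above can be summarised by a small sketch. transform_column_sketch is a hypothetical helper written for illustration; it mirrors the documented behaviour (overwrite by default, new column when dest_column_name is given, per-value versus whole-Series calls, no mutation of the input) but is not pyjanitor's implementation and may differ from it in edge cases such as error handling.

```python
from typing import Callable, Hashable, Optional

import pandas as pd


def transform_column_sketch(
    df: pd.DataFrame,
    column_name: Hashable,
    function: Callable,
    dest_column_name: Optional[str] = None,
    elementwise: bool = True,
) -> pd.DataFrame:
    """Hypothetical re-implementation of the documented semantics (not pyjanitor code)."""
    source = df[column_name]
    # elementwise=True: call `function` on each value; False: call it once on the Series.
    result = source.apply(function) if elementwise else function(source)
    if len(result) != len(source):
        raise ValueError("function must return a sequence of the same length as the column")
    # dest_column_name=None overwrites the original column; otherwise a new column is added.
    target = column_name if dest_column_name is None else dest_column_name
    out = df.copy()  # work on a copy so the input DataFrame is not mutated
    out[target] = result
    return out


df = pd.DataFrame({"name": ["Alice", "Bob"]})

# Element-wise with a new destination column.
df2 = transform_column_sketch(df, "name", len, dest_column_name="name_length")

# Column-wise, overwriting the original column.
df3 = transform_column_sketch(df, "name", lambda s: s.str.lower(), elementwise=False)

print(df2["name_length"].tolist())  # [5, 3]
print(df3["name"].tolist())         # ['alice', 'bob']
print(df["name"].tolist())          # ['Alice', 'Bob']
```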

Returns

A pandas DataFrame with a transformed column.