janitor.find_replace

janitor.find_replace(df: pandas.core.frame.DataFrame, match: str = 'exact', **mappings) → pandas.core.frame.DataFrame[source]

Perform a find-and-replace action on provided columns.

Depending on use case, users can choose either exact, full-value matching, or regular-expression-based fuzzy matching (hence allowing substring matching in the latter case). For strings, the matching is always case sensitive.

For instance, given a dataframe containing orders at a coffee shop:

df = pd.DataFrame({
    'customer': ['Mary', 'Tom', 'Lila'],
    'order': ['ice coffee', 'lemonade', 'regular coffee']
})

Our task is to replace values ‘ice coffee’ and ‘regular coffee’ of the ‘order’ column into ‘latte’.

Example 1 for exact matching

# Functional usage
df = find_replace(
    df,
    match='exact',
    order={'ice coffee': 'latte', 'regular coffee': 'latte'},
)

# Method chaining usage
df = df.find_replace(
    match='exact'
    order={'ice coffee': 'latte', 'regular coffee': 'latte'},
)

Example 2: Regular-expression-based matching

# Functional usage
df = find_replace(
    df,
    match='regex',
    order={'coffee$': 'latte'},
)

# Method chaining usage
df = df.find_replace(
    match='regex',
    order={'coffee$': 'latte'},
)

To perform a find and replace on the entire dataframe, pandas’ df.replace() function provides the appropriate functionality. You can find more detail on the replace docs.

This function only works with column names that have no spaces or punctuation in them. For example, a column name item_name would work with find_replace, because it is a contiguous string that can be parsed correctly, but item name would not be parsed correctly by the Python interpreter.

If you have column names that might not be compatible, we recommend calling on clean_names() as the first method call. If, for whatever reason, that is not possible, then _find_replace() is available as a function that you can do a pandas pipe call on.

Parameters
  • df – A pandas DataFrame.

  • match – Whether or not to perform an exact match or not. Valid values are “exact” or “regex”.

  • mappings – keyword arguments corresponding to column names that have dictionaries passed in indicating what to find (keys) and what to replace with (values).

Returns

A pandas DataFrame with replaced values.