janitor.clean_names

janitor.clean_names(df: pandas.core.frame.DataFrame, strip_underscores: Optional[Union[str, bool]] = None, case_type: str = 'lower', remove_special: bool = False, strip_accents: bool = True, preserve_original_columns: bool = True, enforce_string: bool = True, truncate_limit: int = None) → pandas.core.frame.DataFrame

Clean column names.

Takes all column names, converts them to lowercase, then replaces all spaces with underscores.

By default, column names are converted to string types. This can be switched off by passing in enforce_string=False.

This method does not mutate the original DataFrame.

Functional usage syntax:

from janitor import clean_names
df = clean_names(df)

Method chaining syntax:

import pandas as pd
import janitor
df = pd.DataFrame(...).clean_names()

Example of transformation

Columns before: First Name, Last Name, Employee Status, Subject
Columns after: first_name, last_name, employee_status, subject
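
The transformation above can be approximated in plain Python. The following is a minimal sketch that only mimics the default behaviour described here (lowercasing and replacing spaces with underscores); the helper name sketch_default_clean is hypothetical and this is not pyjanitor's implementation.

```python
# Illustrative sketch only: mimics the default renaming described above
# (lowercase, then spaces -> underscores). Not pyjanitor's implementation.
def sketch_default_clean(names):
    """Return a new list of cleaned names; the input list is left untouched."""
    return [str(name).lower().replace(" ", "_") for name in names]


before = ["First Name", "Last Name", "Employee Status", "Subject"]
after = sketch_default_clean(before)
print(after)  # ['first_name', 'last_name', 'employee_status', 'subject']
```

As with clean_names itself, the sketch returns new names rather than modifying its input.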

Parameters

  • df – The pandas DataFrame object.

  • strip_underscores – (optional) Removes the outer underscores from all column names. The default, None, keeps outer underscores. Accepted values are ‘left’, ‘right’ or ‘both’, or the respective shorthands ‘l’, ‘r’ and True.

  • case_type – (optional) Whether to convert column names to lowercase or uppercase. The current case may be preserved with ‘preserve’, and snake case conversion (from CamelCase or camelCase only) can be enabled with ‘snake’. The default, ‘lower’, makes all characters lowercase.

  • remove_special – (optional) Remove special characters from column names. Only letters, numbers and underscores are preserved.

  • strip_accents – Whether or not to remove accents from column names.

  • preserve_original_columns – (optional) Preserve original names. This is later retrievable using df.original_columns.

  • enforce_string – Whether or not to convert all column names to string type. Defaults to True, but can be turned off. Columns with more than one level are not converted by default.

  • truncate_limit – (optional) Truncates formatted column names to the specified length. Default None does not truncate.
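
To make the parameter descriptions above concrete, here is a rough standard-library sketch of how the options could act on a single column name. The function sketch_clean_name is hypothetical, the order in which the steps are applied is an assumption, and ‘upper’ is assumed to be the value that selects uppercase; none of this is taken from pyjanitor's source.

```python
import re
import unicodedata


def sketch_clean_name(name, strip_underscores=None, case_type="lower",
                      remove_special=False, strip_accents=True,
                      truncate_limit=None):
    """Hypothetical illustration of the documented options; step order is assumed."""
    name = str(name)
    if case_type == "lower":
        name = name.lower()
    elif case_type == "upper":  # assumed value name for uppercase
        name = name.upper()
    elif case_type == "snake":
        # Insert "_" where a lowercase letter or digit meets an uppercase letter.
        name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()
    # case_type == "preserve" leaves the case unchanged.
    name = name.replace(" ", "_")
    if strip_accents:
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", name)
        name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if remove_special:
        # Keep only letters, numbers and underscores.
        name = re.sub(r"\W", "", name)
    if strip_underscores in ("left", "l", "both", True):
        name = name.lstrip("_")
    if strip_underscores in ("right", "r", "both", True):
        name = name.rstrip("_")
    if truncate_limit is not None:
        name = name[:truncate_limit]
    return name


print(sketch_clean_name("_Employee Status_", strip_underscores="both"))  # employee_status
print(sketch_clean_name("employeeStatus", case_type="snake"))            # employee_status
print(sketch_clean_name("Employee Status", truncate_limit=8))            # employee
```

The sketch handles one name at a time to keep each option easy to follow; clean_names applies its cleaning to every column name of the DataFrame.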

Returns

A pandas DataFrame.