janitor.deconcatenate_column

janitor.deconcatenate_column(df: pandas.core.frame.DataFrame, column_name: Hashable, sep: Optional[str] = None, new_column_names: Optional[Union[List[str], Tuple[str]]] = None, autoname: str = None, preserve_position: bool = False) → pandas.core.frame.DataFrame

De-concatenates a single column into multiple columns.

The column to de-concatenate can hold either collections (list, tuple, …), which are separated out with pd.Series.tolist(), or strings, which are split on sep.

To choose between these two behaviours automatically, the function inspects the first element of the specified column.

If that element is a string, sep must be specified. Otherwise, the function assumes the values are iterables (e.g. lists or tuples) and de-concatenates them by unpacking each one into separate columns.
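The dispatch described above can be sketched in plain pandas. This is a minimal illustration, not pyjanitor's implementation: split_column_sketch is a hypothetical helper name, and only the behaviour stated above (inspect the first element, split strings on sep, otherwise unpack iterables) is modelled.

```python
import pandas as pd


def split_column_sketch(series: pd.Series, sep=None) -> pd.DataFrame:
    """Hypothetical helper mimicking the string-vs-iterable dispatch.

    Only the first element is inspected to decide how to split.
    """
    first = series.iloc[0]
    if isinstance(first, str):
        if sep is None:
            raise ValueError("`sep` must be provided when the column holds strings")
        # String values: split each one on `sep` into separate columns.
        return series.str.split(sep, expand=True)
    # Anything else is assumed to be an iterable (list, tuple, ...) and unpacked.
    return pd.DataFrame(series.tolist(), index=series.index)


ids = pd.Series(["a-1", "b-2"])
pairs = pd.Series([[1, "x"], [2, "y"]])
print(split_column_sketch(ids, sep="-"))
print(split_column_sketch(pairs))
```

Because only the first element is checked, a column that mixes strings and lists would be routed entirely down one branch.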

Given a column with string values, this is the inverse of the concatenate_columns function.

Use it to quickly split a single column into several columns.
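To illustrate the inverse relationship with concatenate_columns, here is a plain-pandas round trip. The concatenation step is approximated with string addition and is an assumption standing in for concatenate_columns, not a call to it; the split uses the real pandas Series.str.split.

```python
import pandas as pd

df = pd.DataFrame({"part": ["a", "b"], "num": ["1", "2"]})

# Stand-in for concatenate_columns: join two string columns with "-".
joined = df["part"] + "-" + df["num"]

# De-concatenate: split on the same separator to recover the original columns.
restored = joined.str.split("-", expand=True)
restored.columns = ["part", "num"]

print(restored.values.tolist())
```

The round trip is only lossless when the separator never occurs inside the original values; a value such as "a-x" would produce an extra column.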

The keyword argument preserve_position takes a boolean that controls whether the new columns take over the original position of column_name:

  • When preserve_position=False (default), df.columns change from […, column_name, …] to […, column_name, …, new_column_names]. In other words, the deconcatenated new columns are appended to the right of the original dataframe and the original column_name is NOT dropped.

  • When preserve_position=True, df.columns change from […, column_name, …] to […, new_column_names, …]. In other words, the deconcatenated new columns REPLACE the original column_name at its original position, and column_name itself is dropped.
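The two column-ordering rules can be sketched with pd.concat. This is an illustrative helper (deconcat_positions_sketch is a hypothetical name) that models only the resulting column order, assuming the new columns have already been split out and share the original index.

```python
import pandas as pd


def deconcat_positions_sketch(df, column_name, new_cols, preserve_position=False):
    """Hypothetical helper showing only the column-ordering rules.

    pd.concat always builds a new frame, so `df` itself is left untouched.
    """
    if not preserve_position:
        # Default: append the new columns on the right and keep the original.
        return pd.concat([df, new_cols], axis=1)
    # preserve_position=True: the new columns take the original column's slot.
    pos = df.columns.get_loc(column_name)
    left = df.iloc[:, :pos]
    right = df.iloc[:, pos + 1:]
    return pd.concat([left, new_cols, right], axis=1)


df = pd.DataFrame({"x": [0, 1], "id": ["a-1", "b-2"], "y": [True, False]})
parts = df["id"].str.split("-", expand=True)
parts.columns = ["col1", "col2"]

print(list(deconcat_positions_sketch(df, "id", parts).columns))
# ['x', 'id', 'y', 'col1', 'col2']
print(list(deconcat_positions_sketch(df, "id", parts, preserve_position=True).columns))
# ['x', 'col1', 'col2', 'y']
```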

The keyword argument autoname accepts a base string and automatically creates numbered column names based on it. For example, if col is passed as the argument to autoname and 4 columns are created, the resulting columns are named col1, col2, col3, col4. Numbering is always 1-indexed, not 0-indexed, to keep the column names human-friendly.
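The 1-indexed naming scheme amounts to a simple comprehension. The function name autonames below is hypothetical and only reproduces the naming rule stated above.

```python
def autonames(base: str, n: int) -> list:
    """Hypothetical helper: build n 1-indexed names from a base string."""
    # Start at 1 so the first column is f"{base}1", not f"{base}0".
    return [f"{base}{i}" for i in range(1, n + 1)]


print(autonames("col", 4))  # ['col1', 'col2', 'col3', 'col4']
```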

This method does not mutate the original DataFrame.

Functional usage syntax:

from janitor import deconcatenate_column

df = deconcatenate_column(
    df, column_name='id', new_column_names=['col1', 'col2'],
    sep='-', preserve_position=True
)

Method chaining syntax:

import pandas as pd
import janitor  # importing janitor registers the DataFrame methods

df = (
    pd.DataFrame(...)
    .deconcatenate_column(
        column_name='id', new_column_names=['col1', 'col2'],
        sep='-', preserve_position=True
    )
)

Parameters
  • df – A pandas DataFrame.

  • column_name – The column to split.

  • sep – The separator delimiting the column’s data.

  • new_column_names – A list or tuple of new column names post-splitting.

  • autoname – A base name for automatically naming the new columns. Takes precedence over new_column_names if both are provided.

  • preserve_position – Whether to preserve the original position of the column upon de-concatenation. Defaults to False.

Returns

A pandas DataFrame with a deconcatenated column.

Raises
  • ValueError – if column_name is not present in the DataFrame.

  • ValueError – if sep is not provided and the column values are of type str.

  • ValueError – if neither new_column_names nor autoname is supplied.

  • JanitorError – if the number of names in new_column_names does not match the number of columns produced by the split.
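The error conditions listed above can be sketched as a validation routine. This is an illustration under stated assumptions, not pyjanitor's code: check_args_sketch is a hypothetical name, JanitorErrorStandIn is a hypothetical stand-in for janitor's JanitorError, and the order in which the checks run here is assumed.

```python
import pandas as pd


class JanitorErrorStandIn(Exception):
    """Hypothetical stand-in for janitor's JanitorError."""


def check_args_sketch(df, column_name, sep=None, new_column_names=None, autoname=None):
    """Hypothetical validator mirroring the documented error conditions."""
    if column_name not in df.columns:
        raise ValueError(f"column {column_name!r} is not in the DataFrame")
    col = df[column_name]
    if isinstance(col.iloc[0], str):
        if sep is None:
            raise ValueError("`sep` is required when the column holds strings")
        n_parts = col.str.split(sep, expand=True).shape[1]
    else:
        n_parts = pd.DataFrame(col.tolist()).shape[1]
    if new_column_names is None and autoname is None:
        raise ValueError("one of `new_column_names` or `autoname` is required")
    # autoname always yields the right count, so only explicit names are checked.
    if autoname is None and len(new_column_names) != n_parts:
        raise JanitorErrorStandIn(
            f"expected {n_parts} names, got {len(new_column_names)}"
        )


df = pd.DataFrame({"id": ["a-1", "b-2"]})
check_args_sketch(df, "id", sep="-", new_column_names=["a", "b"])  # passes
```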