janitor.pivot_wider

janitor.pivot_wider(df: pandas.core.frame.DataFrame, index: Optional[Union[Sequence[str], str]] = None, names_from: Optional[Union[Sequence[str], str]] = None, values_from: Optional[Union[Sequence[str], str]] = None, names_sort: Optional[bool] = True, flatten_levels: Optional[bool] = True, names_from_position: Optional[str] = 'first', names_prefix: Optional[str] = None, names_sep: Optional[str] = '_', aggfunc: Optional[Union[str, list, dict, Callable]] = None, fill_value: Optional[Union[int, float, str]] = None) → pandas.core.frame.DataFrame[source]

Reshapes data from long to wide form. The number of columns are increased, while decreasing the number of rows.

It is the inverse of the pivot_longer method, and is a wrapper around pd.DataFrame.pivot method.

This method does not mutate the original DataFrame.

Reshaping to wide form :

     name variable  value
0   Alice      wk1      5
1   Alice      wk2      9
2   Alice      wk3     20
3   Alice      wk4     22
4     Bob      wk1      7
5     Bob      wk2     11
6     Bob      wk3     17
7     Bob      wk4     33
8   Carla      wk1      6
9   Carla      wk2     13
10  Carla      wk3     39
11  Carla      wk4     40

df = (
    pd.DataFrame(...)
    .pivot_wider(
        index = "name",
        names_from = "variable",
        values_from = "value"
    )

     name    wk1   wk2   wk3   wk4
0    Alice     5     9    20    22
1    Bob       7    11    17    33
2    Carla     6    13    39    40

Pivoting on multiple columns is possible :

    name    n  pct
0     1  10.0  0.1
1     2  20.0  0.2
2     3  30.0  0.3

df = (
    pd.DataFrame(...)
    .assign(num = 0)
    .pivot_wider(
        index = "num",
        names_from = "name",
        values_from = ["n", "pct"],
        names_sep = "_"
     )
 )

    num n_1  n_2  n_3  pct_1  pct_2  pct_3
0   0   10   20   30   0.1    0.2    0.3

Aggregations are also possible with the aggfunc parameter:

df = pd.DataFrame([{'id': 'a', 'name': 'Adam', 'value': 5},
                   {'id': 'b', 'name': 'Eve', 'value': 6},
                   {'id': 'c', 'name': 'Adam', 'value': 4},
                   {'id': 'a', 'name': 'Eve', 'value': 3},
                   {'id': 'd', 'name': 'Seth', 'value': 2},
                   {'id': 'b', 'name': 'Adam', 'value': 4},
                   {'id': 'a', 'name': 'Adam', 'value': 2}])

id  name    value
a   Adam    5
b   Eve     6
c   Adam    4
a   Eve     3
d   Seth    2
b   Adam    4
a   Adam    2

df.pivot_wider(
    index = "id",
    names_from = "name",
    aggfunc = np.sum,
    values_from = "value",
    flatten_levels = True,
    fill_value = 0
    )

    id  Adam  Eve  Seth
0   a     7    3     0
1   b     4    6     0
2   c     4    0     0
3   d     0    0     2

Note

You may choose not to collapse the levels by passing False to the flatten_levels argument.

Note

A ValueError is raised if the index is not unique and aggfunc is None.

Functional usage syntax:

import pandas as pd
import janitor as jn

df = pd.DataFrame(...)

df = jn.pivot_wider(
    df = df,
    index = [column1, column2, ...],
    names_from = [column3, column4, ...],
    value_from = [column5, column6, ...],
    names_sort = True/False,
    names_prefix = string,
    names_sep = string,
    flatten_levels = True/False,
    names_from_position = "first"/"last",
    aggfunc,
    fill_value = fill_value
)

Method chaining syntax:

df = (
    pd.DataFrame(...)
    .pivot_wider(
        index = [column1, column2, ...],
        names_from = [column3, column4, ...],
        value_from = [column5, column6, ...],
        names_sort = True/False,
        names_prefix = string,
        names_sep = string,
        flatten_levels = True/False,
        names_from_position = "first"/"last",
        aggfunc,
        fill_value = fill_value
        )
)
Parameters
  • df – A pandas dataframe.

  • index – Name(s) of columns to use as identifier variables. Should be either a single column name, or a list of column names. The janitor.select_columns syntax is supported here, allowing for flexible and dynamic column selection. If index is not provided, the current dataframe’s index is used.

  • names_from – Name(s) of column(s) to use to make the new dataframe’s columns. Should be either a single column name, or a list of column names. The janitor.select_columns syntax is supported here, allowing for flexible and dynamic column selection. A label or labels must be provided for names_from.

  • values_from – Name(s) of column(s) that will be used for populating the new dataframe’s values. Should be either a single column name, or a list of column names. The janitor.select_columns syntax is supported here, allowing for flexible and dynamic column selection. If values_from is not specified, all remaining columns will be used. If flatten_levels is False, a MultiIndex dataframe is created.

  • names_sort – Default is True. Sorts columns by order of appearance. Set as True to get the columns sorted lexicographicially, or if the columns are of category type.

  • flatten_levels – Default is True. If False, the dataframe stays as a MultiIndex.

  • names_from_position – By default, the values in names_from stay at the front of the new column names. This can be changed to “last”; this places the values in names_from at the tail of the column names.

  • names_prefix – String to be added to the front of each output column. Can be handy if the values in names_from are numeric data types. Applicable only if flatten_levels is True.

  • names_sep – If names_from or values_from contain multiple variables, this will be used to join their values into a single string to use as a column name. Default is _. Applicable only if flatten_levels is True.

  • aggfunc – An aggregate function. It can be a function, a string, list of functions, or a dictionary, pairing column name with aggregate function.

  • fill_value – Scalar value to replace missing values with (after pivoting).

Returns

A pandas DataFrame that has been unpivoted from long to wide form.

Raises
  • TypeError – if index or names_from is not a string, or a list of strings.

  • ValueError – if names_from is None.

  • TypeError – if names_sep is not a string.

  • TypeError – if values_from is not a string or a list of strings.

  • TypeError – if names_sort is not a boolean.

  • TypeError – if flatten_levels is not a boolean.

  • ValueError – if values in index or names_from or values_from do not exist in the dataframe.

  • ValueError – if the combination of index and names_from is not unique and aggfunc is None.