expand_grid: Create a dataframe from all combinations of inputs

Background

This notebook shows examples of how expand_grid works. expand_grid aims to offer functionality similar to R’s expand_grid function: it creates a dataframe from all combinations of its inputs. The inputs must be provided as a dictionary. If expand_grid is called on an existing dataframe, a key for that dataframe must be provided as well.
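As a rough mental model, the result is the Cartesian product of the input values. The sketch below uses only the standard library's `itertools.product`; it is not how expand_grid is implemented, and `grid_rows` is a hypothetical helper name introduced here for illustration.

```python
from itertools import product


def grid_rows(inputs):
    """Return (column_names, rows) covering every combination of the input values.

    Hypothetical helper for illustration only; not part of janitor.
    """
    names = list(inputs)
    # product() varies the last input fastest, so "y" cycles before "x" advances
    rows = list(product(*inputs.values()))
    return names, rows


names, rows = grid_rows({"x": [1, 2, 3], "y": [1, 2]})
print(names)
for row in rows:
    print(row)
```

Each tuple corresponds to one row of the resulting dataframe, with the dictionary keys acting as column names.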

Some of the examples used here are from tidyr’s expand_grid page and from the pandas cookbook.

[1]:
import pandas as pd
import numpy as np
from janitor import expand_grid
[2]:
# every combination of x and y; the last key (y) varies fastest
data = {"x": [1, 2, 3], "y": [1, 2]}

result = expand_grid(others=data)

result
[2]:
   x  y
0  1  1
1  1  2
2  2  1
3  2  2
4  3  1
5  3  2
[3]:
# combinations of lowercase and uppercase letters
# 5 x 5 = 25 rows in total; only the first 10 are shown

data = {"l1": list("abcde"), "l2": list("ABCDE")}

result = expand_grid(others=data)

result.head(10)
[3]:
  l1 l2
0  a  A
1  a  B
2  a  C
3  a  D
4  a  E
5  b  A
6  b  B
7  b  C
8  b  D
9  b  E
[4]:

# three inputs of different lengths: 2 x 3 x 2 = 12 rows
data = {'height': [60, 70],
        'weight': [100, 140, 180],
        'sex': ['Male', 'Female']}

result = expand_grid(others=data)

result
[4]:
    height  weight     sex
0       60     100    Male
1       60     100  Female
2       60     140    Male
3       60     140  Female
4       60     180    Male
5       60     180  Female
6       70     100    Male
7       70     100  Female
8       70     140    Male
9       70     140  Female
10      70     180    Male
11      70     180  Female
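The number of rows in the grid is the product of the input lengths, so it grows quickly as inputs are added. A minimal sketch for estimating the size before building a grid, using only the standard library (`math.prod` requires Python 3.8 or later):

```python
from math import prod

data = {
    "height": [60, 70],
    "weight": [100, 140, 180],
    "sex": ["Male", "Female"],
}

# one row per combination: multiply the lengths of all inputs
expected_rows = prod(len(values) for values in data.values())
print(expected_rows)  # 2 * 3 * 2 = 12
```

Checking this number first can help avoid accidentally building a grid that is too large to hold in memory.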
[5]:
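# A dictionary of arrays
# Arrays must be 1- or 2-dimensional
# Each row of a 2D array is kept together as one unit;
# its columns are named <key>_<column position>

data = {"x1": np.array([[1, 3], [2, 4]]),
        "x2": np.array([[5, 7], [6, 8]])}

result = expand_grid(others=data)

result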
[5]:
   x1_0  x1_1  x2_0  x2_1
0     1     3     5     7
1     1     3     6     8
2     2     4     5     7
3     2     4     6     8
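The output above suggests that each row of a 2D array is combined as a whole, with columns named after the key and the column position. The sketch below reproduces that behaviour with plain nested lists and `itertools.product`; `grid_from_2d` is a hypothetical name used for illustration, and this is not the library's implementation.

```python
from itertools import product


def grid_from_2d(inputs):
    """Combine the rows of 2D inputs; columns are named <key>_<column position>.

    Hypothetical helper for illustration only; not part of janitor.
    """
    names = [
        f"{key}_{position}"
        for key, rows in inputs.items()
        for position in range(len(rows[0]))
    ]
    combined = [
        # flatten one row from each input into a single output row
        tuple(value for row in combo for value in row)
        for combo in product(*inputs.values())
    ]
    return names, combined


names, rows = grid_from_2d({"x1": [[1, 3], [2, 4]], "x2": [[5, 7], [6, 8]]})
print(names)
for row in rows:
    print(row)
```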
[6]:
# expand_grid can be method-chained onto an existing dataframe

df = pd.DataFrame({"x": [1, 2], "y": [2, 1]})
data = {"z": [1, 2, 3]}

# df_key is required for the dataframe;
# it is prefixed to the dataframe's column names

result = df.expand_grid(df_key="df", others=data)

result
[6]:
   df_x  df_y  z
0     1     2  1
1     1     2  2
2     1     2  3
3     2     1  1
4     2     1  2
5     2     1  3
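In the result above, every row of the dataframe is paired with every value of z, and the dataframe's columns carry the key as a prefix. A minimal standard-library sketch of that pairing, with the dataframe represented as a list of records; `prefixed_grid` is a hypothetical name, and this is an assumption about the observable behaviour rather than the library's code.

```python
from itertools import product


def prefixed_grid(df_key, records, others):
    """Cross every record with every combination of `others`.

    Record columns are prefixed with `df_key`.
    Hypothetical helper for illustration only; not part of janitor.
    """
    names = [f"{df_key}_{column}" for column in records[0]] + list(others)
    rows = [
        tuple(record.values()) + combo
        for record, combo in product(records, product(*others.values()))
    ]
    return names, rows


records = [{"x": 1, "y": 2}, {"x": 2, "y": 1}]  # stands in for the dataframe
names, rows = prefixed_grid("df", records, {"z": [1, 2, 3]})
print(names)
for row in rows:
    print(row)
```

With pandas 1.2 or later, a comparable pairing of two dataframes can also be obtained with `DataFrame.merge(..., how="cross")`, although that does not add the key prefix to the column names.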
[7]:

# expand_grid can work on multiple dataframes
# Ensure that there is a key
# for each dataframe in the dictionary

df1 = pd.DataFrame({"x": range(1, 3), "y": [2, 1]})
df2 = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})
df3 = pd.DataFrame({"x": [2, 3], "y": ["a", "b"]})

data = {"df1": df1, "df2": df2, "df3": df3}

result = expand_grid(others=data)

result
[7]:
    df1_x  df1_y  df2_x  df2_y  df3_x df3_y
0       1      2      1      3      2     a
1       1      2      1      3      3     b
2       1      2      2      2      2     a
3       1      2      2      2      3     b
4       1      2      3      1      2     a
5       1      2      3      1      3     b
6       2      1      1      3      2     a
7       2      1      1      3      3     b
8       2      1      2      2      2     a
9       2      1      2      2      3     b
10      2      1      3      1      2     a
11      2      1      3      1      3     b