Processing Bird Call Data

Background

The following example was obtained by translating the R code from TidyTuesday 2019-04-30 to Python using Pandas and PyJanitor. It provides a simple example of using pyjanitor for:

- column renaming
- column name cleaning
- dataframe merging

The data originates from a study of the effects of artificial light on bird behaviour. The data used here is the subset of the original study that covers the Chicago area.

Citations

This data set originates from the publication:

Winger BM, Weeks BC, Farnsworth A, Jones AW, Hennen M, Willard DE (2019) Nocturnal flight-calling behaviour predicts vulnerability to artificial light in migratory birds. Proceedings of the Royal Society B 286(1900): 20190364. https://doi.org/10.1098/rspb.2019.0364

To reference only the data, please cite the Dryad data package:

Winger BM, Weeks BC, Farnsworth A, Jones AW, Hennen M, Willard DE (2019) Data from: Nocturnal flight-calling behaviour predicts vulnerability to artificial light in migratory birds. Dryad Digital Repository. https://doi.org/10.5061/dryad.8rr0498
[1]:
import pandas as pd
import janitor  # importing janitor registers the pyjanitor methods on pandas dataframes

Get Raw Data

We use pandas to import the CSV data. Note that bird_call.csv is read with a single space as the separator.

[2]:
raw_birds = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/Chicago_collision_data.csv")
raw_call = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/bird_call.csv", sep=" ")
raw_light = pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/Light_levels_dryad.csv")

Original DataFrames

Taking a quick look at the three imported (raw) pandas dataframes.

[3]:
raw_birds.head()
[3]:
        Genus  Species        Date  Locality
0  Ammodramus  nelsoni  1982-10-03        MP
1  Ammodramus  nelsoni  1984-05-21       CHI
2  Ammodramus  nelsoni  1984-05-25        MP
3  Ammodramus  nelsoni  1985-10-08        MP
4  Ammodramus  nelsoni  1986-09-10        MP
[4]:
raw_call.head()
[4]:
       Species       Family     Collisions  Flight  Call  Habitat  Stratum
0  Zonotrichia   albicollis  Passerellidae   10133   Yes   Forest    Lower
1        Junco     hyemalis  Passerellidae    6303   Yes     Edge    Lower
2    Melospiza      melodia  Passerellidae    5124   Yes     Edge    Lower
3    Melospiza    georgiana  Passerellidae    4910   Yes     Open    Lower
4      Seiurus  aurocapilla      Parulidae    4580   Yes   Forest    Lower
[5]:
raw_light.head()
[5]:
         Date  Light_Score
0  2000-03-06            3
1  2000-03-08           15
2  2000-03-10            3
3  2000-03-31            3
4  2000-04-02           17
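
The labels in the raw_call output above do not match their contents: 'Species' holds genus names and 'Collisions' holds family names. This is what one would expect when a header contains a space inside a column name and the file is split on single spaces. The sketch below reproduces the effect with a made-up two-line sample. The sample text, the trailing tab, and the names `sample`, `demo` and `stripped` are assumptions for illustration; the exact layout of the real file has not been checked here.

```python
import io

import pandas as pd

# Hypothetical sample that mimics the assumed layout of bird_call.csv:
# the header has seven space-separated words, and the row ends in a tab.
sample = (
    "Species Family Collisions Flight Call Habitat Stratum\n"
    "Zonotrichia albicollis Passerellidae 10133 Yes Forest Lower\t\n"
)

# Splitting on a single space assigns values to labels purely by position.
demo = pd.read_csv(io.StringIO(sample), sep=" ")

# The genus lands under 'Species' and the family under 'Collisions'.
print(demo.loc[0, ["Species", "Family", "Collisions"]].tolist())

# A trailing tab survives the parse; str.strip() removes it if desired.
stripped = demo["Stratum"].str.strip()
```

Under these assumptions, the renaming steps later in this example simply give each column a label that matches what it actually holds.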

Cleaning Data Using Pyjanitor

Pyjanitor adds extra methods to standard pandas dataframe objects. One example is clean_names(), which lowercases all column names and replaces spaces with underscores.

[6]:
clean_light = raw_light.clean_names()
[7]:
clean_light.head()
[7]:
         date  light_score
0  2000-03-06            3
1  2000-03-08           15
2  2000-03-10            3
3  2000-03-31            3
4  2000-04-02           17
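
To make the effect of clean_names() on simple labels concrete, here is a rough pandas-only approximation. It is not pyjanitor's implementation and it ignores the extra options and special-character handling that clean_names() offers. The frame `demo` and its column labels are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with untidy column labels.
demo = pd.DataFrame({"Light_Score": [3, 15], "Flight Call": ["Yes", "No"]})

# Approximate the cleaning: trim, lowercase, and turn whitespace into underscores.
demo.columns = (
    demo.columns
    .str.strip()
    .str.lower()
    .str.replace(r"\s+", "_", regex=True)
)

cleaned_labels = list(demo.columns)
print(cleaned_labels)
```

In practice the single clean_names() call is shorter and handles more cases, which is the reason to prefer it in a cleaning chain.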

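Pyjanitor encourages expressing the cleaning process as a chain of method calls. We use chaining here to rename several columns in a single expression. The column names are inconsistent across our dataframes: in raw_call the 'Species' column holds genus names and the 'Family' column holds species names, so we rename both to match raw_birds.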

[8]:
clean_call = (
    raw_call
    .rename_column("Species", "Genus") # rename 'Species' column to 'Genus'
    .rename_column("Family", "Species") # rename 'Family' column to 'Species'
)
[9]:
clean_call.head()
[9]:
         Genus      Species     Collisions  Flight  Call  Habitat  Stratum
0  Zonotrichia   albicollis  Passerellidae   10133   Yes   Forest    Lower
1        Junco     hyemalis  Passerellidae    6303   Yes     Edge    Lower
2    Melospiza      melodia  Passerellidae    5124   Yes     Edge    Lower
3    Melospiza    georgiana  Passerellidae    4910   Yes     Open    Lower
4      Seiurus  aurocapilla      Parulidae    4580   Yes   Forest    Lower
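
The order of the two renames matters, because the second one reuses the label 'Species'. The sketch below illustrates this with plain pandas `DataFrame.rename` on a made-up one-row frame. The names `calls`, `one_pass` and `wrong_order` are hypothetical, and the sketch shows pandas behaviour rather than pyjanitor internals.

```python
import pandas as pd

# Hypothetical one-row frame with the same mislabelled columns as raw_call.
calls = pd.DataFrame(
    {"Species": ["Zonotrichia"], "Family": ["albicollis"], "Collisions": ["Passerellidae"]}
)

# A single mapping is applied to the original labels, so order is irrelevant.
one_pass = calls.rename(columns={"Species": "Genus", "Family": "Species"})
print(list(one_pass.columns))

# Renaming step by step in the opposite order first creates two 'Species'
# columns, and the second step then renames both of them to 'Genus'.
wrong_order = (
    calls
    .rename(columns={"Family": "Species"})
    .rename(columns={"Species": "Genus"})
)
print(list(wrong_order.columns))
```

The chain used above renames 'Species' first, so no duplicate label ever exists.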

We can chain as many standard pandas methods as we like, along with any pyjanitor-specific methods.

[10]:
clean_birds = (
    raw_birds
    .merge(clean_call, how='left') # left-merge raw_birds with clean_call on the shared 'Genus' and 'Species' columns
    .select_columns(["Genus", "Species", "Date", "Locality", "Collisions", "Call", "Habitat", "Stratum"]) # keep only these columns
    .clean_names()
    .rename_column("collisions", "family") # the 'collisions' column actually holds the family name
    .rename_column("call", "flight_call") # the 'call' column actually holds the flight call flag
    .dropna() # drop all rows which contain a NaN
)
[11]:
clean_birds.head()
[11]:
          genus        species        date  locality         family  flight_call  habitat  stratum
89  Passerculus  sandwichensis  1978-10-27        MP  Passerellidae          Yes     Open  Lower\t
90  Passerculus  sandwichensis  1979-10-23        MP  Passerellidae          Yes     Open  Lower\t
91  Passerculus  sandwichensis  1980-04-19        MP  Passerellidae          Yes     Open  Lower\t
92  Passerculus  sandwichensis  1981-09-23        MP  Passerellidae          Yes     Open  Lower\t
93  Passerculus  sandwichensis  1982-05-20        MP  Passerellidae          Yes     Open  Lower\t
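
The output above starts at index 89, which is consistent with earlier rows being removed by dropna() after the left merge left them without call data. The sketch below shows that mechanism on two made-up rows, using the `indicator` option of `DataFrame.merge` to count unmatched rows before they are dropped. The frames `birds` and `calls` and their values are hypothetical, and which rows are actually dropped in the real data is not verified here.

```python
import pandas as pd

# Hypothetical collision records: only the second species has call data.
birds = pd.DataFrame(
    {
        "Genus": ["Ammodramus", "Passerculus"],
        "Species": ["nelsoni", "sandwichensis"],
        "Date": ["1982-10-03", "1978-10-27"],
    }
)
calls = pd.DataFrame(
    {"Genus": ["Passerculus"], "Species": ["sandwichensis"], "Habitat": ["Open"]}
)

# Left merge keeps every collision row; unmatched rows get NaN for 'Habitat'.
merged = birds.merge(calls, how="left", on=["Genus", "Species"], indicator=True)

# Count the rows that found no partner in `calls`.
unmatched = int((merged["_merge"] == "left_only").sum())
print(unmatched)

# Dropping NaN rows removes the unmatched record but keeps the original index.
kept = merged.drop(columns="_merge").dropna()
print(kept.index.tolist())
```

Checking the unmatched count before calling dropna() is a cheap way to see how much data a cleaning chain discards.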