General Functions

Modify columns

expand_column(df, column_name, sep[, concat])

Expand a categorical column with multiple labels into dummy-coded columns.
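
As a rough illustration of what dummy-coding a multi-label column means, here is a minimal pure-Python sketch. The helper `dummy_code` and the sample data are hypothetical and work on plain lists; this is not pyjanitor's implementation.

```python
def dummy_code(labels, sep):
    """Split multi-label strings on `sep` and return one 0/1 column per label."""
    split_rows = [set(value.split(sep)) for value in labels]
    # Sort the label names so the output column order is deterministic.
    all_labels = sorted(set().union(*split_rows))
    return {
        label: [1 if label in row else 0 for row in split_rows]
        for label in all_labels
    }


# Hypothetical column in which each cell holds several labels joined by "|".
column = ["red|blue", "blue", "green|red"]
dummies = dummy_code(column, sep="|")
print(dummies)
```

Each distinct label becomes its own indicator column, and a row gets 1 in every column whose label it contained.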

concatenate_columns(df, column_names, …[, sep])

Concatenates the set of columns into a single column.

deconcatenate_column(df, column_name[, sep, …])

De-concatenates a single column into multiple columns.

remove_columns(df, column_names)

Remove the set of columns specified in column_names.

add_column(df, column_name, value[, …])

Add a column to the dataframe.

add_columns(df[, fill_remaining])

Add multiple columns to the dataframe.

transform_column(df, column_name, function)

Transform the given column in-place using the provided function.

transform_columns(df, column_names, function)

Transform multiple columns through the same transformation.

rename_column(df, old_column_name, …)

Rename a column in place.

rename_columns(df, new_column_names)

Rename columns in place.

reorder_columns(df, column_order)

Reorder DataFrame columns by specifying the desired order as a list of column names.

collapse_levels(df[, sep])

Flatten a dataframe with multi-level columns to a single level.

change_type(df, column_name, dtype[, …])

Change the type of a column.

limit_column_characters(df, column_length[, …])

Truncate column names to a specific length.

row_to_names(df[, row_number, remove_row, …])

Elevates a row to be the column names of a DataFrame.

clean_names(df[, strip_underscores, …])

Clean column names.

currency_column_to_numeric(df, column_name)

Convert currency column to numeric.

groupby_agg(df, by, new_column_name, …[, …])

Shortcut for assigning a groupby-transform to a new column.

join_apply(df, func, new_column_name)

Join the result of applying a function across dataframe rows.

drop_duplicate_columns(df, column_name[, …])

Remove one occurrence of a duplicated column, identified by column_name and its index among the duplicates.

process_text(df, column_name[, …])

Apply a Pandas string method to an existing column and return a dataframe.

Modify values

fill_empty(df, column_names, value)

Fill NaN values in specified columns with a given value.

fill_direction(df[, directions, limit])

Provide a method-chainable function for filling missing values in selected columns.

convert_excel_date(df, column_name)

Convert Excel’s serial date format into Python datetime format.

convert_matlab_date(df, column_name)

Convert MATLAB’s serial date number into Python datetime format.

convert_unix_date(df, column_name)

Convert Unix epoch time into Python datetime format.
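
The arithmetic behind the three date conversions above can be sketched with the standard library alone. The helper names below are hypothetical and operate on single numbers rather than dataframe columns. The sketch assumes Excel's default 1900 date system and that Unix timestamps are in seconds; it is not pyjanitor's implementation.

```python
from datetime import datetime, timedelta, timezone

# Serial 0 in Excel's 1900 date system. This offset is only valid for serials
# from 61 (1900-03-01) onward, because Excel treats 1900 as a leap year.
EXCEL_EPOCH = datetime(1899, 12, 30)

# MATLAB datenum value that corresponds to 1970-01-01.
MATLAB_DATENUM_OF_UNIX_EPOCH = 719529


def excel_to_datetime(serial):
    """Convert an Excel serial date (days, possibly fractional) to a datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)


def matlab_to_datetime(datenum):
    """Convert a MATLAB serial date number (days) to a datetime."""
    return datetime(1970, 1, 1) + timedelta(days=datenum - MATLAB_DATENUM_OF_UNIX_EPOCH)


def unix_to_datetime(seconds):
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(excel_to_datetime(43831))       # 2020-01-01 00:00:00
print(matlab_to_datetime(737791))     # 2020-01-01 00:00:00
print(unix_to_datetime(1577836800))   # 2020-01-01 00:00:00+00:00
```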

remove_empty(df)

Drop all rows and columns that are completely null.

coalesce(df, column_names[, …])

Coalesce two or more columns of data in order of column names provided.
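
To make "coalesce in order" concrete, here is a minimal pure-Python sketch. The helper `coalesce_row` and the sample rows are hypothetical, and `None` stands in for a missing value; this is an illustration of the idea, not pyjanitor's implementation.

```python
def coalesce_row(row, column_names):
    """Return the first non-missing value among `column_names`, in the order given."""
    for name in column_names:
        if row.get(name) is not None:
            return row[name]
    return None


# Hypothetical rows; each dict plays the role of one dataframe row.
rows = [
    {"home": None, "work": "555-0100", "mobile": "555-0199"},
    {"home": "555-0111", "work": None, "mobile": None},
    {"home": None, "work": None, "mobile": None},
]
phones = [coalesce_row(row, ["home", "work", "mobile"]) for row in rows]
print(phones)  # ['555-0100', '555-0111', None]
```

The order of the column names matters: an earlier column wins whenever it holds a value.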

find_replace(df[, match])

Perform a find-and-replace action on provided columns.

round_to_fraction(df[, column_name, …])

Round all values in a column to a fraction.
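
Rounding to a fraction can be read as rounding to the nearest multiple of 1/denominator. The sketch below assumes that interpretation. The helper name and the `denominator` parameter are illustrative choices, since the signature above elides the remaining arguments.

```python
def round_to_nearest_fraction(value, denominator):
    """Round `value` to the nearest multiple of 1/denominator."""
    # Scale up, round to a whole number, then scale back down.
    return round(value * denominator) / denominator


values = [1.3, 2.6, 0.9]
rounded = [round_to_nearest_fraction(v, 4) for v in values]
print(rounded)  # [1.25, 2.5, 1.0]
```

Python's built-in `round` resolves exact ties to the nearest even integer, so values that fall exactly halfway between two fractions may round differently than under a round-half-up rule.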

update_where(df, conditions, …)

Add multiple conditions to update a column in the dataframe.

to_datetime(df, column_name, **kwargs)

Method-chainable version of pandas’ to_datetime.

jitter(df, column_name, dest_column_name, scale)

Adds Gaussian noise (jitter) to the values of a column.

Filtering

take_first(df, subset, by[, ascending])

Take the first row within each group specified by subset.
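
One way to picture this operation: sort the rows by the `by` column, then keep the first row seen for each value of `subset`. The pure-Python sketch below assumes that reading. The helper `first_per_group` and the sample rows are hypothetical; this is not pyjanitor's implementation.

```python
def first_per_group(rows, subset, by, ascending=True):
    """Sort rows by `by`, then keep the first row seen for each `subset` value."""
    ordered = sorted(rows, key=lambda row: row[by], reverse=not ascending)
    first = {}
    for row in ordered:
        # setdefault keeps the earliest row per group and ignores later ones.
        first.setdefault(row[subset], row)
    return list(first.values())


# Hypothetical rows; each dict plays the role of one dataframe row.
scores = [
    {"team": "x", "score": 3},
    {"team": "y", "score": 7},
    {"team": "x", "score": 9},
]
best = first_per_group(scores, subset="team", by="score", ascending=False)
print(best)  # [{'team': 'x', 'score': 9}, {'team': 'y', 'score': 7}]
```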

filter_string(df, column_name, search_string)

Filter a string-based column according to whether it contains a substring.

filter_on(df, criteria[, complement])

Return a dataframe filtered on the given criteria.

filter_date(df, column_name[, start_date, …])

Filter a date-based column based on certain criteria.

filter_column_isin(df, column_name, iterable)

Filter a dataframe for values in a column that exist in another iterable.

select_columns(df, search_column_names[, invert])

Method-chainable selection of columns.

dropnotnull(df, column_name)

Drop rows that do not have null values in the given column.

get_dupes(df[, column_names])

Return all duplicate rows.

Preprocessing

bin_numeric(df, from_column_name, to_column_name)

Generate a new column that labels bins for a specified numeric column.

encode_categorical(df[, column_names])

Encode the specified columns with Pandas’ category dtype.

impute(df, column_name[, value, …])

Method-chainable imputation of values in a column.

label_encode(df, column_names)

Convert labels into numerical data.

min_max_scale(df[, old_min, old_max, …])

Scales data to between a minimum and maximum value.
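
Min-max scaling maps the smallest input to the new minimum and the largest to the new maximum, with everything else placed linearly in between. The sketch below is a hypothetical pure-Python helper. The `new_min` and `new_max` parameter names are assumptions, since the signature above elides the remaining arguments.

```python
def rescale(values, new_min=0.0, new_max=1.0):
    """Linearly map `values` from their observed range onto [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    if span == 0:
        raise ValueError("cannot rescale constant data")
    return [(v - old_min) / span * (new_max - new_min) + new_min for v in values]


scaled = rescale([0, 5, 10])
print(scaled)  # [0.0, 0.5, 1.0]
```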

get_features_targets(*args, **kwargs)

Other

then(df, func)

Add an arbitrary function to run in the pyjanitor method chain.

also(df, func, *args, **kwargs)

Add an arbitrary function with no return value to run in the pyjanitor method chain.

shuffle(df[, random_state, reset_index])

Shuffle the rows of the DataFrame.

count_cumulative_unique(df, column_name, …)

Generates a running total of cumulative unique values in a given column.
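
A running count of unique values can be sketched in a few lines of plain Python. The helper name below is hypothetical and works on a list rather than a dataframe column.

```python
def cumulative_unique_counts(values):
    """For each position, count how many distinct values have appeared so far."""
    seen = set()
    counts = []
    for value in values:
        seen.add(value)
        counts.append(len(seen))
    return counts


running = cumulative_unique_counts(["a", "b", "a", "c", "b"])
print(running)  # [1, 2, 2, 3, 3]
```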

sort_naturally(df, column_name, …)

Sort a DataFrame by a column using “natural” sorting.
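
"Natural" sorting compares runs of digits as numbers, so "A2" comes before "A10". A minimal sketch of such a sort key, using only the standard library, might look like this. The helper `natural_key` is hypothetical and is not how pyjanitor implements the function.

```python
import re


def natural_key(text):
    """Split text into alternating non-digit and digit runs for comparison."""
    parts = re.split(r"(\d+)", text)
    # With a capturing group, odd positions are always the digit runs.
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


names = ["A10", "A2", "B1", "A1"]
ordered = sorted(names, key=natural_key)
print(ordered)        # ['A1', 'A2', 'A10', 'B1']
print(sorted(names))  # ['A1', 'A10', 'A2', 'B1'] (plain lexicographic order)
```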

expand_grid([df, df_key, others])

Creates a dataframe from a cartesian combination of all inputs.
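
A cartesian combination pairs every value of one input with every value of the others. The sketch below shows the idea with `itertools.product`. The helper `cartesian_rows` is hypothetical and returns a list of dicts rather than a dataframe.

```python
from itertools import product


def cartesian_rows(**inputs):
    """Return one row (dict) for every combination of the input values."""
    names = list(inputs)
    return [dict(zip(names, combo)) for combo in product(*inputs.values())]


grid = cartesian_rows(x=[1, 2], y=["a", "b", "c"])
print(len(grid))  # 6
print(grid[0])    # {'x': 1, 'y': 'a'}
```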

flag_nulls(df[, column_name, columns])

Creates a new column to indicate whether you have null values in a given row.

move(df, source, target[, position, axis])

Move a column or row to a position adjacent to another column or row in the dataframe.

toset(series)

Return a set of the values.

unionize_dataframe_categories(*dataframes[, …])

Given a group of dataframes which contain some categorical columns, for each categorical column present, find all the possible categories across all the dataframes which have that column.

groupby_topk(df, groupby_column_name, …[, …])

Return top k rows from a groupby of a set of columns.

complete(df[, columns, fill_value, by])

Turn implicit missing values into explicit missing values.
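
An implicit missing value is a combination of keys that simply has no row; making it explicit means adding that row with a placeholder. The pure-Python sketch below assumes this reading. The helper `complete_rows`, its parameters, and the sample data are hypothetical, and `None` stands in for a missing value.

```python
from itertools import product


def complete_rows(rows, key_columns, value_column, fill_value=None):
    """Add a row for every key combination that is absent from `rows`."""
    levels = [sorted({row[key] for row in rows}) for key in key_columns]
    observed = {
        tuple(row[key] for key in key_columns): row[value_column] for row in rows
    }
    completed = []
    for combo in product(*levels):
        new_row = dict(zip(key_columns, combo))
        new_row[value_column] = observed.get(combo, fill_value)
        completed.append(new_row)
    return completed


# Hypothetical data: store "B" has no row for 2021.
sales = [
    {"year": 2020, "store": "A", "units": 3},
    {"year": 2020, "store": "B", "units": 5},
    {"year": 2021, "store": "A", "units": 4},
]
full = complete_rows(sales, ["year", "store"], "units")
print(full[-1])  # {'year': 2021, 'store': 'B', 'units': None}
```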

pivot_longer(df[, index, column_names, …])

Unpivots a DataFrame from ‘wide’ to ‘long’ format.
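
Going from wide to long turns each non-index column into its own row, recording the former column name alongside its value. A minimal pure-Python sketch follows. The helper `unpivot` and the `names_to` and `values_to` parameter names are assumptions made for illustration, since the signature above elides the remaining arguments.

```python
def unpivot(rows, index, names_to="variable", values_to="value"):
    """Emit one long row per (input row, non-index column) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in index:
                continue
            long_row = {key: row[key] for key in index}
            long_row[names_to] = column
            long_row[values_to] = value
            long_rows.append(long_row)
    return long_rows


# Hypothetical wide data: one row per id, one column per quarter.
wide = [{"id": 1, "q1": 10, "q2": 20}, {"id": 2, "q1": 30, "q2": 40}]
long_form = unpivot(wide, index=["id"])
print(long_form[0])  # {'id': 1, 'variable': 'q1', 'value': 10}
```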

pivot_wider(df[, index, names_from, …])

Reshapes data from long to wide form.