General Functions

Modify columns

expand_column(df, column_name, sep, concat) Expand a categorical column with multiple labels into dummy-coded columns.
concatenate_columns(df, column_names, …) Concatenates the set of columns into a single column.
deconcatenate_column(df, column_name, sep, …) De-concatenates a single column into multiple columns.
remove_columns(df, column_names, …) Remove the set of columns specified in column_names.
add_column(df, column_name, value, …) Add a column to the dataframe.
add_columns(df, fill_remaining, **kwargs) Add multiple columns to the dataframe.
transform_column(df, column_name, function, …) Transform the given column in-place using the provided function.
transform_columns(df, column_names, …) Transform multiple columns through the same transformation.
rename_column(df, old_column_name, …) Rename a column in place.
rename_columns(df, new_column_names) Rename columns in place.
reorder_columns(df, column_order, …) Reorder DataFrame columns by specifying the desired order as a list of column names.
collapse_levels(df, sep) Flatten multi-level column dataframe to a single level.
change_type(df, column_name, dtype, …) Change the type of a column.
limit_column_characters(df, column_length, …) Truncate column names to a specific length.
row_to_names(df, row_number, remove_row, …) Elevates a row to be the column names of a DataFrame.
clean_names(df, strip_underscores, …) Clean column names.
currency_column_to_numeric(df, column_name, …) Convert currency column to numeric.
groupby_agg(df, by, new_column_name, …) Shortcut for assigning a groupby-transform to a new column.
join_apply(df, func, new_column_name) Join the result of applying a function across dataframe rows.
drop_duplicate_columns(df, column_name, …) Remove a duplicated column, identified by column_name and its index among the columns sharing that name.
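The dummy coding behind expand_column can be sketched in plain Python. This is a minimal illustration, not pyjanitor's implementation. The helper name expand_labels is hypothetical, each cell is assumed to be a string of labels joined by sep, and stripping whitespace around each label is an assumption of this sketch.

```python
from typing import Dict, List


def expand_labels(values: List[str], sep: str = ",") -> Dict[str, List[int]]:
    """Dummy-code cells that hold several labels joined by `sep`.

    Hypothetical helper: returns one 0/1 indicator list per distinct label,
    with labels in sorted order.
    """
    # Split every cell into a set of labels; stripping whitespace is an
    # assumption of this sketch, and empty fragments are dropped.
    split_cells = [
        {label.strip() for label in cell.split(sep) if label.strip()}
        for cell in values
    ]
    # Gather the distinct labels seen across all rows.
    all_labels = sorted(set().union(*split_cells))
    # Mark 1 where a row carries the label and 0 where it does not.
    return {label: [int(label in cell) for cell in split_cells] for label in all_labels}


if __name__ == "__main__":
    print(expand_labels(["a, b", "b", "c, a"]))
    # {'a': [1, 0, 1], 'b': [1, 1, 0], 'c': [0, 0, 1]}
```

In a dataframe setting, each key of the returned dictionary would correspond to one indicator column.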

Modify values

fill_empty(df, column_names, …) Fill NaN values in specified columns with a given value.
convert_excel_date(df, column_name) Convert Excel’s serial date format into Python datetime format.
convert_matlab_date(df, column_name) Convert MATLAB’s serial date number into Python datetime format.
convert_unix_date(df, column_name) Convert Unix epoch time into Python datetime format.
remove_empty(df) Drop all rows and columns that are completely null.
coalesce(df, column_names, new_column_name, …) Coalesce two or more columns of data in order of column names provided.
find_replace(df, match, **mappings) Perform a find-and-replace action on provided columns.
round_to_fraction(df, column_name, …) Round all values in a column to a fraction.
update_where(df, conditions, …) Add multiple conditions to update a column in the dataframe.
to_datetime(df, column_name, **kwargs) Method-chainable to_datetime.
jitter(df, column_name, dest_column_name, …) Adds Gaussian noise (jitter) to the values of a column.
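The serial-date conversions performed by convert_excel_date and convert_matlab_date come down to an epoch offset plus a day count. The plain-Python sketch below assumes Excel's default 1900 date system and MATLAB's datenum convention (day 1 is 0000-01-01). The helper names are hypothetical and this is not pyjanitor's code.

```python
from datetime import datetime, timedelta

# Excel's 1900 date system, counted from 1899-12-30. This epoch absorbs Excel's
# fictitious 1900-02-29, so results are exact from serial 61 (1900-03-01)
# onward; serials below 61 come out one day early.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel serial date; the fractional part is the time of day."""
    return EXCEL_EPOCH + timedelta(days=serial)


def matlab_datenum_to_datetime(datenum: float) -> datetime:
    """Convert a positive MATLAB datenum (days since year 0) to a datetime.

    MATLAB's day 1 is 0000-01-01 while Python's ordinal 1 is 0001-01-01,
    so the two counts differ by 366 days (year 0 is a leap year).
    """
    whole_days = int(datenum)
    fraction = datenum - whole_days
    return datetime.fromordinal(whole_days - 366) + timedelta(days=fraction)


if __name__ == "__main__":
    print(excel_serial_to_datetime(43831))       # 2020-01-01 00:00:00
    print(matlab_datenum_to_datetime(737791.5))  # 2020-01-01 12:00:00
```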

Filtering

take_first(df, subset, …) Take the first row within each group specified by subset.
filter_string(df, column_name, …) Filter a string-based column according to whether it contains a substring.
filter_on(df, criteria, complement) Return a dataframe filtered on a particular criterion.
filter_date(df, column_name, start_date, …) Filter a date-based column based on certain criteria.
filter_column_isin(df, column_name, …) Filter a dataframe for values in a column that exist in another iterable.
select_columns(df, search_column_names, invert) Method-chainable selection of columns.
dropnotnull(df, column_name) Drop rows that do not have null values in the given column.
get_dupes(df, column_names, …) Return all duplicate rows.
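get_dupes returns every copy of a duplicated row, not only the second and later occurrences. The plain-Python sketch below models rows as dictionaries; the helper name get_duplicate_rows and that row representation are assumptions. In pandas the same rule is commonly expressed with DataFrame.duplicated(keep=False).

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence

Row = Dict[str, Hashable]


def get_duplicate_rows(
    rows: List[Row], column_names: Optional[Sequence[str]] = None
) -> List[Row]:
    """Return all rows whose values in `column_names` occur more than once.

    Hypothetical helper; with no column_names, every column is compared.
    """

    def key(row: Row) -> tuple:
        # Compare on the chosen columns, or on all columns in a fixed order.
        cols = column_names if column_names is not None else sorted(row)
        return tuple(row[col] for col in cols)

    # Count how often each key appears, then keep every row whose key repeats.
    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] > 1]


if __name__ == "__main__":
    data = [
        {"id": 1, "fruit": "apple"},
        {"id": 2, "fruit": "pear"},
        {"id": 3, "fruit": "apple"},
    ]
    print(get_duplicate_rows(data, ["fruit"]))
    # [{'id': 1, 'fruit': 'apple'}, {'id': 3, 'fruit': 'apple'}]
```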

Preprocessing

bin_numeric(df, from_column_name, …) Generate a new column that labels bins for a specified numeric column.
encode_categorical(df, column_names, …) Encode the specified columns with Pandas’ category dtype.
impute(df, column_name, value, …) Method-chainable imputation of values in a column.
label_encode(df, column_names, …) Convert labels into numerical data.
min_max_scale(df[, old_min, old_max, …]) Scale data to lie between a minimum and a maximum value.
get_features_targets(*args, **kwargs)
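Min-max scaling is a linear map from an old range onto a new one: x' = new_min + (x - old_min) * (new_max - new_min) / (old_max - old_min). The plain-Python sketch below is not pyjanitor's implementation. It assumes that an omitted old_min or old_max falls back to the data's own minimum or maximum, and the names new_min and new_max are assumptions standing in for the elided part of the listed signature.

```python
from typing import List, Optional, Sequence


def min_max_scale_values(
    values: Sequence[float],
    old_min: Optional[float] = None,
    old_max: Optional[float] = None,
    new_min: float = 0.0,  # assumed parameter name
    new_max: float = 1.0,  # assumed parameter name
) -> List[float]:
    """Linearly map values from [old_min, old_max] onto [new_min, new_max]."""
    # Assumption: a missing bound is taken from the data itself.
    lo = min(values) if old_min is None else old_min
    hi = max(values) if old_max is None else old_max
    if hi == lo:
        raise ValueError("old_min and old_max must differ")
    # Ratio of the new span to the old span.
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


if __name__ == "__main__":
    print(min_max_scale_values([0, 5, 10]))                            # [0.0, 0.5, 1.0]
    print(min_max_scale_values([0, 5, 10], new_min=-1.0, new_max=1.0))  # [-1.0, 0.0, 1.0]
```

Passing explicit old bounds lets several columns share one reference range instead of each being scaled by its own extremes.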

Other

then(df, func) Add an arbitrary function to run in the pyjanitor method chain.
shuffle(df[, random_state, reset_index]) Shuffle the rows of the DataFrame.
count_cumulative_unique(df, column_name, …) Generates a running total of cumulative unique values in a given column.
sort_naturally(df, column_name, …) Sort a DataFrame by a column using “natural” sorting.
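Natural sorting compares runs of digits by numeric value, so "item2" sorts before "item10". The sort key below is a minimal hand-rolled sketch, not the implementation behind sort_naturally; the helper name natural_key and the case-insensitive comparison of text runs are assumptions.

```python
import re
from typing import List, Union


def natural_key(text: str) -> List[Union[str, int]]:
    """Sort key that orders embedded numbers by value ("a2" before "a10").

    re.split with a capturing group alternates non-digit and digit runs,
    so odd positions always hold digits and keys compare type-by-type.
    """
    parts = re.split(r"(\d+)", text)
    # Digit runs become ints; text runs are lowercased (an assumption).
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


if __name__ == "__main__":
    labels = ["item10", "Item2", "item1"]
    print(sorted(labels))                   # ['Item2', 'item1', 'item10']
    print(sorted(labels, key=natural_key))  # ['item1', 'Item2', 'item10']
```

Because the key alternates strings and integers at fixed positions, two keys never compare a string against an integer, which would otherwise raise a TypeError.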