General Functions
Modify columns
|
Expand a categorical column with multiple labels into dummy-coded columns. |
|
Concatenates the set of columns into a single column. |
|
De-concatenates a single column into multiple columns. |
|
Remove the set of columns specified in column_names. |
|
Add a column to the dataframe. |
|
Add multiple columns to the dataframe. |
|
Transform the given column in-place using the provided function. |
|
Transform multiple columns through the same transformation. |
|
Rename a column in place. |
|
Rename columns in place. |
|
Reorder DataFrame columns by specifying desired order as list of col names. |
|
Flatten multi-level column dataframe to a single level. |
|
Change the type of a column. |
|
Truncate column sizes to a specific length. |
|
Elevates a row to be the column names of a DataFrame. |
|
Clean column names. |
|
Convert currency column to numeric. |
|
Shortcut for assigning a groupby-transform to a new column. |
|
Join the result of applying a function across dataframe rows. |
|
Remove a duplicated column, identified by column_name and its index among the duplicates. |
|
Apply a Pandas string method to an existing column and return a dataframe. |
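The function names for these entries are not shown in this listing, so the following is only a minimal sketch of a few of the operations above (cleaning column names, converting a currency column to numeric, concatenating and de-concatenating columns) written with plain pandas. The helper `tidy_column_names` and the sample data are hypothetical and are not part of the documented API.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "First Name": ["Ada", "Alan"],
    "Last Name": ["Lovelace", "Turing"],
    "Salary ($)": ["$1,200.50", "$980.00"],
})


def tidy_column_names(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: lowercase names and collapse non-alphanumerics to '_'."""
    out = frame.copy()
    out.columns = (
        out.columns.str.strip().str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)
        .str.strip("_")
    )
    return out


result = (
    df.pipe(tidy_column_names)
    .assign(
        # Currency strings to floats: drop everything except digits and '.'.
        salary=lambda d: d["salary"].str.replace(r"[^0-9.]", "", regex=True).astype(float),
        # Concatenate two columns into one.
        full_name=lambda d: d["first_name"].str.cat(d["last_name"], sep=" "),
    )
    .drop(columns=["first_name", "last_name"])
)

# De-concatenate the single column back into multiple columns.
name_parts = result["full_name"].str.split(" ", expand=True)
name_parts.columns = ["given", "family"]
```

Each step returns a new DataFrame, so the steps can be chained with `.pipe` and `.assign` in the same method-chaining style the summaries describe.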
Modify values
|
Fill NaN values in specified columns with a given value. |
|
Provide a method-chainable function for filling missing values in selected columns. |
|
Convert Excel’s serial date format into Python datetime format. |
|
Convert Matlab’s serial date number into Python datetime format. |
|
Convert unix epoch time into Python datetime format. |
|
Drop all rows and columns that are completely null. |
|
Coalesce two or more columns of data in order of column names provided. |
|
Perform a find-and-replace action on provided columns. |
|
Round all values in a column to a fraction. |
|
Add multiple conditions to update a column in the dataframe. |
|
Method-chainable to_datetime. |
|
Adds Gaussian noise (jitter) to the values of a column. |
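As a rough illustration of several of the value-level operations above (Excel serial dates, Unix epoch times, coalescing columns, rounding to a fraction), here is a sketch using plain pandas only. The column names and data are made up, and the Excel origin of 1899-12-30 is the usual convention for the Windows 1900 date system, assumed here rather than taken from this page.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "excel_day": [44197, 44198],          # Excel serial dates
    "epoch_s": [1609459200, 1609545600],  # seconds since 1970-01-01 UTC
    "primary": [1.1, np.nan],
    "backup": [9.0, 2.6],
})

result = df.assign(
    # Excel serial date -> datetime (assumes the 1900 date system).
    excel_date=lambda d: pd.to_datetime(d["excel_day"], unit="D", origin="1899-12-30"),
    # Unix epoch seconds -> datetime.
    epoch_date=lambda d: pd.to_datetime(d["epoch_s"], unit="s"),
    # Coalesce: take "primary", falling back to "backup" where it is null.
    value=lambda d: d["primary"].combine_first(d["backup"]),
).assign(
    # Round to the nearest quarter (fraction = 0.25).
    value_quarter=lambda d: (d["value"] / 0.25).round() * 0.25,
)
```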
Filtering
|
Take the first row within each group specified by subset. |
|
Filter a string-based column according to whether it contains a substring. |
|
Return a dataframe filtered on a particular criterion. |
|
Filter a date-based column based on certain criteria. |
|
Filter a dataframe for values in a column that exist in another iterable. |
|
Method-chainable selection of columns. |
|
Drop rows that do not have null values in the given column. |
|
Return all duplicate rows. |
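The filtering operations listed above can be approximated with boolean masks in plain pandas. The sketch below uses hypothetical data and does not use the functions this page documents, whose names are not shown in this listing.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "city": ["Boston", "Austin", "Boston", "Denver", "Austin"],
    "when": pd.to_datetime(
        ["2021-01-05", "2021-02-10", "2021-03-15", "2021-04-20", "2021-02-10"]
    ),
    "sales": [10, 20, 30, 40, 20],
})

# String column: keep rows whose value contains a substring.
contains_os = df[df["city"].str.contains("os", regex=False)]

# Date column: keep rows inside an inclusive date range.
in_q1 = df[df["when"].between(pd.Timestamp("2021-01-01"), pd.Timestamp("2021-03-31"))]

# Keep rows whose value exists in another iterable.
in_set = df[df["city"].isin(["Austin", "Denver"])]

# All duplicate rows (every member of each duplicated group).
dupes = df[df.duplicated(keep=False)]

# First row within each group defined by a subset of columns.
first_per_city = df.drop_duplicates(subset=["city"], keep="first")

# Keep only the rows where a column IS null (drop the non-null rows).
only_null_sales = df[df["sales"].isna()]
```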
Preprocessing
|
Generate a new column that labels bins for a specified numeric column. |
|
Encode the specified columns with Pandas’ category dtype. |
|
Method-chainable imputation of values in a column. |
|
Convert labels into numerical data. |
|
Scales data to between a minimum and maximum value. |
|
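To make the preprocessing steps above concrete, the following sketch reproduces them with plain pandas: mean imputation, label encoding, the category dtype, min-max scaling, and binning. The data, bin edges, and labels are hypothetical choices, and these calls stand in for the documented functions rather than showing them.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "size": ["s", "m", "l", "m"],
    "weight": [1.0, np.nan, 3.0, 5.0],
})

out = df.assign(
    # Impute missing values with the column mean (mean of 1, 3, 5 is 3.0).
    weight=lambda d: d["weight"].fillna(d["weight"].mean()),
).assign(
    # Label encoding: integer codes in order of first appearance.
    size_code=lambda d: pd.factorize(d["size"])[0],
    # Encode as pandas' category dtype.
    size_cat=lambda d: d["size"].astype("category"),
    # Min-max scale to the range [0, 1].
    weight_scaled=lambda d: (d["weight"] - d["weight"].min())
    / (d["weight"].max() - d["weight"].min()),
    # Label bins for a numeric column: (0, 2], (2, 4], (4, 6].
    weight_bin=lambda d: pd.cut(d["weight"], bins=[0, 2, 4, 6], labels=["low", "mid", "high"]),
)
```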
Other
|
Add an arbitrary function to run in the method chain. |
|
Add an arbitrary function with no return value to run in the method chain. |
|
Shuffle the rows of the DataFrame. |
|
Generates a running total of cumulative unique values in a given column. |
|
Sort a DataFrame by a column using “natural” sorting. |
|
Creates a dataframe from a cartesian combination of all inputs. |
|
Creates a new column to indicate whether you have null values in a given row. |
|
Move column or row to a position adjacent to another column or row in dataframe. |
|
Return a set of the values. |
|
Given a group of dataframes which contain some categorical columns, for each categorical column present, find all the possible categories across all the dataframes which have that column. |
|
Return top k rows from a groupby of a set of columns. |
|
Turn implicit missing values into explicit missing values. |
|
Unpivots a DataFrame from ‘wide’ to ‘long’ format. |
|
Reshapes data from long to wide form. |
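Several of the reshaping and bookkeeping operations in this group can be sketched with plain pandas and the standard library: a cartesian combination of inputs, wide-to-long and long-to-wide reshaping, a running count of unique values, and making implicit missing combinations explicit. All names and data below are hypothetical, and the approach is an assumption about equivalent behavior, not the documented implementation.

```python
from itertools import product

import pandas as pd

# Cartesian combination of all inputs (2 x 3 = 6 rows).
grid = pd.DataFrame(list(product(["a", "b"], [1, 2, 3])), columns=["letter", "number"])

# Wide -> long, then long -> wide again.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})
long = wide.melt(id_vars="id", var_name="variable", value_name="value")
back = long.pivot(index="id", columns="variable", values="value").reset_index()

# Running total of cumulative unique values in a column.
seen = pd.Series(["a", "b", "a", "c"])
running_unique = (~seen.duplicated()).cumsum()

# Implicit missing -> explicit missing: the (2021, "q") pair never appears
# in the data, so reindexing on the full product adds it with NaN.
observed = pd.DataFrame({"year": [2020, 2020, 2021], "item": ["p", "q", "p"], "n": [1, 2, 3]})
full_index = pd.MultiIndex.from_product(
    [observed["year"].unique(), observed["item"].unique()], names=["year", "item"]
)
completed = observed.set_index(["year", "item"]).reindex(full_index).reset_index()

# Top k rows per group (k = 1 here), ranked by "n".
top_per_year = observed.sort_values("n", ascending=False).groupby("year").head(1)
```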