Machine Learning

Machine-learning-specific functions.

janitor.ml.get_features_targets(df: pandas.core.frame.DataFrame, target_column_names: Union[str, List, Tuple, Hashable], feature_column_names: Optional[Union[str, Iterable[str], Hashable]] = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]

Get the features and targets as separate DataFrames/Series.

This method does not mutate the original DataFrame.

The behaviour is as follows:

  • target_column_names is mandatory.

  • If feature_column_names is passed in, only the columns named there are returned as features.

  • If feature_column_names is not passed in, all remaining columns are assumed to be feature columns and are returned.
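The column-selection rules above can be sketched in plain Python, operating only on column labels. This is an illustrative sketch, not the library's implementation: split_column_names and _as_list are hypothetical helpers introduced here, and the sketch assumes that a list or tuple always means "several column names" while any other value is a single label.

```python
from typing import Hashable, Iterable, List, Optional, Tuple, Union

ColumnSpec = Union[str, List, Tuple, Hashable]


def _as_list(names: ColumnSpec) -> List[Hashable]:
    """Normalise a single label or a list/tuple of labels to a list."""
    if isinstance(names, (list, tuple)):
        return list(names)
    return [names]


def split_column_names(
    all_columns: Iterable[Hashable],
    target_column_names: ColumnSpec,
    feature_column_names: Optional[ColumnSpec] = None,
) -> Tuple[List[Hashable], List[Hashable]]:
    """Hypothetical helper: decide which labels are features and which are targets."""
    targets = _as_list(target_column_names)
    if feature_column_names is not None:
        # Explicit feature names are respected as given.
        features = _as_list(feature_column_names)
    else:
        # Otherwise every non-target column is treated as a feature.
        features = [col for col in all_columns if col not in targets]
    return features, targets


columns = ["a", "b", "measurement"]
print(split_column_names(columns, "measurement"))
# (['a', 'b'], ['measurement'])
print(split_column_names(columns, "measurement", feature_column_names="a"))
# (['a'], ['measurement'])
```

With a real DataFrame, the same two lists of labels would then be used to select the feature and target columns.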

Functional usage example:

from janitor.ml import get_features_targets
X, y = get_features_targets(df, target_column_names="measurement")

Method chaining example:

import pandas as pd
import janitor.ml  # importing makes get_features_targets available as a DataFrame method

df = pd.DataFrame(...)
target_cols = ['output1', 'output2']
X, y = df.get_features_targets(target_column_names=target_cols)

Parameters
  • df – The pandas DataFrame object.

  • target_column_names – Either a column name or an iterable (list or tuple) of column names that are the target(s) to be predicted.

  • feature_column_names – (optional) The column name or iterable of column names that are the features (a.k.a. predictors) used to predict the targets.

Returns

(X, Y): the feature matrix (X) and the target matrix (Y). Both are pandas DataFrames.