Machine Learning

Machine learning-specific functions.

janitor.ml.get_features_targets(df: pandas.core.frame.DataFrame, target_column_names: Union[str, List[T], Tuple, Hashable], feature_column_names: Union[str, Iterable[str], Hashable] = None)

Get the features and targets as separate DataFrames/Series.

This method does not mutate the original DataFrame.

The behaviour is as follows:

  • target_column_names is mandatory.
  • If feature_column_names is passed in, only the columns named there are
    returned as features.
  • If feature_column_names is not passed in, we assume that all remaining
    columns (those not in target_column_names) are feature columns, and
    return them.
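The selection rules above can be sketched in plain pandas, so the example runs without pyjanitor installed. This is a minimal sketch under stated assumptions, not pyjanitor's own implementation: the names split_features_targets and _as_list are hypothetical, and unlike the summary's "DataFrames/Series" wording, this sketch always returns DataFrames for both X and Y because it normalizes every name into a list before selecting.

```python
from typing import Hashable, List, Optional, Tuple, Union

import pandas as pd

Names = Union[Hashable, List[Hashable], Tuple[Hashable, ...]]


def _as_list(names: Names) -> List[Hashable]:
    """Hypothetical helper: wrap a single column name in a list."""
    if isinstance(names, (list, tuple)):
        return list(names)
    return [names]


def split_features_targets(
    df: pd.DataFrame,
    target_column_names: Names,
    feature_column_names: Optional[Names] = None,
) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical re-implementation of the rules described above."""
    targets = _as_list(target_column_names)
    if feature_column_names is None:
        # No features given: every column that is not a target is a feature.
        features = [col for col in df.columns if col not in targets]
    else:
        # Features given explicitly: use exactly the named columns.
        features = _as_list(feature_column_names)
    # Selecting with a list builds new DataFrames; df itself is left unchanged.
    return df[features], df[targets]


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "measurement": [0.5, 0.7]})
X, y = split_features_targets(df, target_column_names="measurement")
print(list(X.columns), list(y.columns))  # ['a', 'b'] ['measurement']
```

Passing feature_column_names="a" instead would return only column a in X, mirroring the second bullet.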

Functional usage example:

from janitor.ml import get_features_targets
X, y = get_features_targets(df, target_column_names="measurement")

Method chaining example:

import pandas as pd
import janitor.ml
df = pd.DataFrame(...)
target_cols = ['output1', 'output2']
X, y = df.get_features_targets(target_column_names=target_cols)

Parameters:
  • df – The pandas DataFrame object.
  • target_column_names (str/iterable) – Either a column name or an iterable (list or tuple) of column names that are the target(s) to be predicted.
  • feature_column_names (str/iterable) – (optional) The column name or iterable of column names that are the features (a.k.a. predictors) used to predict the targets.
Returns:

(X, Y): the feature matrix X and the target matrix Y, returned as pandas DataFrames/Series.
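The summary at the top describes the results as "DataFrames/Series". In plain pandas, that distinction comes from how columns are selected: a single label yields a Series, while a list of labels (even a one-element list) yields a DataFrame. Whether get_features_targets passes a single string target straight through to pandas indexing is an assumption not confirmed here, so this snippet is background on pandas itself; if the exact type of y matters downstream, checking type(y) after the call is the safe option.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "measurement": [0.5, 0.7]})

# A single column label selects one column as a Series.
single = df["measurement"]
# A list of labels, even with one element, selects a DataFrame.
multi = df[["measurement"]]

print(type(single).__name__, type(multi).__name__)  # Series DataFrame
```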