janitor.flag_nulls

janitor.flag_nulls(df: pandas.core.frame.DataFrame, column_name: Optional[Hashable] = 'null_flag', columns: Optional[Union[str, Iterable[str], Hashable]] = None) → pandas.core.frame.DataFrame

Create a new column that flags whether each row contains null values: 1 if at least one null is found in that row, 0 otherwise. If the columns parameter is not set, the check spans the entire DataFrame; otherwise only the listed columns are checked.

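import pandas as pd
import janitor as jn  # importing janitor registers .flag_nulls() on DataFrames

# Column 'a' is upcast to float because it contains a missing value.
df = pd.DataFrame(
    {'a': [1, 2, None, 4],
     'b': [5.0, None, 7.0, 8.0]})

# Method-chaining form: every column is checked for nulls.
df.flag_nulls()
#  'a' | 'b'  | 'null_flag'
#  1.0 | 5.0  |   0
#  2.0 | NaN  |   1
#  NaN | 7.0  |   1
#  4.0 | 8.0  |   0

# Functional form: same result as df.flag_nulls().
jn.functions.flag_nulls(df)
#  'a' | 'b'  | 'null_flag'
#  1.0 | 5.0  |   0
#  2.0 | NaN  |   1
#  NaN | 7.0  |   1
#  4.0 | 8.0  |   0

# Only column 'b' is checked, so the NaN in column 'a' (third row) is not flagged.
df.flag_nulls(columns=['b'])
#  'a' | 'b'  | 'null_flag'
#  1.0 | 5.0  |   0
#  2.0 | NaN  |   1
#  NaN | 7.0  |   0
#  4.0 | 8.0  |   0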
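The flags in the outputs above amount to a row-wise logical OR of null checks over the inspected columns. The following pandas-only sketch reproduces those flag values without pyjanitor; it illustrates the semantics under that assumption and is not the library's own implementation:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, None, 4], "b": [5.0, None, 7.0, 8.0]})

# A row gets 1 when any inspected column holds a null, else 0.
all_columns_flag = df.isna().any(axis=1).astype(int)

# Restricting the check to column "b" ignores the NaN in column "a".
b_only_flag = df[["b"]].isna().any(axis=1).astype(int)

print(all_columns_flag.tolist())  # [0, 1, 1, 0]
print(b_only_flag.tolist())       # [0, 1, 0, 0]
```

Assigning such a Series to a new column, for example df.assign(null_flag=all_columns_flag), gives the same flag values as the first df.flag_nulls() call.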

Parameters
  • df – Input pandas DataFrame.

  • column_name – Name for the output column. Defaults to ‘null_flag’.

  • columns – Column(s) to check for null values. Pass a list of column names, or a single name to check just one column. If set to None (the default), all DataFrame columns are checked.
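The columns parameter accepts None, a single name, or a list of names. One way to normalize those forms into a list of column labels is sketched below. resolve_columns is a hypothetical helper written for illustration, and the handling of non-string hashable labels (such as integer column names) is an assumption based on the type hint in the signature:

```python
from collections.abc import Iterable

import pandas as pd


def resolve_columns(df, columns=None):
    """Hypothetical helper: normalize a `columns` argument into a list of labels."""
    if columns is None:
        # Default: inspect every column of the DataFrame.
        return list(df.columns)
    if isinstance(columns, str):
        # A single column name; checked before Iterable because str is iterable.
        return [columns]
    if isinstance(columns, Iterable):
        # A list, tuple, Index, etc. of column names.
        return list(columns)
    # Any other single hashable label, such as an integer column name.
    return [columns]


df = pd.DataFrame({"a": [1, None], "b": [None, 2.0], 0: [3, 4]})

print(resolve_columns(df))              # ['a', 'b', 0]
print(resolve_columns(df, "b"))         # ['b']
print(resolve_columns(df, ["a", "b"]))  # ['a', 'b']
print(resolve_columns(df, 0))           # [0]
```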

Returns

The DataFrame with the null flag column added.

Raises
  • ValueError – if column_name is already present in the DataFrame.

  • ValueError – if a column within columns is not present in the DataFrame.
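Both error conditions can be checked up front, before any flag is computed. A minimal pandas-only sketch, assuming the inspected columns have already been resolved to a list; check_flag_nulls_args is a hypothetical name, and its error messages are illustrative rather than pyjanitor's exact wording:

```python
import pandas as pd


def check_flag_nulls_args(df, column_name, columns):
    """Hypothetical validator mirroring the two documented ValueError cases."""
    # The output column must not overwrite an existing column.
    if column_name in df.columns:
        raise ValueError(f"{column_name!r} is already present in the DataFrame")
    # Every column to inspect must exist.
    missing = [col for col in columns if col not in df.columns]
    if missing:
        raise ValueError(f"columns not present in the DataFrame: {missing}")


df = pd.DataFrame({"a": [1, None], "b": [None, 2.0]})

# Valid arguments: no exception is raised.
check_flag_nulls_args(df, "null_flag", ["a", "b"])

# An existing output name, then a missing input column.
for bad_args in [("a", ["b"]), ("null_flag", ["c"])]:
    try:
        check_flag_nulls_args(df, *bad_args)
    except ValueError as err:
        print(err)
```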