Time Series

Time series-specific data cleaning functions.

janitor.timeseries.fill_missing_timestamps(df: pandas.core.frame.DataFrame, frequency: str, first_time_stamp: pandas._libs.tslibs.timestamps.Timestamp = None, last_time_stamp: pandas._libs.tslibs.timestamps.Timestamp = None) → pandas.core.frame.DataFrame

Fill in the missing timestamps of a dataframe based on a defined frequency.

If timestamps are missing, this function will reindex the dataframe. If timestamps are not missing, then the function will return the dataframe unmodified.

Functional usage example:

import pandas as pd
import janitor.timeseries

df = pd.DataFrame(...)

df = janitor.timeseries.fill_missing_timestamps(
    df=df,
    frequency="1H",
)

Method chaining example:

import pandas as pd
import janitor.timeseries

df = (
    pd.DataFrame(...)
    .fill_missing_timestamps(frequency="1H")
)
Parameters
  • df – Dataframe which needs to be tested for missing timestamps

  • frequency – Sampling frequency of the data. Acceptable frequency strings are listed under "Offset aliases" in the time series section of the pandas user guide.

  • first_time_stamp – Timestamp the data is expected to start from. Defaults to None. If no input is provided, the minimum timestamp in the dataframe's index is used.

  • last_time_stamp – Timestamp the data is expected to end with. Defaults to None. If no input is provided, the maximum timestamp in the dataframe's index is used.

Returns

Dataframe that has a complete set of contiguous datetimes.
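
The reindexing described above can be illustrated with plain pandas. This is a minimal sketch of the documented behavior, not the library's implementation: fill_missing_timestamps_sketch is a hypothetical helper, and it assumes the timestamps live in the dataframe's index. The frequency is written as "60min" because recent pandas releases deprecate the uppercase "H" alias in favor of "h".

```python
import pandas as pd


def fill_missing_timestamps_sketch(
    df, frequency, first_time_stamp=None, last_time_stamp=None
):
    """Hypothetical helper: reindex df onto every timestamp expected at `frequency`."""
    # Assumption: fall back to the first and last timestamps in the index.
    start = df.index.min() if first_time_stamp is None else first_time_stamp
    end = df.index.max() if last_time_stamp is None else last_time_stamp
    expected = pd.date_range(start=start, end=end, freq=frequency)
    # Rows for timestamps absent from df are created and filled with NaN.
    return df.reindex(expected)


# Hourly readings with the 02:00 row missing.
index = pd.to_datetime(
    ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 03:00"]
)
df = pd.DataFrame({"value": [1.0, 2.0, 4.0]}, index=index)

filled = fill_missing_timestamps_sketch(df, frequency="60min")
print(filled)
```

The result has four rows, and the inserted 02:00 row holds NaN, so a gap stays visible instead of being silently interpolated.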

janitor.timeseries.flag_jumps(df: pandas.core.frame.DataFrame, scale: Union[str, Dict[str, str]] = 'percentage', direction: Union[str, Dict[str, str]] = 'any', threshold: Union[int, float, Dict[str, Union[int, float]]] = 0.0, strict: bool = False) → pandas.core.frame.DataFrame

Create boolean column(s) that flag whether or not the change between consecutive rows exceeds a provided threshold.

Functional usage example:

import pandas as pd
import janitor.timeseries

df = pd.DataFrame(...)

df = janitor.timeseries.flag_jumps(
    df=df,
    scale="absolute",
    direction="any",
    threshold=2,
)

Method chaining example:

import pandas as pd
import janitor.timeseries

df = (
    pd.DataFrame(...)
    .flag_jumps(
        scale="absolute",
        direction="any",
        threshold=2,
    )
)

Detailed chaining examples:

# Applies specified criteria across all columns of the dataframe
# Appends a flag column for each column in the dataframe
df = (
    pd.DataFrame(...)
    .flag_jumps(
        scale="absolute",
        direction="any",
        threshold=2
    )
)

# Applies specific criteria to certain dataframe columns
# Applies default criteria to columns not specifically listed
# Appends a flag column for each column in the dataframe
df = (
    pd.DataFrame(...)
    .flag_jumps(
        scale=dict(col1="absolute", col2="percentage"),
        direction=dict(col1="increasing", col2="any"),
        threshold=dict(col1=1, col2=0.5),
    )
)

# Applies specific criteria to certain dataframe columns
# Applies default criteria to columns not specifically listed
# Appends a flag column for each column in the dataframe
df = (
    pd.DataFrame(...)
    .flag_jumps(
        scale=dict(col1="absolute"),
        direction=dict(col2="increasing"),
    )
)

# Applies specific criteria to certain dataframe columns
# Applies default criteria to columns not specifically listed
# Appends a flag column for only those columns found in
#   specified criteria
df = (
    pd.DataFrame(...)
    .flag_jumps(
        scale=dict(col1="absolute"),
        threshold=dict(col2=1),
        strict=True,
    )
)
Parameters
  • df – Dataframe which needs to be flagged for changes between consecutive rows above a certain threshold.

  • scale

    Type of scaling approach to use. Acceptable arguments are:

    1. absolute (consider the difference between rows).

    2. percentage (consider the percentage change between rows).

    Defaults to percentage.

  • direction

    Type of method used to handle the sign change when comparing consecutive rows. Acceptable arguments are:

    1. increasing (only consider rows that are increasing in value).

    2. decreasing (only consider rows that are decreasing in value).

    3. any (consider rows that are either increasing or decreasing; sign is ignored).

    Defaults to any.

  • threshold – The value that the change between consecutive rows must exceed to be flagged. The comparison is always strictly greater than. Must be >= 0.0. Defaults to 0.0.

  • strict

    Flag to enable/disable appending of a flag column for each column in the provided dataframe. If set to True, will only append a flag column for those columns found in at least one of the input dictionaries.

    If set to False, will append a flag column for each column found in the provided dataframe. If a criterion is not specified for a column, its default is used.

    Defaults to False.

Returns

Dataframe that has flag jump columns.
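
How scale, direction, and threshold combine can be sketched for a single column with plain pandas. This is an illustration of the documented rules, not the library's code: flag_jump_sketch is a hypothetical helper, and it returns a boolean Series without making any claim about the names or dtype of the columns the real function appends.

```python
import pandas as pd


def flag_jump_sketch(series, scale="percentage", direction="any", threshold=0.0):
    """Hypothetical helper: flag rows whose change from the previous row exceeds threshold."""
    diff = series.diff()
    if scale == "absolute":
        change = diff
    elif scale == "percentage":
        # Fractional change relative to the previous row.
        change = diff / series.shift()
    else:
        raise ValueError("scale must be 'absolute' or 'percentage'")

    if direction == "any":
        change = change.abs()  # sign is ignored
    elif direction == "decreasing":
        change = -change  # a drop becomes a positive magnitude
    elif direction != "increasing":
        raise ValueError("direction must be 'increasing', 'decreasing', or 'any'")

    # Strictly greater-than comparison; the first row (NaN change) is never flagged.
    return change > threshold


series = pd.Series([1.0, 2.0, 5.0, 4.0])

abs_any = flag_jump_sketch(series, scale="absolute", direction="any", threshold=2)
pct_any = flag_jump_sketch(series, scale="percentage", direction="any", threshold=0.5)
abs_dec = flag_jump_sketch(series, scale="absolute", direction="decreasing", threshold=0)

print(abs_any.tolist())  # [False, False, True, False]
print(pct_any.tolist())  # [False, True, True, False]
print(abs_dec.tolist())  # [False, False, False, True]
```

With scale="absolute" only the jump of 3 exceeds the threshold of 2, whereas on the percentage scale both the 100% and 150% increases exceed 0.5.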

Raises
  • JanitorError – if strict=True and none of the scale, direction, or threshold inputs is a dictionary.

  • JanitorError – if scale is not one of ["absolute", "percentage"].

  • JanitorError – if direction is not one of ["increasing", "decreasing", "any"].

  • JanitorError – if threshold is less than 0.0.
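
The effect of strict on which columns receive a flag column can be sketched as follows. columns_to_flag is a hypothetical helper written only for illustration; it mirrors the parameter description above and deliberately omits the validation that raises JanitorError.

```python
def columns_to_flag(df_columns, scale, direction, threshold, strict=False):
    """Hypothetical helper: list the columns that would receive a flag column."""
    if not strict:
        # Every column is flagged; unspecified criteria fall back to defaults.
        return list(df_columns)
    # In strict mode, only columns named in at least one dictionary are flagged.
    dict_args = [arg for arg in (scale, direction, threshold) if isinstance(arg, dict)]
    named = set().union(*dict_args)  # union of the dictionary keys
    return [col for col in df_columns if col in named]


columns = ["col1", "col2", "col3"]

strict_cols = columns_to_flag(
    columns,
    scale={"col1": "absolute"},
    direction="any",
    threshold={"col2": 1},
    strict=True,
)
all_cols = columns_to_flag(
    columns,
    scale={"col1": "absolute"},
    direction="any",
    threshold={"col2": 1},
    strict=False,
)

print(strict_cols)  # ['col1', 'col2']
print(all_cols)     # ['col1', 'col2', 'col3']
```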

janitor.timeseries.sort_timestamps_monotonically(df: pandas.core.frame.DataFrame, direction: str = 'increasing', strict: bool = False) → pandas.core.frame.DataFrame

Sort dataframe such that index is monotonic.

If timestamps are monotonic, this function will return the dataframe unmodified. If timestamps are not monotonic, then the function will sort the dataframe.

Functional usage example:

import pandas as pd
import janitor.timeseries

df = pd.DataFrame(...)

df = janitor.timeseries.sort_timestamps_monotonically(
    df=df,
    direction="increasing",
)

Method chaining example:

import pandas as pd
import janitor.timeseries

df = (
    pd.DataFrame(...)
    .sort_timestamps_monotonically(direction="increasing")
)
Parameters
  • df – Dataframe which needs to be tested for monotonicity

  • direction

    Type of monotonicity desired. Acceptable arguments are:

    1. increasing

    2. decreasing

    Defaults to increasing.

  • strict – Flag to enable/disable strict monotonicity. If set to True, duplicate index values are removed, retaining the first occurrence of each value. If set to False, the index is not tested for duplicates. Defaults to False.

Returns

Dataframe that has monotonically increasing (or decreasing) timestamps.
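
The combination of sorting and the strict flag can be sketched with plain pandas. This is a minimal illustration of the documented behavior, assuming the timestamps are the dataframe's index; sort_timestamps_sketch is a hypothetical helper, not the library's implementation.

```python
import pandas as pd


def sort_timestamps_sketch(df, direction="increasing", strict=False):
    """Hypothetical helper: sort df by its index, optionally dropping duplicates."""
    if direction not in ("increasing", "decreasing"):
        raise ValueError("direction must be 'increasing' or 'decreasing'")
    if strict:
        # Keep only the first occurrence of each duplicated timestamp.
        df = df[~df.index.duplicated(keep="first")]
    return df.sort_index(ascending=(direction == "increasing"))


# Out-of-order readings with a duplicated 01:00 timestamp.
index = pd.to_datetime(
    [
        "2024-01-01 03:00",
        "2024-01-01 01:00",
        "2024-01-01 01:00",
        "2024-01-01 02:00",
    ]
)
df = pd.DataFrame({"value": [3, 1, 10, 2]}, index=index)

loose = sort_timestamps_sketch(df)  # duplicates kept
strict_inc = sort_timestamps_sketch(df, strict=True)
strict_dec = sort_timestamps_sketch(df, direction="decreasing", strict=True)

print(strict_inc["value"].tolist())  # [1, 2, 3]
print(strict_dec["value"].tolist())  # [3, 2, 1]
```

Without strict the duplicated 01:00 rows both survive, so the index is monotonic but not strictly so; with strict=True the second 01:00 row (value 10) is dropped.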