janitor.filter_string
janitor.filter_string(df: pandas.core.frame.DataFrame, column_name: Hashable, search_string: str, complement: bool = False) → pandas.core.frame.DataFrame

Filter the rows of a DataFrame according to whether a string-based column contains a substring.
This is syntactic sugar built on top of pandas.Series.str.contains.
Because it uses pandas.Series.str.contains internally, and that method interprets its pattern as a regular expression by default, search_string can also be a regex pattern.
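Because the pattern goes to a regex engine, characters such as `.` or `*` keep their regex meaning. The plain-pandas snippet below (made-up data; it does not call pyjanitor) shows the difference. It also uses `re.escape` to search for a literal string that contains metacharacters. This assumes search_string is passed unchanged to `str.contains`; the signature above exposes no `regex` switch, so escaping is one option that works through the documented parameters alone.

```python
import re

import pandas as pd

df = pd.DataFrame({"name": ["abc", "a.c", "xyz"]})

# str.contains interprets the pattern as a regular expression by default,
# so "." matches any single character: both "abc" and "a.c" match.
regex_mask = df["name"].str.contains("a.c")

# Escaping the metacharacters with re.escape turns "a.c" into a literal
# search, so only the row that really contains "a.c" matches.
literal_mask = df["name"].str.contains(re.escape("a.c"))

print(df[regex_mask]["name"].tolist())    # ['abc', 'a.c']
print(df[literal_mask]["name"].tolist())  # ['a.c']
```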
This method does not mutate the original DataFrame.
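A plausible reason for this behaviour is pandas boolean indexing, which returns a new DataFrame rather than dropping rows in place. The text does not say how filter_string selects rows, so treating it as boolean indexing on the `str.contains` mask is an assumption. The plain-pandas illustration below uses made-up data.

```python
import pandas as pd

df = pd.DataFrame({"fruit": ["apple", "banana", "cherry"]})

# Boolean indexing builds a new DataFrame from the matching rows;
# it does not drop rows from df itself.
result = df[df["fruit"].str.contains("an")]

print(result["fruit"].tolist())  # ['banana']
print(df["fruit"].tolist())      # ['apple', 'banana', 'cherry']
```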
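This function lets filtering operations be method-chained:

```python
import pandas as pd
import janitor  # importing janitor registers filter_string as a DataFrame method

df = (
    pd.DataFrame(...)
    .filter_string('column', search_string='pattern', complement=False)
    ...  # chain on more data preprocessing
)
```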
This stands in contrast to the reassignment syntax that is usually used:

```python
df = pd.DataFrame(...)
df = df[df['column'].str.contains('pattern')]
```
As this comparison shows, the method-chaining design lets the filter be written as one step in a continuous pipeline, with no intermediate reassignment of df.
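For comparison, plain pandas can also keep a filter inside a chain. It does this by passing a callable to `.loc`, which receives the intermediate DataFrame. This is a pandas feature, not part of pyjanitor, and the data below is made up.

```python
import pandas as pd

df = (
    pd.DataFrame({"fruit": ["apple", "banana", "cherry"], "qty": [1, 2, 3]})
    # .loc accepts a callable that receives the intermediate DataFrame,
    # so the filter can sit inside the chain without a named variable.
    .loc[lambda d: d["fruit"].str.contains("an")]
    .reset_index(drop=True)
)

print(df)
```

The filter_string form reads more declaratively, while the `.loc` form needs no extra library.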
Functional usage syntax:

```python
df = filter_string(df, column_name='column', search_string='pattern', complement=False)
```
Method chaining syntax:

```python
df = (
    pd.DataFrame(...)
    .filter_string(column_name='column', search_string='pattern', complement=False)
    ...
)
```
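Both forms call the same function with the DataFrame as its first argument. This page does not describe how pyjanitor attaches filter_string to DataFrame. For a plain function that is not registered as a method, `DataFrame.pipe` gives the same functional-versus-chained equivalence. The helper `keep_matching` below is hypothetical and exists only for this illustration.

```python
import pandas as pd


def keep_matching(df, column_name, search_string):
    """Hypothetical helper: keep rows whose column contains search_string."""
    return df[df[column_name].str.contains(search_string)]


df = pd.DataFrame({"fruit": ["apple", "banana", "cherry"]})

# Functional form: pass the DataFrame as the first argument.
functional = keep_matching(df, column_name="fruit", search_string="an")

# Chained form: DataFrame.pipe calls keep_matching(df, **kwargs) for us.
chained = df.pipe(keep_matching, column_name="fruit", search_string="an")

print(functional.equals(chained))  # True
```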
Parameters:
- df – A pandas DataFrame.
- column_name – The column to filter. The column should contain strings.
- search_string – A regex pattern or a (sub-)string to search for.
- complement – If True, return the complement of the filter, i.e. the rows that do not match. Defaults to False.

Returns:
A filtered pandas DataFrame.
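The parameters above can be mirrored in a short plain-pandas sketch. The name `filter_string_sketch` is hypothetical, and the body is an assumption about how the documented behaviour could be implemented, not pyjanitor's actual source. It shows `complement=True` inverting the mask with `~`.

```python
import pandas as pd


def filter_string_sketch(df, column_name, search_string, complement=False):
    """Hypothetical plain-pandas stand-in mirroring the documented parameters."""
    # True where the column contains search_string (interpreted as a regex).
    criteria = df[column_name].str.contains(search_string)
    # complement=True inverts the mask, keeping the non-matching rows instead.
    if complement:
        criteria = ~criteria
    return df[criteria]


df = pd.DataFrame({"fruit": ["apple", "banana", "cherry"], "qty": [1, 2, 3]})

matches = filter_string_sketch(df, "fruit", "an")
others = filter_string_sketch(df, "fruit", "an", complement=True)

print(matches["fruit"].tolist())  # ['banana']
print(others["fruit"].tolist())   # ['apple', 'cherry']
```

The two calls partition the rows, so every row lands in exactly one result. If the column holds missing values, `str.contains` yields NaN for them rather than a boolean, and pandas may then reject the mask. How filter_string itself treats missing values is not stated on this page.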