Check for a Substring in a DataFrame Column
Below are some of the ways to check for a substring in a Pandas DataFrame column in Python:
- Using str.contains() method
- Using Regular Expressions
- Using apply() function
- List Comprehension with ‘in’ Operator
Check For A Substring In A Pandas Dataframe Using str.contains() Method
In this example, a pandas DataFrame is created with employee information. A new column, ‘NameContainsSubstring’, is added, indicating whether the substring ‘an’ is present in each ‘Name’ entry using the str.contains() method. The resulting Boolean column is then used to keep only the matching rows.
Python3
import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)

# Checking for substring 'an' in the 'Name' column
substring = 'an'
df['NameContainsSubstring'] = df['Name'].str.contains(substring)

# Keep only the rows where the substring was found
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)
Output:
   EmployeeID   Name  Department  Salary  NameContainsSubstring
0         101   Aman          HR   60000                   True
3         104  Rohan   Marketing   65000                   True
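Real-world columns often contain mixed letter case or missing values, which the example above does not cover. The following is a minimal sketch, assuming a hypothetical `names` Series that is not part of the original example; it uses the `case` and `na` parameters of str.contains to ignore letter case and to treat missing entries as non-matches.

```python
import pandas as pd

# Hypothetical sample data: mixed case and one missing name
names = pd.Series(['Aman', 'ANIL', None, 'Rohan'])

# Defaults: matching is case-sensitive, so 'ANIL' does not match 'an'
default_mask = names.str.contains('an')

# case=False ignores letter case; na=False reports missing entries as False
clean_mask = names.str.contains('an', case=False, na=False)

print(clean_mask.tolist())  # [True, True, False, True]
```

Passing na=False is useful when the result is used as a filter, because it guarantees the mask contains only True and False.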
Check For A Substring In A Pandas Dataframe Using Regular Expressions
In this example, a pandas DataFrame is created with employee information. A new column, ‘NameContainsPattern’, is added, indicating whether the regular expression pattern ‘ma(?!$)’ matches anywhere in each ‘Name’ entry.
The str.contains method is called with regex=True (which is also the default), so the pattern is interpreted as a regular expression. The negative lookahead (?!$) ensures that ‘ma’ is not immediately followed by the end of the string, so a name that merely ends in ‘ma’ would not match.
Python3
import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['aman', 'bhavna', 'madhav', 'rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)

# Regular expression pattern with negative lookahead:
# 'ma' must not be the last two characters of the string
pattern = r'ma(?!$)'
df['NameContainsPattern'] = df['Name'].str.contains(pattern, regex=True)

# Keep only the rows where the pattern matched
filtered_df = df[df['NameContainsPattern']]
print(filtered_df)
Output:
   EmployeeID    Name  Department  Salary  NameContainsPattern
0         101    aman          HR   60000                 True
2         103  madhav     Finance   90000                 True
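Because str.contains treats its pattern as a regular expression by default, characters such as ‘+’ and ‘.’ carry special meaning. The sketch below, which uses a hypothetical `skills` Series that is not part of the original example, shows two ways to search for such text literally: passing regex=False, or escaping the text with re.escape from the standard library.

```python
import re
import pandas as pd

# Hypothetical values that contain regex metacharacters
skills = pd.Series(['C++', 'C', 'Python 3.12', 'Python 3112'])

# regex=False treats the pattern as plain text, so '+' is matched literally
has_cpp = skills.str.contains('C++', regex=False)

# Alternatively, escape the text and keep regex matching enabled;
# unescaped, '.' would match any character, including the '1' in '3112'
has_version = skills.str.contains(re.escape('3.12'))

print(has_cpp.tolist())      # [True, False, False, False]
print(has_version.tolist())  # [False, False, True, False]
```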
Check For A Substring In A Pandas Dataframe Using apply() function
In this example, a pandas DataFrame is created with employee information, including ‘EmployeeID’, ‘Name’, ‘Department’, and ‘Salary’. A new column, ‘NameContainsSubstring,’ is added, indicating whether the substring ‘av’ is present in each ‘Name’ entry using the apply() method with a lambda function.
Python3
import pandas as pd

# Creating a 4-column DataFrame
data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)

# Checking for substring 'av' in the 'Name' column and adding a new column
substring = 'av'
df['NameContainsSubstring'] = df['Name'].apply(lambda x: substring in x)

# Keep only the rows where the substring was found
filtered_df = df[df['NameContainsSubstring']]
print(filtered_df)
Output:
   EmployeeID    Name  Department  Salary  NameContainsSubstring
1         102  Bhavna          IT   75000                   True
2         103  Madhav     Finance   90000                   True
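The lambda above assumes every entry is a string. If the column can contain missing values, the `in` operator raises a TypeError for those entries. One possible way to guard against this is sketched below with a hypothetical `names` Series that is not part of the original example.

```python
import pandas as pd

# Hypothetical sample data with one missing name
names = pd.Series(['Bhavna', None, 'Madhav', 'Rohan'])
substring = 'av'

# isinstance() short-circuits for non-string entries (None, NaN),
# so `substring in x` is only evaluated for actual strings
mask = names.apply(lambda x: isinstance(x, str) and substring in x)

print(mask.tolist())  # [True, False, True, False]
```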
Check For A Substring In A Pandas Dataframe Using List Comprehension with ‘in’ Operator
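In this example, a list comprehension with the ‘in’ operator checks whether the substring ‘Finance’ is present in each ‘Department’ entry. The resulting list of Boolean values is stored as a new column and used to filter the DataFrame.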
Python3
import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'Name': ['Aman', 'Bhavna', 'Madhav', 'Rohan'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing'],
    'Salary': [60000, 75000, 90000, 65000]
}
df = pd.DataFrame(data)

# Checking for substring 'Finance' in the 'Department' column
substring = 'Finance'
df['DepartmentContainsSubstring'] = [
    substring in department for department in df['Department']
]

# Keep only the rows where the substring was found
filtered_df = df[df['DepartmentContainsSubstring']]
print(filtered_df)
Output:
   EmployeeID    Name  Department  Salary  DepartmentContainsSubstring
2         103  Madhav     Finance   90000                         True
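For plain text without regex metacharacters or missing values, the four approaches should produce the same Boolean mask. The following sketch checks this on the ‘Name’ values used in the earlier examples; the variable names are chosen here for illustration only.

```python
import pandas as pd

names = pd.Series(['Aman', 'Bhavna', 'Madhav', 'Rohan'])
substring = 'an'

# The four approaches described in this article
via_contains = names.str.contains(substring, regex=False)
via_regex = names.str.contains(substring, regex=True)
via_apply = names.apply(lambda x: substring in x)
via_listcomp = pd.Series([substring in name for name in names], index=names.index)

# Compare every mask against the first one
masks = [via_contains, via_regex, via_apply, via_listcomp]
all_agree = all(m.tolist() == via_contains.tolist() for m in masks)

print(via_contains.tolist())  # [True, False, False, True]
print(all_agree)              # True
```

The approaches diverge once the column contains missing values or the substring contains regex metacharacters, so the choice between them depends on the data being searched.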
Check For A Substring In A Pandas Dataframe Column
Pandas is a data analysis library for Python that has grown rapidly in popularity in recent years. It provides in-memory, labeled data structures with SQL-like operations, basic statistical and analytic support, and plotting capability. One common task in data analysis is searching for substrings within a dataset, and Pandas offers efficient tools to accomplish this.