You can use the following syntax to filter a PySpark DataFrame using a NOT LIKE operator:
df.filter(~df.team.like('%avs%')).show()
This particular example filters the DataFrame to only show rows where the string in the team column does not contain a pattern like “avs” anywhere in the string.
The following example shows how to use this syntax in practice.
Example: How to Filter Using NOT LIKE in PySpark
Suppose we have the following PySpark DataFrame that contains information about points scored by various basketball players:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [['Mavs', 18],
['Nets', 33],
['Lakers', 12],
['Mavs', 15],
['Cavs', 19],
['Wizards', 24],
['Cavs', 28],
['Nets', 40],
['Mavs', 24],
['Spurs', 13]]
#define column names
columns = ['team', 'points']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+-------+------+
| team|points|
+-------+------+
| Mavs| 18|
| Nets| 33|
| Lakers| 12|
| Mavs| 15|
| Cavs| 19|
|Wizards| 24|
| Cavs| 28|
| Nets| 40|
| Mavs| 24|
| Spurs| 13|
+-------+------+
We can use the following syntax to filter the DataFrame to only contain rows where the team column does not contain a pattern like “avs” anywhere in the string:
#filter DataFrame where team column does not contain pattern like 'avs'
df.filter(~df.team.like('%avs%')).show()

+-------+------+
|   team|points|
+-------+------+
|   Nets|    33|
| Lakers|    12|
|Wizards|    24|
|   Nets|    40|
|  Spurs|    13|
+-------+------+
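Equivalently, you can reference the column with the col function from pyspark.sql.functions rather than the df.team attribute syntax. The following is a minimal sketch, using the same column name and pattern as above, that should produce the same result:

from pyspark.sql.functions import col

#filter using col() instead of attribute access on df
df.filter(~col('team').like('%avs%')).show()

The col approach can be handy when the column name is stored in a variable or isn’t a valid Python attribute name.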
Notice that none of the rows in the resulting DataFrame contain a pattern like “avs” in the team column.
Note that we used the like function to match all strings in the team column that had a pattern like “avs”, and then we used the ~ symbol to negate this condition.
The end result is that we’re able to filter for only the rows in the DataFrame that do not have a pattern like “avs” in the team column.
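As a related note, the filter method also accepts a SQL-style expression string, so the same logic can be written with an explicit NOT LIKE. The following is a minimal sketch that should be equivalent to the ~ and like approach used above:

#filter using a SQL expression string with NOT LIKE
df.filter("team NOT LIKE '%avs%'").show()

This form may feel more natural if you’re coming to PySpark from a SQL background.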
Note: You can find the complete documentation for the PySpark like function here.
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark:
PySpark: How to Use “OR” Operator
PySpark: How to Use “AND” Operator
PySpark: How to Use “NOT IN” Operator