You can use the following methods to find the minimum date (i.e. the earliest date) in a column of a PySpark DataFrame:

**Method 1: Find Minimum Date in One Column**

from pyspark.sql import functions as F

#find minimum date in sales_date column
df.select(F.min('sales_date').alias('min_date')).show()

**Method 2: Find Minimum Date in One Column, Grouped by Another Column**

from pyspark.sql import functions as F

#find minimum date in sales_date column, grouped by employee column
df.groupBy('employee').agg(F.min('sales_date').alias('min_date')).show()

The following examples show how to use each method in practice with the following PySpark DataFrame that contains information about sales made by various employees at some company:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', '2020-10-25', 15],
['A', '2013-10-11', 24],
['A', '2015-10-17', 31],
['B', '2022-12-21', 27],
['B', '2021-04-14', 40],
['B', '2021-06-26', 34]]

#define column names
columns = ['employee', 'sales_date', 'total_sales']

#create dataframe using data and column names
df = spark.createDataFrame(data, columns)

#view dataframe
df.show()

+--------+----------+-----------+
|employee|sales_date|total_sales|
+--------+----------+-----------+
|       A|2020-10-25|         15|
|       A|2013-10-11|         24|
|       A|2015-10-17|         31|
|       B|2022-12-21|         27|
|       B|2021-04-14|         40|
|       B|2021-06-26|         34|
+--------+----------+-----------+

**Example 1: Find Minimum Date in One Column**

We can use the following syntax to find the minimum date (i.e. the earliest date) in the **sales_date** column:

from pyspark.sql import functions as F

#find minimum date in sales_date column
df.select(F.min('sales_date').alias('min_date')).show()

+----------+
|  min_date|
+----------+
|2013-10-11|
+----------+

We can see that the minimum date in the **sales_date** column is **2013-10-11**.

**Note**: We used the **alias** function to rename the column to **min_date** in the resulting DataFrame.

**Example 2: Find Minimum Date in One Column, Grouped by Another Column**

We can use the following syntax to find the minimum date in the **sales_date** column, grouped by the values in the **employee** column:

from pyspark.sql import functions as F

#find minimum date in sales_date column, grouped by employee column
df.groupBy('employee').agg(F.min('sales_date').alias('min_date')).show()

+--------+----------+
|employee|  min_date|
+--------+----------+
|       A|2013-10-11|
|       B|2021-04-14|
+--------+----------+

The resulting DataFrame shows the minimum sales date (i.e. earliest date) for each unique employee in the DataFrame.

**Additional Resources**

The following tutorials explain how to perform other common tasks in PySpark:

How to Add Days to a Date Column in PySpark

How to Convert String to Date in PySpark

How to Convert Timestamp to Date in PySpark