How to Add New Rows to PySpark DataFrame (With Examples)


You can use the following methods to add new rows to a PySpark DataFrame:

Method 1: Add One New Row to DataFrame

#define new row to add with values 'C', 'Guard' and 14 (columns is the list of column names of df)
new_row = spark.createDataFrame([('C', 'Guard', 14)], columns)

#add new row to DataFrame
df_new = df.union(new_row)

Method 2: Add Multiple New Rows to DataFrame

#define multiple new rows to add
new_rows = spark.createDataFrame([('C', 'Guard', 14),
                                  ('C', 'Forward', 32),
                                  ('D', 'Forward', 21)], columns)

#add new rows to DataFrame
df_new = df.union(new_rows)

The examples below show how to use each method in practice with this PySpark DataFrame:

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

#define data
data = [['A', 'Guard', 11], 
        ['A', 'Guard', 8], 
        ['A', 'Forward', 22], 
        ['A', 'Forward', 22], 
        ['B', 'Guard', 14], 
        ['B', 'Guard', 14],
        ['B', 'Forward', 13],
        ['B', 'Forward', 7]] 
  
#define column names
columns = ['team', 'position', 'points'] 
  
#create dataframe using data and column names
df = spark.createDataFrame(data, columns) 
  
#view dataframe
df.show()

+----+--------+------+
|team|position|points|
+----+--------+------+
|   A|   Guard|    11|
|   A|   Guard|     8|
|   A| Forward|    22|
|   A| Forward|    22|
|   B|   Guard|    14|
|   B|   Guard|    14|
|   B| Forward|    13|
|   B| Forward|     7|
+----+--------+------+

Example 1: Add One New Row to DataFrame

We can use the following syntax to create a new DataFrame that contains the existing rows plus one new row:

#define new row to add
new_row = spark.createDataFrame([('C', 'Guard', 14)], columns)

#add new row to DataFrame
df_new = df.union(new_row)

#view updated DataFrame
df_new.show()

+----+--------+------+
|team|position|points|
+----+--------+------+
|   A|   Guard|    11|
|   A|   Guard|     8|
|   A| Forward|    22|
|   A| Forward|    22|
|   B|   Guard|    14|
|   B|   Guard|    14|
|   B| Forward|    13|
|   B| Forward|     7|
|   C|   Guard|    14|
+----+--------+------+

Notice that df_new contains one new row with the values C, Guard and 14, just as we specified. The original df is unchanged, because union returns a new DataFrame rather than modifying the existing one.

Example 2: Add Multiple New Rows to DataFrame

We can use the following syntax to add three new rows to the end of the existing DataFrame:

#define multiple new rows to add
new_rows = spark.createDataFrame([('C', 'Guard', 14),
                                  ('C', 'Forward', 32),
                                  ('D', 'Forward', 21)], columns)

#add new rows to DataFrame
df_new = df.union(new_rows)

#view updated DataFrame
df_new.show()

+----+--------+------+
|team|position|points|
+----+--------+------+
|   A|   Guard|    11|
|   A|   Guard|     8|
|   A| Forward|    22|
|   A| Forward|    22|
|   B|   Guard|    14|
|   B|   Guard|    14|
|   B| Forward|    13|
|   B| Forward|     7|
|   C|   Guard|    14|
|   C| Forward|    32|
|   D| Forward|    21|
+----+--------+------+

Notice that three new rows have been added to the end of the DataFrame.

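Note that we used the union function in these examples to return a new DataFrame that contains the rows of the existing DataFrame together with the new row(s) that we specified. The union function matches columns by position rather than by name and, like UNION ALL in SQL, does not remove duplicate rows.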

You can find the complete documentation for the union function in the official PySpark API reference under pyspark.sql.DataFrame.union.

Additional Resources

The following tutorials explain how to perform other common tasks in PySpark:

PySpark: How to Add New Column with Constant Value
PySpark: How to Add Column from Another DataFrame
PySpark: How to Print One Column of a DataFrame
