You can use the following methods to add new rows to a PySpark DataFrame:
Method 1: Add One New Row to DataFrame
#define new row to add with values 'C', 'Guard' and 14
new_row = spark.createDataFrame([('C', 'Guard', 14)], columns)
#add new row to DataFrame
df_new = df.union(new_row)
Method 2: Add Multiple New Rows to DataFrame
#define multiple new rows to add
new_rows = spark.createDataFrame([('C', 'Guard', 14),
                                  ('C', 'Forward', 32),
                                  ('D', 'Forward', 21)], columns)
#add new rows to DataFrame
df_new = df.union(new_rows)
The following examples show how to use each method in practice with the following PySpark DataFrame:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
#define data
data = [['A', 'Guard', 11],
        ['A', 'Guard', 8],
        ['A', 'Forward', 22],
        ['A', 'Forward', 22],
        ['B', 'Guard', 14],
        ['B', 'Guard', 14],
        ['B', 'Forward', 13],
        ['B', 'Forward', 7]]
#define column names
columns = ['team', 'position', 'points']
#create dataframe using data and column names
df = spark.createDataFrame(data, columns)
#view dataframe
df.show()
+----+--------+------+
|team|position|points|
+----+--------+------+
|   A|   Guard|    11|
|   A|   Guard|     8|
|   A| Forward|    22|
|   A| Forward|    22|
|   B|   Guard|    14|
|   B|   Guard|    14|
|   B| Forward|    13|
|   B| Forward|     7|
+----+--------+------+
Example 1: Add One New Row to DataFrame
We can use the following syntax to add one new row to the end of the existing DataFrame:
#define new row to add
new_row = spark.createDataFrame([('C', 'Guard', 14)], columns)
#add new row to DataFrame
df_new = df.union(new_row)
#view updated DataFrame
df_new.show()
+----+--------+------+
|team|position|points|
+----+--------+------+
|   A|   Guard|    11|
|   A|   Guard|     8|
|   A| Forward|    22|
|   A| Forward|    22|
|   B|   Guard|    14|
|   B|   Guard|    14|
|   B| Forward|    13|
|   B| Forward|     7|
|   C|   Guard|    14|
+----+--------+------+
Notice that one new row has been added to the end of the DataFrame with the values C, Guard and 14 just as we specified.
Example 2: Add Multiple New Rows to DataFrame
We can use the following syntax to add three new rows to the end of the existing DataFrame:
#define multiple new rows to add
new_rows = spark.createDataFrame([('C', 'Guard', 14),
                                  ('C', 'Forward', 32),
                                  ('D', 'Forward', 21)], columns)
#add new rows to DataFrame
df_new = df.union(new_rows)
#view updated DataFrame
df_new.show()
+----+--------+------+
|team|position|points|
+----+--------+------+
|   A|   Guard|    11|
|   A|   Guard|     8|
|   A| Forward|    22|
|   A| Forward|    22|
|   B|   Guard|    14|
|   B|   Guard|    14|
|   B| Forward|    13|
|   B| Forward|     7|
|   C|   Guard|    14|
|   C| Forward|    32|
|   D| Forward|    21|
+----+--------+------+
Notice that three new rows have been added to the end of the DataFrame.
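Note that we used the union function in these examples, which returns a new DataFrame containing the rows of the existing DataFrame together with the new row(s) that we specified. The original DataFrame is not modified. The union function matches columns by position rather than by name and keeps duplicate rows, so the new rows must list their values in the same column order as the existing DataFrame.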
You can find the complete documentation for the PySpark union function in the official PySpark API reference under pyspark.sql.DataFrame.union.
Additional Resources
The following tutorials explain how to perform other common tasks in PySpark:
PySpark: How to Add New Column with Constant Value
PySpark: How to Add Column from Another DataFrame
PySpark: How to Print One Column of a DataFrame