You can use the following basic syntax to replicate each row in a pandas DataFrame a certain number of times:
#replicate each row 3 times
df_new = pd.DataFrame(np.repeat(df.values, 3, axis=0))
The number in the second argument of the NumPy repeat() function specifies the number of times to replicate each row.
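To see what the second argument and axis=0 do in isolation, here is a minimal sketch on a small NumPy array. The array arr is a made-up example and is not part of the data used in this tutorial.

```python
import numpy as np

#small illustrative array with two rows
arr = np.array([[1, 2],
                [3, 4]])

#repeat each row twice; axis=0 means repetition happens along the rows
repeated = np.repeat(arr, 2, axis=0)

print(repeated)
```

Each row appears twice in a row before the next one starts, which is the same ordering the DataFrame version produces.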
The following example shows how to use this syntax in practice.
Example: Replicate Rows in a Pandas DataFrame
Suppose we have the following pandas DataFrame that contains information about various basketball players:
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'points': [18, 20, 19, 14, 14, 11],
                   'assists': [5, 7, 7, 9, 12, 5],
                   'rebounds': [11, 8, 10, 6, 6, 5]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      20        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        5         5
We can use the following syntax to replicate each row in the DataFrame three times:
import numpy as np

#define new DataFrame as original DataFrame with each row repeated 3 times
df_new = pd.DataFrame(np.repeat(df.values, 3, axis=0))

#assign column names of original DataFrame to new DataFrame
df_new.columns = df.columns

#view new DataFrame
print(df_new)

   team points assists rebounds
0     A     18       5       11
1     A     18       5       11
2     A     18       5       11
3     B     20       7        8
4     B     20       7        8
5     B     20       7        8
6     C     19       7       10
7     C     19       7       10
8     C     19       7       10
9     D     14       9        6
10    D     14       9        6
11    D     14       9        6
12    E     14      12        6
13    E     14      12        6
14    E     14      12        6
15    F     11       5        5
16    F     11       5        5
17    F     11       5        5
The new DataFrame contains every row from the original DataFrame, replicated three times.
Notice that the index values have also been reset.
The values in the index now range from 0 to 17.
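One caveat with the NumPy approach: because df.values on a DataFrame with mixed column types returns an object array, the numeric columns of the new DataFrame come back with the object dtype rather than their original types. A possible alternative, sketched here as an assumption and not as the method used in this tutorial, repeats the index labels with the pandas Index.repeat() method and selects them with .loc, which keeps the original dtypes. The names df_np and df_idx are illustrative.

```python
import numpy as np
import pandas as pd

#same DataFrame as in the example
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'points': [18, 20, 19, 14, 14, 11],
                   'assists': [5, 7, 7, 9, 12, 5],
                   'rebounds': [11, 8, 10, 6, 6, 5]})

#NumPy approach: numeric columns come back as object dtype
df_np = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)

#alternative: repeat the index labels, select them with .loc, then reset the index
df_idx = df.loc[df.index.repeat(3)].reset_index(drop=True)

#compare the column dtypes of the two results
print(df_np.dtypes)
print(df_idx.dtypes)
```

If the numeric columns need to stay numeric for later calculations, the second version avoids a separate conversion step. Dropping the reset_index() call would keep the original index labels (0, 0, 0, 1, 1, 1, and so on) instead.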
Note: The complete documentation for the NumPy repeat() function is available in the official NumPy reference.