How to Replicate Rows in a Pandas DataFrame


You can use the following basic syntax to replicate each row in a pandas DataFrame a certain number of times:

#replicate each row 3 times
df_new = pd.DataFrame(np.repeat(df.values, 3, axis=0))

The second argument of the NumPy repeat() function specifies how many times to replicate each row, and axis=0 tells NumPy to repeat whole rows rather than individual values.
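To see what repeat() does on its own, here is a minimal sketch on a small array. The array `arr` and its values are made up for illustration and simply stand in for `df.values`.

```python
import numpy as np

# small 2-D array standing in for df.values (illustrative data)
arr = np.array([[1, 2],
                [3, 4]])

# axis=0 repeats whole rows: each row appears 3 times, back to back
repeated = np.repeat(arr, 3, axis=0)
print(repeated)

# the second argument can also be a list with one count per row
varied = np.repeat(arr, [1, 2], axis=0)
print(varied)
```

The first call returns six rows (the first row three times, then the second row three times). The second call keeps the first row once and the second row twice.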

The following example shows how to use this syntax in practice.

Example: Replicate Rows in a Pandas DataFrame

Suppose we have the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'points': [18, 20, 19, 14, 14, 11],
                   'assists': [5, 7, 7, 9, 12, 5],
                   'rebounds': [11, 8, 10, 6, 6, 5]})

#view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      20        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        5         5

We can use the following syntax to replicate each row in the DataFrame three times:

import numpy as np

#define new DataFrame as original DataFrame with each row repeated 3 times
df_new = pd.DataFrame(np.repeat(df.values, 3, axis=0))

#assign column names of original DataFrame to new DataFrame
df_new.columns = df.columns

#view new DataFrame
print(df_new)

   team points assists rebounds
0     A     18       5       11
1     A     18       5       11
2     A     18       5       11
3     B     20       7        8
4     B     20       7        8
5     B     20       7        8
6     C     19       7       10
7     C     19       7       10
8     C     19       7       10
9     D     14       9        6
10    D     14       9        6
11    D     14       9        6
12    E     14      12        6
13    E     14      12        6
14    E     14      12        6
15    F     11       5        5
16    F     11       5        5
17    F     11       5        5

The new DataFrame contains each of the rows from the original DataFrame, replicated three times each.

Notice that the index values have also been reset. Because the new DataFrame is built from a plain NumPy array, it receives a fresh default index that ranges from 0 to 17.
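One side effect worth checking: because df.values mixes strings and numbers, NumPy stores everything as generic Python objects, so the numeric columns in the new DataFrame are no longer integers. The sketch below shows this and, as an alternative that is not part of the method above, repeats the index labels with pandas itself so that the original column types are kept. The names `df_np` and `df_alt` are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'points': [18, 20, 19, 14, 14, 11],
                   'assists': [5, 7, 7, 9, 12, 5],
                   'rebounds': [11, 8, 10, 6, 6, 5]})

# NumPy approach from the example above: numeric columns become object dtype
df_np = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
print(df_np.dtypes)

# alternative sketch: repeat each index label 3 times, select those rows,
# then reset the index so it runs from 0 to 17
df_alt = df.loc[df.index.repeat(3)].reset_index(drop=True)
print(df_alt.dtypes)
```

If the numeric columns are needed for calculations afterward, the second approach avoids having to convert them back with astype().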

Note: You can find the complete documentation for the NumPy repeat() function in the official NumPy documentation.

Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

Pandas: How to Find the Difference Between Two Columns
Pandas: How to Find the Difference Between Two Rows
Pandas: How to Sort Columns by Name
