How to Shuffle Rows in a Pandas DataFrame


You can use the following syntax to randomly shuffle the rows in a pandas DataFrame:

#shuffle entire DataFrame
df.sample(frac=1)

#shuffle entire DataFrame and reset index
df.sample(frac=1).reset_index(drop=True)

Here’s what each piece of the code does:

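  • The sample() method returns a random sample of rows, drawn without replacement by default.
  • The frac argument specifies the fraction of rows to return in the sample. A frac value of 1 returns all rows, which amounts to shuffling them.
  • The reset_index(drop=True) method replaces the shuffled index labels with a new sequence starting at 0. The drop=True argument discards the old labels instead of saving them in a new column.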
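As a quick sanity check of the points above, the following sketch confirms that frac=1 returns every row exactly once and that reset_index(drop=True) renumbers the rows. The small DataFrame and the variable names are made up for illustration only.

```python
import pandas as pd

# hypothetical example data, used only for this check
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D'],
                   'points': [10, 20, 30, 40]})

#shuffle entire DataFrame
shuffled = df.sample(frac=1)

#every original row appears exactly once: same length, same set of index labels
same_length = len(shuffled) == len(df)
same_labels = sorted(shuffled.index) == list(df.index)

#reset_index(drop=True) replaces the shuffled labels with 0, 1, 2, ...
renumbered = shuffled.reset_index(drop=True)
new_index = list(renumbered.index)

print(same_length, same_labels, new_index)
```

Both checks should hold no matter which random order sample() happens to produce.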

The following examples show how to use this syntax in practice.

Example 1: Shuffle Entire DataFrame

The following code shows how to shuffle all rows in a pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'points': [77, 82, 86, 88, 80, 95],
                   'rebounds': [19, 22, 15, 28, 33, 29]})

#view DataFrame
df

	team	points	rebounds
0	A	77	19
1	A	82	22
2	A	86	15
3	B	88	28
4	B	80	33
5	C	95	29

#shuffle all rows of DataFrame
df.sample(frac=1)

	team	points	rebounds
1	A	82	22
3	B	88	28
2	A	86	15
5	C	95	29
4	B	80	33
0	A	77	19

Notice that the rows are shuffled and each row retained its original index value.

Also note that sample() returns a new DataFrame and leaves df unchanged, and that the order of the rows will generally differ each time you run it.
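If you need the same shuffled order on every run, one option is the random_state argument of sample(), which seeds the random number generator. The sketch below assumes a seed of 0, which is an arbitrary choice; any fixed integer should work the same way.

```python
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'points': [77, 82, 86, 88, 80, 95],
                   'rebounds': [19, 22, 15, 28, 33, 29]})

#shuffle twice with the same seed (0 is an arbitrary value)
first = df.sample(frac=1, random_state=0)
second = df.sample(frac=1, random_state=0)

#both shuffles produce the same row order
same_order = first.equals(second)
print(same_order)
```

Leaving random_state out keeps the default behavior, where each call can return a different order.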

Example 2: Shuffle Entire DataFrame & Reset Index

The following code shows how to shuffle all rows in a pandas DataFrame and reset the index values:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'C'],
                   'points': [77, 82, 86, 88, 80, 95],
                   'rebounds': [19, 22, 15, 28, 33, 29]})

#view DataFrame
df

	team	points	rebounds
0	A	77	19
1	A	82	22
2	A	86	15
3	B	88	28
4	B	80	33
5	C	95	29

#shuffle all rows of DataFrame and reset index
df.sample(frac=1).reset_index(drop=True)

	team	points	rebounds
0	A	77	19
1	C	95	29
2	A	82	22
3	B	88	28
4	A	86	15
5	B	80	33

Notice that the rows are shuffled and the index is also reset so that the first row has an index value of 0, the second row has an index value of 1, and so on.
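To see what drop=True contributes here, the sketch below compares reset_index() with and without it. The three-row DataFrame and the seed value are assumptions chosen only to keep the example short.

```python
import pandas as pd

# hypothetical example data, used only for this comparison
df = pd.DataFrame({'team': ['A', 'B', 'C'],
                   'points': [77, 88, 95]})

#shuffle all rows (the seed 1 is an arbitrary value)
shuffled = df.sample(frac=1, random_state=1)

#without drop=True, the old index labels are kept in a new 'index' column
kept = shuffled.reset_index()

#with drop=True, the old index labels are discarded
dropped = shuffled.reset_index(drop=True)

print(list(kept.columns))
print(list(dropped.columns))
```

The first result gains an extra column named index holding the original row labels, while the second keeps only team and points.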

Additional Resources

How to Change the Order of Columns in Pandas DataFrame
How to Get Row Numbers in a Pandas DataFrame
How to Get First Row of Pandas DataFrame
