How to Randomly Sample Rows in Pandas (With Examples)


You can use the following basic syntax to randomly sample rows from a pandas DataFrame:

#randomly select one row
df.sample()

#randomly select n rows
df.sample(n=5)

#randomly select n rows with repeats allowed
df.sample(n=5, replace=True) 

#randomly select a fraction of the total rows
df.sample(frac=0.3)

#randomly select n rows by group
df.groupby('team', group_keys=False).apply(lambda x: x.sample(2))

The examples below show how to use this syntax in practice with the following pandas DataFrame:

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame 
df

  team  points  assists  rebounds
0    A      25        5        11
1    A      12        7         8
2    A      15        7        10
3    A      14        9         6
4    B      19       12         6
5    B      23        9         5
6    B      25        9         9
7    B      29        4        12

Example 1: Randomly Select One Row

The following code shows how to randomly select one row from the DataFrame:

#randomly select one row
df.sample()

  team  points  assists  rebounds
5    B      23        9         5

Example 2: Randomly Select n Rows

The following code shows how to randomly select n rows from the DataFrame:

#randomly select n rows
df.sample(n=5)

  team  points  assists  rebounds
5    B      23        9         5
2    A      15        7        10
4    B      19       12         6
6    B      25        9         9
1    A      12        7         8
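
Because sample() draws rows at random, the rows returned will generally differ each time the code runs. The following is a minimal sketch of one way to make the result reproducible, assuming you pass an integer seed to the random_state argument of sample(). The seed value 0 and the variable names first and second are arbitrary choices for illustration.

```python
import pandas as pd

# same DataFrame used throughout this tutorial
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# randomly select 3 rows twice, using the same seed both times
first = df.sample(n=3, random_state=0)
second = df.sample(n=3, random_state=0)

# the two samples contain identical rows in identical order
print(first.equals(second))
```

Omitting random_state restores the default behavior, where each call can return a different set of rows.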

Example 3: Randomly Select n Rows with Repeats Allowed

The following code shows how to randomly select n rows from the DataFrame, with repeat rows allowed:

#randomly select 5 rows with repeats allowed
df.sample(n=5, replace=True) 

  team  points  assists  rebounds
6    B      25        9         9
7    B      29        4        12
5    B      23        9         5
1    A      12        7         8
5    B      23        9         5
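
Notice that row 5 appears twice in the output above, which is only possible because replace=True. One consequence, sketched below, is that sampling with replacement lets you request more rows than the DataFrame contains, while the default replace=False raises a ValueError in that case. The sample size of 12, the seed, and the variable names are assumptions chosen for illustration.

```python
import pandas as pd

# same DataFrame used throughout this tutorial (8 rows)
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# with replacement, n can be larger than the number of rows
oversample = df.sample(n=12, replace=True, random_state=1)
print(len(oversample))

# without replacement, asking for more rows than exist fails
raised_error = False
try:
    df.sample(n=12)
except ValueError as err:
    raised_error = True
    print('ValueError:', err)
```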

Example 4: Randomly Select a Fraction of the Total Rows

The following code shows how to randomly select a fraction of the total rows from the DataFrame:

#randomly select 25% of rows
df.sample(frac=0.25) 

  team  points  assists  rebounds
2    A      15        7        10
1    A      12        7         8
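
The DataFrame has 8 rows, so frac=0.25 returns 0.25 × 8 = 2 rows. As a hedged illustration of the same argument, the sketch below also uses frac=1 to return every row in random order, which is a common way to shuffle a DataFrame. The seed and the variable names sampled and shuffled are assumptions made for this example.

```python
import pandas as pd

# same DataFrame used throughout this tutorial (8 rows)
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# 25% of 8 rows is 2 rows
sampled = df.sample(frac=0.25, random_state=0)
print(len(sampled))

# frac=1 returns all rows in random order; reset_index renumbers them from 0
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
print(shuffled)
```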

Example 5: Randomly Select n Rows by Group

The following code shows how to randomly select n rows by group from the DataFrame:

#randomly select 2 rows from each team
df.groupby('team', group_keys=False).apply(lambda x: x.sample(2))

  team  points  assists  rebounds
0    A      25        5        11
2    A      15        7        10
7    B      29        4        12
4    B      19       12         6

Notice that 2 rows from team ‘A’ and 2 rows from team ‘B’ were randomly sampled.
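
As an alternative sketch, assuming pandas 1.1 or later is installed, grouped objects have their own sample() method, so the same per-group sampling can be written without apply() and a lambda. The seed and the variable name per_team are arbitrary choices for illustration.

```python
import pandas as pd

# same DataFrame used throughout this tutorial
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [25, 12, 15, 14, 19, 23, 25, 29],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# randomly select 2 rows from each team
per_team = df.groupby('team').sample(n=2, random_state=0)
print(per_team)

# confirm that each team contributes exactly 2 rows
print(per_team['team'].value_counts())
```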

Note: You can find the complete documentation for the pandas sample() function in the official pandas documentation for DataFrame.sample.

Additional Resources

The following tutorials explain how to perform other common sampling methods in pandas:

How to Perform Stratified Sampling in Pandas
How to Perform Cluster Sampling in Pandas
