How to Select Rows by Index in a Pandas DataFrame


Often you may want to select the rows of a pandas DataFrame based on their index value.

If you’d like to select rows based on their integer position, you can use the .iloc indexer.

If you’d like to select rows based on their index labels, you can use the .loc indexer.

This tutorial provides an example of how to use each of these functions in practice.
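Before the full examples, here is a minimal sketch of the distinction. It uses a small hypothetical DataFrame, `df_demo`, whose index labels (10, 20, 30) deliberately differ from its row positions (0, 1, 2), so position 0 and label 10 refer to the same row:

```python
import pandas as pd

# Hypothetical DataFrame whose index labels do not match the row positions
df_demo = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=[10, 20, 30])

# .iloc looks rows up by integer position (0 = first row)
first_by_position = df_demo.iloc[0]

# .loc looks rows up by index label
first_by_label = df_demo.loc[10]

# Both expressions refer to the first row
print(first_by_position['A'], first_by_label['A'])
```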

Example 1: Select Rows Based on Integer Indexing

The following code shows how to create a pandas DataFrame and use .iloc to select the row at integer position 4 (the 5th row, since positions start at 0):

import pandas as pd
import numpy as np

#make this example reproducible
np.random.seed(0)

#create DataFrame
df = pd.DataFrame(np.random.rand(6,2), index=range(0,18,3), columns=['A', 'B'])

#view DataFrame
df

	       A	       B
0	0.548814	0.715189
3	0.602763	0.544883
6	0.423655	0.645894
9	0.437587	0.891773
12	0.963663	0.383442
15	0.791725	0.528895

#select the 5th row of the DataFrame
df.iloc[[4]]

	       A	       B
12	0.963663	0.383442
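The double brackets in df.iloc[[4]] matter. As a sketch using the same DataFrame (the variable names `row_as_frame` and `row_as_series` are illustrative), a list inside the brackets returns a one-row DataFrame, while a bare integer returns the row as a Series whose name is that row's index label:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# A list of positions returns a DataFrame (here, with one row)
row_as_frame = df.iloc[[4]]

# A single integer position returns the row as a Series
row_as_series = df.iloc[4]

print(type(row_as_frame).__name__)   # DataFrame
print(type(row_as_series).__name__)  # Series
print(row_as_series.name)            # 12, the index label of the 5th row
```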

We can use similar syntax to select multiple rows:

#select the 3rd, 4th, and 5th rows of the DataFrame
df.iloc[[2, 3, 4]]

	       A	       B
6	0.423655	0.645894
9	0.437587	0.891773
12	0.963663	0.383442
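Because .iloc works with positions, it also accepts negative positions (counted from the end) and positions listed in any order. A minimal sketch on the same DataFrame, with illustrative variable names:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# -1 is the position of the last row
last_row = df.iloc[[-1]]

# Rows are returned in the order the positions are listed
reordered = df.iloc[[4, 0]]

print(list(last_row.index))   # [15]
print(list(reordered.index))  # [12, 0]
```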

Or we could select all rows in a range:

#select the 3rd, 4th, and 5th rows of the DataFrame
df.iloc[2:5]

	       A	       B
6	0.423655	0.645894
9	0.437587	0.891773
12	0.963663	0.383442
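Like ordinary Python slicing, an .iloc slice excludes its end position, and it accepts a step. A short sketch on the same DataFrame (variable names are illustrative):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# Positions 2, 3, and 4; position 5 is excluded
middle = df.iloc[2:5]

# A step of 2 selects every other row: positions 0, 2, and 4
every_other = df.iloc[::2]

print(list(middle.index))       # [6, 9, 12]
print(list(every_other.index))  # [0, 6, 12]
```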

Example 2: Select Rows Based on Label Indexing

The following code shows how to create a pandas DataFrame and use .loc to select the row with an index label of 3:

import pandas as pd
import numpy as np

#make this example reproducible
np.random.seed(0)

#create DataFrame
df = pd.DataFrame(np.random.rand(6,2), index=range(0,18,3), columns=['A', 'B'])

#view DataFrame
df

	       A	       B
0	0.548814	0.715189
3	0.602763	0.544883
6	0.423655	0.645894
9	0.437587	0.891773
12	0.963663	0.383442
15	0.791725	0.528895

#select the row with index label 3
df.loc[[3]]

	       A	       B
3	0.602763	0.544883
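Since .loc looks up labels rather than positions, asking for a label that is not in the index fails even if a row exists at that position. A minimal sketch, assuming the same DataFrame (the flag `label_found` is illustrative):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# The labels are 0, 3, 6, 9, 12, 15, so label 4 does not exist
try:
    df.loc[4]
    label_found = True
except KeyError:
    label_found = False

print(label_found)  # False

# Position 4 does exist, so .iloc returns the row labeled 12
print(df.iloc[4].name)  # 12
```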

We can use similar syntax to select multiple rows with different index labels:

#select the rows with index labels 3, 6, and 9
df.loc[[3, 6, 9]]

	       A	       B
3	0.602763	0.544883
6	0.423655	0.645894
9	0.437587	0.891773
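.loc also accepts a slice of labels, but unlike an .iloc slice it includes both endpoints. A short sketch comparing the two on the same DataFrame (variable names are illustrative):

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# Label slice: includes both label 3 and label 9
by_label = df.loc[3:9]

# Position slice: includes position 1 but excludes position 3
by_position = df.iloc[1:3]

print(list(by_label.index))     # [3, 6, 9]
print(list(by_position.index))  # [3, 6]
```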

The Difference Between .iloc and .loc

The examples above illustrate the subtle difference between .iloc and .loc:

  • .iloc selects rows based on integer position. So, if you want to select the 5th row in a DataFrame, you would use df.iloc[[4]] since the first row is at position 0, the second row is at position 1, and so on.
  • .loc selects rows based on index labels. So, if you want to select the row with an index label of 5, you would use df.loc[[5]] directly, regardless of where that row sits in the DataFrame.
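When you have one kind of index but need the other, you can convert between them. This sketch uses the standard `Index.get_loc` method to turn a label into a position, and plain indexing on `df.index` to turn a position into a label; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.rand(6, 2), index=range(0, 18, 3), columns=['A', 'B'])

# Label -> position: label 12 sits at position 4
pos = df.index.get_loc(12)

# Position -> label: position 4 holds label 12
label = df.index[4]

# Both routes select the same row
same_row = df.iloc[[pos]].equals(df.loc[[label]])

print(pos, label, same_row)  # 4 12 True
```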

Additional Resources

How to Get Row Numbers in a Pandas DataFrame
How to Drop Rows with NaN Values in Pandas
How to Drop the Index Column in Pandas
