Often you might be interested in sorting a pandas DataFrame by one or more columns. This tutorial explains how to do so using the pandas sort_values function.
How to Sort pandas DataFrames
Suppose we have the following pandas DataFrame with 10 rows and 4 columns:
import pandas as pd import numpy as np #create DataFrame df = pd.DataFrame({'player': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'], 'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19], 'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5], 'rebounds': [np.nan, 8, 10, 6, 6, 9, 6, 10, 10, 7]}) df player points assists rebounds 0 A 25 5 NaN 1 B 20 7 8 2 C 14 7 10 3 D 16 8 6 4 E 27 5 6 5 F 20 7 9 6 G 12 6 6 7 H 15 9 10 8 I 14 9 10 9 J 19 5 7
Example 1: Sort by one column ascending.
The following code snippet shows how to sort the DataFrame by one column in ascending order:
#sort by assists from smallest to largest df.sort_values(by=['assists']) player points assists rebounds 0 A 25 5 NaN 4 E 27 5 6.0 9 J 19 5 7.0 6 G 12 6 6.0 1 B 20 7 8.0 2 C 14 7 10.0 5 F 20 7 9.0 3 D 16 8 6.0 7 H 15 9 10.0 8 I 14 9 10.0
Example 2: Sort by one column descending.
The following code snippet shows how to sort the DataFrame by one column in ascending order:
#sort by rebounds from largest to smallest df.sort_values(by=['rebounds'], ascending=False) player points assists rebounds 2 C 14 7 10.0 7 H 15 9 10.0 8 I 14 9 10.0 5 F 20 7 9.0 1 B 20 7 8.0 9 J 19 5 7.0 3 D 16 8 6.0 4 E 27 5 6.0 6 G 12 6 6.0 0 A 25 5 NaN
Note that if you sort by a character column, ascending=False will sort the rows in reverse alphabetical order:
#sort by character column df.sort_values(by=['player'], ascending=False) player points assists rebounds 9 J 19 5 7.0 8 I 14 9 10.0 7 H 15 9 10.0 6 G 12 6 6.0 5 F 20 7 9.0 4 E 27 5 6.0 3 D 16 8 6.0 2 C 14 7 10.0 1 B 20 7 8.0 0 A 25 5 NaN
Example 3: Sort by multiple columns.
The following code snippet shows how to sort the DataFrame by multiple columns, each in ascending order:
#sort by assists ascending, then rebounds ascending df.sort_values(by=['assists', 'rebounds']) player points assists rebounds 4 E 27 5 6.0 9 J 19 5 7.0 0 A 25 5 NaN 6 G 12 6 6.0 1 B 20 7 8.0 5 F 20 7 9.0 2 C 14 7 10.0 3 D 16 8 6.0 7 H 15 9 10.0 8 I 14 9 10.0
Note that you can also sort one column ascending and another descending by passing multiple arguments to ascending:
#sort by assists ascending, then rebounds descending df.sort_values(by=['assists', 'rebounds'], ascending=[True, False]) player points assists rebounds 9 J 19 5 7.0 4 E 27 5 6.0 0 A 25 5 NaN 6 G 12 6 6.0 2 C 14 7 10.0 5 F 20 7 9.0 1 B 20 7 8.0 3 D 16 8 6.0 7 H 15 9 10.0 8 I 14 9 10.0
Example 4: Sort NaNs first.
By default, pandas puts NaNs last but you can specify NaNs to be placed first:
#sort by assists ascending, then rebounds descending df.sort_values(by=['rebounds'], na_position='first') player points assists rebounds 0 A 25 5 NaN 3 D 16 8 6.0 4 E 27 5 6.0 6 G 12 6 6.0 9 J 19 5 7.0 1 B 20 7 8.0 5 F 20 7 9.0 2 C 14 7 10.0 7 H 15 9 10.0 8 I 14 9 10.0
You can find the complete documentation for the sort_values function here.