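When you create a subset of a pandas DataFrame and then modify the subset, the original DataFrame can also be modified, because some subsetting operations return a view that shares data with the original rather than an independent copy.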
For this reason, it’s always a good idea to use .copy() when subsetting so that any modifications you make to the subset won’t also be made to the original DataFrame.
The following examples demonstrate how (and why) to make a copy of a pandas DataFrame when subsetting.
Example 1: Subsetting a DataFrame Without Copying
Suppose we have the following pandas DataFrame:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4]})
#view DataFrame
print(df)
team points assists
0 A 18 5
1 B 22 7
2 C 19 7
3 D 14 9
4 E 14 12
5 F 11 9
6 G 20 9
7 H 28 4
Now suppose we create a subset that contains only the first four rows of the original DataFrame:
#define subsetted DataFrame
df_subset = df[0:4]
#view subsetted DataFrame
print(df_subset)
team points assists
0 A 18 5
1 B 22 7
2 C 19 7
3 D 14 9
If we modify one of the values in the subset, the value in the original DataFrame will also be modified (this is the behavior in pandas versions before 3.0 when Copy-on-Write is not enabled):
#change first value in team column (pandas may emit a SettingWithCopyWarning here)
df_subset.loc[0, 'team'] = 'X'
#view subsetted DataFrame
print(df_subset)
team points assists
0 X 18 5
1 B 22 7
2 C 19 7
3 D 14 9
#view original DataFrame
print(df)
team points assists
0 X 18 5
1 B 22 7
2 C 19 7
3 D 14 9
4 E 14 12
5 F 11 9
6 G 20 9
7 H 28 4
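Notice that the first value in the team column has been changed from ‘A’ to ‘X’ in both the subsetted DataFrame and the original DataFrame.
This is because slicing with df[0:4] returned a view that shares its underlying data with the original DataFrame, and we didn’t make a copy of it. In pandas 3.0 and later, where Copy-on-Write is the default, a modification to the subset no longer propagates to the original.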
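One way to see the difference between a plain slice and a copy is to check whether the two objects share memory. The following is a minimal sketch, assuming NumPy is available and using a smaller DataFrame than the one above; the names df_view, df_copy, slice_shares and copy_shares are made up for illustration. It only inspects memory and does not modify anything, so it should behave the same whether or not Copy-on-Write is enabled.

```python
import numpy as np
import pandas as pd

#create a small DataFrame (illustrative data)
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D'],
                   'points': [18, 22, 19, 14]})

#a plain slice and an explicit copy of the same rows
df_view = df[0:2]
df_copy = df[0:2].copy()

#check whether the points column of each subset shares memory with the original
slice_shares = np.shares_memory(df['points'].to_numpy(), df_view['points'].to_numpy())
copy_shares = np.shares_memory(df['points'].to_numpy(), df_copy['points'].to_numpy())

print(slice_shares)
print(copy_shares)
```

The plain slice is expected to report shared memory, while the copy should not, which is why changes to a copy cannot reach the original DataFrame.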
Example 2: Subsetting a DataFrame With Copying
Once again suppose we have the following pandas DataFrame:
import pandas as pd
#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4]})
#view DataFrame
print(df)
team points assists
0 A 18 5
1 B 22 7
2 C 19 7
3 D 14 9
4 E 14 12
5 F 11 9
6 G 20 9
7 H 28 4
Once again suppose we create a subset that contains only the first four rows of the original DataFrame, but this time we use .copy() so that the subset is an independent copy of those rows:
#define subsetted DataFrame
df_subset = df[0:4].copy()
Now suppose we change the first value in the team column of the subsetted DataFrame:
#change first value in team column
df_subset.loc[0, 'team'] = 'X'
#view subsetted DataFrame
print(df_subset)
team points assists
0 X 18 5
1 B 22 7
2 C 19 7
3 D 14 9
#view original DataFrame
print(df)
team points assists
0 A 18 5
1 B 22 7
2 C 19 7
3 D 14 9
4 E 14 12
5 F 11 9
6 G 20 9
7 H 28 4
Notice that the first value in the team column has been changed from ‘A’ to ‘X’ only in the subsetted DataFrame.
The original DataFrame remains untouched because .copy() gave the subset its own data, independent of the original DataFrame.
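The same pattern can be applied to other ways of subsetting, such as filtering rows by a condition. The following is a minimal sketch under that assumption, using the same data as the examples above; the names high_scorers and points_per_assist are hypothetical and chosen only for illustration.

```python
import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4]})

#filter rows by a condition and copy the result
high_scorers = df[df['points'] > 18].copy()

#add a new column to the copy only
high_scorers['points_per_assist'] = high_scorers['points'] / high_scorers['assists']

#view the column names of each DataFrame
print(high_scorers.columns.tolist())
print(df.columns.tolist())
```

Because high_scorers is a copy, the new column is added to it alone and the original DataFrame keeps its three columns.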