You can use the WHERE option with SET in SAS to create a new dataset that only includes rows from another dataset where certain conditions are met.
Here are two common ways to use this option in practice:
Method 1: Use WHERE and SET with One Condition
data new_data;
set my_data (where = (points>20));
run;
This example creates a new dataset called new_data that only includes rows from my_data where the value in the points column is greater than 20.
Method 2: Use WHERE and SET with Multiple Conditions
data new_data;
set my_data (where = (points>20 or team="Rockets"));
run;
This example creates a new dataset called new_data that only includes rows from my_data where the value in the points column is greater than 20 or the value in the team column is equal to Rockets.
The following examples show how to use each method in practice with the following dataset in SAS:
/*create dataset*/
data my_data;
input team $ points assists;
datalines;
Mavs 22 10
Rockets 12 14
Spurs 29 8
Kings 13 10
Warriors 44 10
Heat 18 8
Magic 11 5
Pelicans 19 3
Blazers 12 8
;
run;
/*view dataset*/
proc print data=my_data;
Example 1: Use WHERE and SET with One Condition
We can use the following syntax to create a new dataset called new_data that only includes rows from my_data where the value in the points column is greater than 20:
/*create new dataset*/
data new_data;
set my_data (where = (points>20));
run;
/*view new dataset*/
proc print data=new_data;
Notice that only the rows with a value greater than 20 in the points column are included in this dataset.
Example 2: Use WHERE and SET with Multiple Conditions
We can use the following syntax to create a new dataset called new_data that only includes rows from my_data where the value in the points column is greater than 20 or the value in the team column is equal to “Rockets”:
/*create new dataset*/
data new_data;
set my_data (where = (points>20 or team="Rockets"));
run;
/*view new dataset*/
proc print data=new_data;
Notice that only the rows with a value greater than 20 in the points column or a value of “Rockets” in the team column are included in this dataset.
Note: You could use and instead of or to only include rows that meet multiple conditions.
Additional Resources
The following tutorials explain how to perform other common tasks in SAS:
How to Use SET Statement with Multiple Datasets in SAS
How to Select the First N Rows of a Dataset in SAS
How to Delete Datasets in SAS