When merging two datasets in SAS, you can use the IN statement to only return rows where a value exists in a particular dataset.
Here are a few common ways to use the IN statement in practice:
Method 1: Return Rows where Value Exists in First Dataset (in = a)
data final_data;
merge data1 (in=a) data2;
by ID;
if a;
run;
This particular example merges the datasets called data1 and data2 and only returns the rows where a value exists in data1.
Method 2: Return Rows where Value Exists in Second Dataset (in = b)
data final_data;
merge data1 data2 (in=b);
by ID;
if b;
run;
This particular example merges the datasets called data1 and data2 and only returns the rows where a value exists in data2.
Method 3: Return Rows where Value Exists in Both Datasets (in = a) and (in = b)
data final_data;
merge data1 (in = a) data2 (in=b);
by ID;
if a and b;
run;
This particular example merges the datasets called data1 and data2 and only returns the rows where a value exists in both data1 and data2.
The following examples show how to use each method in practice with the following two datasets:
/*create first dataset*/
data data1;
input ID Gender $;
datalines;
1 Male
2 Male
3 Female
4 Male
5 Female
;
run;
title "data1";
proc print data = data1;
/*create second dataset*/
data data2;
input ID Sales;
datalines;
1 22
2 15
4 29
6 31
7 20
8 13
;
run;
title "data2";
proc print data = data2;
Example 1: Return All Rows
We can use the following merge statement without any IN statement to merge the two datasets based on the value in the ID column and return all rows from both datasets:
/*perform merge*/
data final_data;
merge data1 data2;
by ID;
run;
/*view results*/
title "final_data";
proc print data=final_data;
Notice that all rows from both datasets are returned, regardless if there are missing values due to an ID value not existing in both datasets.
Example 2: Return Rows where Value Exists in First Dataset (in = a)
We can use the following merge statement with (in = a) to merge the two datasets based on the value in the ID column and return only the rows where a value exists in the first dataset:
/*perform merge*/
data final_data;
merge data1 (in = a) data2;
by ID;
if a;
run;
/*view results*/
title "final_data";
proc print data=final_data;
Notice that only the rows where a value exists in the first dataset are returned.
Example 3: Return Rows where Value Exists in Second Dataset (in = b)
We can use the following merge statement with (in = b) to merge the two datasets based on the value in the ID column and return only the rows where a value exists in the second dataset:
/*perform merge*/
data final_data;
merge data1 data2 (in = b);
by ID;
if b;
run;
/*view results*/
title "final_data";
proc print data=final_data;
Notice that only the rows where a value exists in the second dataset are returned.
Example 4: Return Rows where Value Exists in Both Datasets (in = a) and (in = b)
We can use the following merge statement with (in = a) and (in = b) to merge the two datasets based on the value in the ID column and return only the rows where a value exists in both datasets:
/*perform merge*/
data final_data;
merge data1 (in = a) data2 (in = b);
by ID;
if a and b;
run;
/*view results*/
title "final_data";
proc print data=final_data;
Notice that only the rows where a value exists in both datasets are returned.
Note: You can find the complete documentation for the SAS merge statement here.
Additional Resources
The following tutorials explain how to perform other common tasks in SAS:
How to Perform a Left Join in SAS
How to Perform an Inner Join in SAS
How to Perform an Outer Join in SAS