You can use the following basic syntax to merge two datasets in SAS and only return the rows where a value exists in the first dataset and not the second dataset:
data final_data;
merge data1 (in = a) data2 (in = b);
by ID;
if a and not b;
run;
This particular example merges the datasets called data1 and data2 and only returns the rows where a value exists in data1 and not in data2.
The following example shows how to use this syntax in practice.
Example: Merge Two Datasets in SAS Using If A Not B
Suppose we have the following two datasets in SAS that contain information about sales associates at some company:
/*create first dataset*/
data data1;
input ID Gender $;
datalines;
1 Male
2 Male
3 Female
4 Male
5 Female
;
run;
title "data1";
proc print data = data1;
/*create second dataset*/
data data2;
input ID Sales;
datalines;
1 22
2 15
4 29
6 31
7 20
8 13
;
run;
title "data2";
proc print data = data2;
If we use the following merge statement to merge the two datasets based on the value in the ID column, all rows will be returned from both datasets:
/*perform merge*/
data final_data;
merge data1 data2;
by ID;
run;
/*view results*/
title "final_data";
proc print data=final_data;
However, we can use IN statements to only return the rows where a value exists in the first dataset and not in the second dataset:
data final_data;
merge data1 (in = a) data2 (in = b);
by ID;
if a and not b;
run;
/*view results*/
title "final_data";
proc print data=final_data;
Notice that only the rows where a value exists in the first dataset and not the second dataset are returned.
Note: You can find the complete documentation for the SAS merge statement here.
Additional Resources
The following tutorials explain how to perform other common tasks in SAS:
How to Perform a Left Join in SAS
How to Perform an Inner Join in SAS
How to Perform an Outer Join in SAS