SAS: How to Merge Datasets Based on Two Variables


You can use the following basic syntax to merge two datasets in SAS based on two variables matching:

data final_data;
  merge data1 (in = a) data2 (in = b);
  by ID Store;
  if a and b;
run;

This particular example merges the datasets called data1 and data2 based on the variables called ID and Store and only returns the rows where a value exists in both datasets.

The following example shows how to use this syntax in practice.

Example: Merge Datasets in SAS Based on Two Variables

Suppose we have the following dataset in SAS that contains information about sales associates at some company:

/*create first dataset*/
data data1;
    input ID Store $;
    datalines;
1 A
1 B
1 C
2 A
2 C
3 A
3 B
;
run;

/*view first dataset*/
title "data1";
proc print data = data1;

And suppose we have another dataset that contains information about the sales made at various stores by each associate:

/*create second dataset*/
data data2;
    input ID Store $ Sales;
    datalines;
1 A 22
1 B 25
2 A 40
2 B 24
2 C 29
3 A 12
3 B 15
;
run;

/*view second dataset*/
title "data2";
proc print data = data2;

We can use the following merge statement to merge the two datasets based on matching values in the ID and Store columns, then only return rows where a value exists in both columns:

/*perform merge*/
data final_data;
  merge data1 (in = a) data2 (in = b);
  by ID Store;
  if a and b;
run;

/*view results*/
title "final_data";
proc print data=final_data;

The resulting dataset returns the rows where the values in the ID and Store columns both match.

Note: You can find the complete documentation for the SAS merge statement here.

Additional Resources

The following tutorials explain how to perform other common tasks in SAS:

SAS: How to Perform One-to-Many Merge
SAS: How to Use (in=a) in Merge Statement
SAS: How to Merge If A Not B

Leave a Reply

Your email address will not be published. Required fields are marked *