How to Perform Fuzzy Matching in SAS (With Example)


Often you may want to join together two datasets in SAS based on imperfectly matching strings.

This is often called fuzzy matching.

The easiest way to perform fuzzy matching in SAS is to use the SOUNDEX function along with the COMPGED function.

Both of these functions are used to quantify the similarity between strings and can be used to “match” similar strings together.

The following example shows how to use these functions to perform fuzzy matching in SAS.

Example: How to Perform Fuzzy Matching in SAS

Suppose we have the following dataset in SAS that contains information about team names and points for various basketball players:

/*create first dataset*/     
data data1;
  input team $ points;
  datalines;
Mavs 19
Nets 22
Kings 34
Warriors 19
Magic 32   
;
run;
/*view dataset*/
proc print data=data1;

And suppose we have another dataset with team names and assists for various basketball players:

/*create second dataset*/     
data data2;
  input team $ assists;
  datalines;
Netts 8
Majick 7
Keengs 8
Warriors 12
Mavs 4    
;
run;
/*view dataset*/
proc print data=data2;

Notice that many of the team names in this dataset are similar but not exactly the same as the team names in the previous dataset.

We can use the following syntax in SAS to perform fuzzy matching and join together these two datasets based on similar team names:

/*use fuzzy matching to merge datasets based on similar team names*/
data data3;                                       
  set data1;
  tmp1=soundex(team);       /*encode team names from data1*/
  do i=1 to nobs;     
    set data2(rename=(team=team2)) point=i nobs=nobs;        
    tmp2=soundex(team2);    /*encode team names from data2*/
    dif=compged(tmp1,tmp2); /*determine similarity between team names*/
    if dif<=50 then do;
      drop i tmp1 tmp2 dif; /*drop unnecessary variables*/
      output;
    end;
  end;
run;

/*view resulting dataset*/
proc print data=data3;

fuzzy matching example in SAS

The SOUNDEX and COMPGED functions are able to match team names based on similarity and produce one final dataset that merges the two datasets together.

Additional Resources

The following tutorials explain how to perform other common tasks in SAS:

SAS: How to Perform One-to-Many Merge
SAS: How to Use (in=a) in Merge Statement
SAS: How to Merge If A Not B

Leave a Reply

Your email address will not be published. Required fields are marked *