Search code examples
arrayssasdatastep

Replace strings in a column in sas


I want to replace all strings in a column with following:

strings  new_strings
ABC_MNO_S3  S1
ABC_S1      S2
ABC_S2      S3
ABC_PQR_S3  S4
XYZ_MNO_S3  S5
XYZ_S1      S6
XYZ_S2      S7
XYZ_PQR_S3  S8

So whenever any of the above 'string' appear in my column 'states' I want to replace it with 'new_string' dynamically. I tried to put these list in an array and use TRANWRD to search through the column and replace, but not working.

My column states and the desired output looks like following:

states                 states_result
TR_ABC_MNO_S3_ABC_S2    TR_S1_S3
TR_ABC_S1_ABC_S2        TR_S2_S3
Segment                 Segment
ABC_PQR_S3              S4
TR_XYZ_MNO_S3_XYZ_S2    TR_S5_S7
Year                    Year
St_XYZ_S2               St_S7

Could you please help? Thanks!


Solution

  • Sourav:

    Because you mentioned TRANWRD I will presume the strings values can be found embedded within a states value. The key to effective use of TRANWRD is to TRIM the values when variables are used for the target and replacement arguments.

    Replacement concerns:

    • A single use of TRANWRD will work if there is only one target value embedded.
    • A loop over all targets would be needed if there can be more than one target value embedded.

    There is a possibility that an earlier replacement could make rise a valid replacement that was not previously obvious. Consider the following state value:

    ABC_ABC_MNO_S3
    

    A first loop over all targets would replace ABC_MNO_S3 with S1 and yield

    ABC_S1
    

    A second loop over all targets would replace ABC_S1 with S2 and yield

    S2
    

    Tested sample:

    data have;
    infile cards dlm="," dsd;
    length states segment year $100;
    input states segment year;
    datalines;
    "TR_ABC_MNO_S3_ABC_S2 TR_ABC_S1_ABC_S2", "ABC_PQR_S3 TR_XYZ_MNO_S3_XYZ_S2", "St_XYZ_S2"
    run;
    
    data mappings;
    length string $30 new_string $2;
    input string new_string;
    datalines;
    ABC_MNO_S3  S1
    ABC_S1      S2
    ABC_S2      S3
    ABC_PQR_S3  S4
    XYZ_MNO_S3  S5
    XYZ_S1      S6
    XYZ_S2      S7
    XYZ_PQR_S3  S8
    run;
    
    data want;
      array maps(100,2) $50 _temporary_; * first dimension must be larger than number of mappings;
      do _i_ = 1 by 1 until (lastmap);
        set mappings(rename=(string=_map_from new_string=_map_to)) end=lastmap;
        maps(_i_,1) = _map_from;
        maps(_i_,2) = _map_to;
      end;
    
      length status $12 _result $200;
    
      do until (lastdata);
    
        set have end=lastdata;
        array targets states segment year;
    
    status = 'ORIGINAL'; 
    output;
    
        do _i_ = 1  to dim(targets);
    
          _result = targets[_i_];
          _guard = 1;
          do until (_noreplacement or _guard >= 10);
            _noreplacement = 1;
            do _j_ = 1 to dim(maps,1) while(maps(_j_,1) ne '');
              if index(_result,trim(maps(_j_,1))) then do;
    
    * put _result ': ' maps(_j_,1) '-> ' maps(_j_,2);
    
                _result = tranwrd(_result, trim(maps(_j_,1)), trim(maps(_j_,2)));
                _noreplacement = 0;
              end;
            end;
          end;
    
          if (_guard > 10) then do;
            put 'WARNING: Guard limit 10 reached, mappings may be cycling.' _result;
          end;
    
          targets[_i_] = _result;
        end;
    
        status = 'MAPS APPLIED';
        output;
      end;
    
      stop;
      drop _:;
    run;