Search code examples
sasconcatenationretain

SAS Retain not working for 1 string variable


The below code doesn't seem to be working for the variable all_s when there is more than 1 record with the same urn. Var1,2,3 work fine but that one doesn't and I cant figure out why. I am trying to have all_s equal to single_var1,2,3 concatenated with no spaces if it's first.urn but I want it to be

all_s = all_s + ',' + single_var1 + single_var2 + single_var3

when it's not the first instance of that urn.

data dataset_2;
    set dataset_1;
    by URN;
    retain count var1 var2 var3 all_s;
    format var1 $40. var2 $40. var3 $40. all_s $50.;
    if first.urn then do;
        count=0;
        var1 = ' ';
        var2 = ' ';
        var3 = ' ';
        all_s = ' ';
    end;
    var1 = catx(',',var1,single_var1);
    var2 = catx(',',var2,single_var2);
    var3 = catx(',',var3,single_var3);

    all_s = cat(all_s,',',single_var1,single_var2,single_var3);

    count = count+1;
    if first.urn then do;
        all_s = cat(single_var1,single_var2,single_var3);
    end;
run;

Solution

  • all_s is not large enough to contain the concatenation if the total length of the var1-var3 values within the group exceeds $50. Such a scenario seems likely with var1-var3 being $40.

    I recommend using the length function to specify variable lengths. format will create a variable of a certain length as a side effect.

    catx removes blank arguments from the concatenation, so if you want spaces in the concatenation when you have blank single_varN you won't be able to use catx

    A requirement that specifies a concatenation such that non-blank values are stripped and blank values are a single blank will likely have to fall back to the old school trim(left(… approach

    Sample code

    data have;
      length group 8 v1-v3 $5;
      input group (v1-v3) (&);
    datalines;
    1  111  222  333
    1  .    444  555
    1  .    .    666
    1  .    .    .
    1  777  888  999
    2  .    .    .
    2  .    b    c
    2  x    .    z
    run;
    
    data want(keep=group vlist: all_list);
      length group 8 vlist1-vlist3 $40 all_list $50;
      length comma1-comma3 comma $2;
    
      do until (last.group);
        set have;
        by group;
    
        vlist1 = trim(vlist1)||trim(comma1)||trim(left(v1));
        vlist2 = trim(vlist2)||trim(comma2)||trim(left(v2));
        vlist3 = trim(vlist3)||trim(comma3)||trim(left(v3));
    
        comma1 = ifc(missing(v1), ' ,', ',');
        comma2 = ifc(missing(v2), ' ,', ',');
        comma3 = ifc(missing(v3), ' ,', ',');
    
        all_list = 
          trim(all_list)
          || trim(comma)
          || trim(left(v1))
          || ','
          || trim(left(v2))
          || ','
          || trim(left(v3))
        ;
    
        comma = ifc(missing(v3),' ,',',');
      end;
    run;
    

    Reference

    SAS has operators and multiple functions for string concatenation

    • || concatenate
    • cat concatenate
    • catt concatenate, trimming (remove trailing spaces) of each argument
    • cats concatenate, stripping (remove leading and trailing spaces) of each argument
    • catx concatenate, stripping each argument and delimiting
    • catq concatenate with delimiter and quote arguments containing the delimiter

    From SAS 9.2 documentation

    Comparisons

    The results of the CAT, CATS, CATT, and CATX functions are usually equivalent to results that are produced by certain combinations of the concatenation operator (||) and the TRIM and LEFT functions. However, the default length for the CAT, CATS, CATT, and CATX functions is different from the length that is obtained when you use the concatenation operator. For more information, see Length of Returned Variable.

    Note: In the case of variables that have missing values, the concatenation produces different results. See Concatenating Strings That Have Missing Values.