Search code examples
stringsasrecode

recode values in a character variable based on another character variable's value in sas


jrnlfile is a dataset with journal names and identifiers. Here are the first 6 obs:

id      journal                         issn
56201   ACTA HAEMATOLOGICA              0001-5792
94365   ACTA PHARMACOLOGICA SINICA  
10334   ACTA PHARMACOLOGICA SINICA      1671-4083
55123   ADVANCES IN ENZYME REGULATION   0065-2571
90002   AGING   
10403   AGING                           1945-4589

Compare id 94365 and 10334. These obs name the same journal. They need the same issn. An obs with a missing value for issn almost always has at least one partner obs that contains a matching journal name and the correct issn. Wherever this is true, I want to recode the missing issn so it contains the issn seen in other instances where the same journal is mentioned. A revised dataset want would look like this:

id      journal                         issn
56201   ACTA HAEMATOLOGICA              0001-5792
94365   ACTA PHARMACOLOGICA SINICA      1671-4083
10334   ACTA PHARMACOLOGICA SINICA      1671-4083
55123   ADVANCES IN ENZYME REGULATION   0065-2571
90002   AGING                           1945-4589
10403   AGING                           1945-4589

I currently use if-else statements in a data step to populate missing issn values with matching entries for journal:

data want;
    set jrnlfile;
         if journal = "ACTA PHARMACOLOGICA SINICA" then issn = "1671-4083";
    else if journal = "AGING"                      then issn = "1945-4589";
    /*continue for 7,000 other journals*/
    run;

But jrnlfile contains 50,000 obs and 7,000 unique journals, so this takes a lot of time and is rather error-prone. This answer gets me halfway there, but issn is not numeric and I can't solve the problem by simply adding values to it.

What is a more efficient and systematic way to get to want from jrnlfile?


Solution

  • You can use retain statment. But there is restraints for this code. To empty journal will be set the first found issn. And there are must be a one or more issn for journal group.

    proc sort data=JRNLFILE;
        by journal descending issn;
    run;
    
    data want;
        set  JRNLFILE;
        retain t_issn;
        by journal descending issn;
    
        if first.journal then
            do;
                if issn="" then do;
                    put "ERROR: there is no issn val for group";
                    stop;
                end;
                else t_issn =issn;
            end;
    
        if issn="" then
            do;
                issn=t_issn;
            end;
    run;
    

    For example. If you use this table:

    +-------+------------------------------+-----------+
    |  id   |           journal            |    issn   |
    +-------+------------------------------+-----------+
    | 94365 | ACTA PHARMACOLOGICA SINICA   |           |
    | 10334 | ACTA PHARMACOLOGICA SINICA   | 1671-4083 |
    |     1 | ACTA PHARMACOLOGICA SINICA   | A_TEST    |
    |     2 | ACTA PHARMACOLOGICA SINICA   | WAS       |
    |     3 | ACTA PHARMACOLOGICA SINICA   | SATRTED   |
    +-------+------------------------------+-----------+
    

    You will get:

    +-------+----------------------------+-----------+--------+
    |  id   |          journal           |   issn    | t_issn |
    +-------+----------------------------+-----------+--------+
    |     2 | ACTA PHARMACOLOGICA SINICA | WAS       | WAS    |
    |     3 | ACTA PHARMACOLOGICA SINICA | SATRTED   | WAS    |
    |     1 | ACTA PHARMACOLOGICA SINICA | A_TEST    | WAS    |
    | 10334 | ACTA PHARMACOLOGICA SINICA | 1671-4083 | WAS    |
    | 94365 | ACTA PHARMACOLOGICA SINICA | WAS       | WAS    |
    +-------+----------------------------+-----------+--------+
    

    Error example. If you use this table:

    +-------+------------------------------+-----------+
    |  id   |           journal            |    issn   |
    +-------+------------------------------+-----------+
    | 56201 | ACTA HAEMATOLOGICA           | 0001-5792 |
    | 94365 | ACTA PHARMACOLOGICA SINICA   |           |
    +-------+------------------------------+-----------+
    

    You will get an ERROR:

    ERROR: there is no issn val for group

    *t_issn leaved to understand function :))