Search code examples
sasinformat

How to read informats data: $1,000.1M to 1000.1


The datasets include a list of numbers:

    $1,000.1M
      $100.5M
    $1,002.3M
     $23.4M
     $120.3M

I want to read the variable as a numeric in SAS

the result should be:

   Money(millions)
   1000.1
   100.5
   1002.3
   23.4
   120.3

I used COMMAw.d to read this data, but cannot run

The code is:

    input Money(millions) COMMA9.1;
    run;

How to modify it?

Thank you very much!


Solution

  • The COMMA informat does not expect letters like 'M', it removes only commas, blanks, dollar signs, percent signs, dashes, and close parentheses. You can just convert your raw string to a string containing a number by removing all characters you do not need:

    data input;
        length moneyRaw $200;    
        infile datalines;
        input moneyRaw $;
    
        datalines;
    $1,000.1M
    $100.5M
    $1,002.3M
    $23.4M
    $120.3M
    ;
    run;
    
    data result;
        set input;
       * "k" modifier inverts the removed characters;
        money = input(compress(moneyRaw,"0123456789.","k"),best.);
    run;
    

    Or if you know regex, you can add some intrigue to the code for anyone who reads it in the future:

    data resultPrx;
        set input;
        moneyUpdated = prxChange("s/^\$(\d+(,\d+)*(\.\d+)?)M$/$1/",1,strip(moneyRaw));
        money = input(compress(moneyUpdated,','),best.);
    run;