
SAS if then recode and change format issues


I need to change my data from character to numeric. Some of my values show as NP instead of a number where the data wasn't collected. I wanted to create a new variable that is blank for those rows and then convert the data to numeric (I could use advice on which code is best, since some values go out to many decimal places and are written in E notation, such as E-2). My problem is that my code to change NP to a blank seems to make every row blank, not just the ones that said NP. How should I correct this?

  data mRNAMerged2;
    set mRNAMerged;
    if GFPT1= "NP" then GFPT1_2 = " ";
    if GFPT2= "NP" then GFPT2_2 = " ";
    if GNPNAT1 = "NP" then GNPNAT1_2 = " ";
    if MGAT1 = "NP" then MGAT1_2 = " ";
    if NAGK = "NP" then NAGK_2 = " ";
    if OGA = "NP" then OGA_2 = " ";
    if OGT = "NP" then OGT_2 = " ";
    if PGM3 = "NP" then PGM3_2 = " ";
    if UAP1 = "NP" then UAP1_2 = " ";
  run;

(Images: PROC CONTENTS output; dataset before analysis, working with OGA, OGT and PGM3; dataset after running my code.)


Solution

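  • Since the only value you ever assign to GFPT1_2 is " " (a blank), it can never have any other value. I suspect what you wanted was to leave GFPT1_2 missing for the NP rows and convert every other value to a number with the INPUT() function (the 32. informat also reads values written in E notation, such as 1.5E-2):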

    if GFPT1 ne "NP" then GFPT1_2 = input(GFPT1,32.);
    
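    The same fix can be applied to all of the columns at once with a pair of arrays instead of one IF statement per variable. This is a minimal sketch that assumes the original columns are all character and that you want new numeric columns with a _2 suffix. The small input dataset and its values are hypothetical, and only OGA, OGT and PGM3 are shown; add the other six names to both ARRAY statements in the same order. A value that is neither NP nor a valid number would still produce an invalid-data note in the log.

    /* Hypothetical sample data: three character columns, NP = not collected */
    data mRNAMerged;
      input (OGA OGT PGM3) (:$12.);
      datalines;
    1.25 NP 3.4E-2
    NP 0.5 7
    ;
    run;

    data mRNAMerged2;
      set mRNAMerged;
      array chr{*} OGA OGT PGM3;          /* existing character columns */
      array num{*} OGA_2 OGT_2 PGM3_2;    /* new numeric columns, same order */
      do i = 1 to dim(chr);
        if chr{i} = "NP" then num{i} = .;   /* not collected: leave missing */
        else num{i} = input(chr{i}, 32.);   /* 32. also reads E notation */
      end;
      drop i;
    run;
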

    How did you create this dataset? Did you read in a text file, such as a CSV file? If so, it would be better to read it correctly the first time instead of trying to convert the values later. You could use a custom INFORMAT that converts NP to missing or, perhaps better, to a special missing value such as .N (for Not Present).

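    proc format;
      /* NP -> special missing .N; anything else is read with the 32. informat */
      invalue np 'NP'=.n other=[32.];
    run;
    data want;
       infile 'myfile.csv' dsd truncover firstobs=2;  /* skip the header row */
       /* define the variables as numeric, in file order, so the -- list works */
       length Number_of_Patient_per_Sample OGA OGT ... PGM3 8;
       /* attach the custom informat to every numeric variable defined so far */
       informat _numeric_ np.;
       input Number_of_Patient_per_Sample -- PGM3;
    run;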
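
    If you want to try the custom informat before pointing it at the real file, a minimal sketch is to read a few CSV-style lines from DATALINES instead of myfile.csv. The three columns and their values are hypothetical, and np32. (rather than np.) is used only to make the informat width explicit. Rows with NP get the special missing value .N, which can still be told apart from an ordinary missing value (.).

    proc format;
      /* NP -> special missing .N; everything else uses the 32. informat */
      invalue np 'NP'=.n other=[32.];
    run;

    /* Hypothetical CSV-style input in place of 'myfile.csv' */
    data demo;
      infile datalines dsd truncover;
      length OGA OGT PGM3 8;
      informat _numeric_ np32.;
      input OGA -- PGM3;
      OGT_not_collected = (OGT = .N);   /* 1 when OGT was NP, else 0 */
      datalines;
    1.25,NP,3.4E-2
    NP,0.5,7
    ;
    run;
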