Search code examples
typessas

SAS where to get precision / scale for all "num" columns that store decimal data?


When another team generates output from dictionary_columns and I filter on type = "num": the precision and scale have only 1 value across all columns; "0" and ".", respectively. This is despite the fact that there are dozens if not hundreds of [precision 16, decimal 4] and [precision 16 and decimal 2] as verified by exporting the data. Also, 100% of the type = num columns have [length]= 8.

I've scoured output of dicationary_columns and all manner of SAS support pages. I would expect something like how to either

  • export a data dictionary with complete and accurate data type information
  • export create table statements for all existing tables with complete and accurate data type information

The real answer for DBAs See SAS documentation on the format command; when you supply a format, the value gets stored in the dictionary.columns.format. SAS will respect the format when you execute %ds2csv; so for dbas:

  1. export the dictionary.columns
  2. use dictionary.column to generate create table sql
  3. for (sas)variable type=char use varchar(sas column length)
  4. for (sas)variable type=num use the appropriate integer, decimal(precision, scale) etc. as indicated in the (dictionary.columns.format)

Solution

  • SAS has two type of variables. Fixed length character strings and 64-bit binary floating point numbers.

    The LENGTH of a variable is how many bytes it uses to store the value in the dataset. For a character variable that will match the number of characters when using a single byte encoding. For a numeric variable it is how many of the 8 bytes required to store a 64-bit number you want to store. When the value is retrieved from the dataset the other bytes are set to binary zeros. To see how that impacts the maximum integer that SAS can store precisely you can use the CONSTANT() function with the EXACTINT constant.

    So on a machine using IEEE format for floating point you get:

    1487  data _null_;
    1488    do bytes=3 to 8;
    1489      int=constant('exactint',bytes);
    1490      put bytes int comma23. ;
    1491    end;
    1492  run;
    
    3                   8,192
    4               2,097,152
    5             536,870,912
    6         137,438,953,472
    7      35,184,372,088,832
    8   9,007,199,254,740,992
    

    The values will be different if you are running on an IBM mainframe since that machine uses a different floating point representation.

    You can perhaps impute an estimate of the types of numbers that are being stored based on whether the creator of the dataset decided to attach a specific display format to the variable. The width of the display format is the number of bytes used to display it. So that will include space for all digits before and after the decimal point and also for a decimal point if needed (and if using a format like COMMA or DOLLAR or PERCENT then also space for example other characters displays). The value after the period in the format definition is the number of decimal places to display. So a format of 8.3 will write 4 digits, a decimal point and then 3 more digits for a total of 8 bytes. So using your definition of PRECISION and SCALE you would estimate the SCALE as the number of decimal places (the value of the FORMATD variable produced by PROC CONTENTS) and PRECISION as either the width of the format (FORMATL variable) or the width minus one depending on whether the number of decimal places is zero or not.

    Note that the format is only used to display the values so it does not change how the values are stored.

    If you want to see the PRECISION and SCALE for the possible values of the normal numeric format (which can be referenced using the alias F) then you could run this program and print the resulting dataset.

    data decimal;
      length formatspec $40 format $32 formatl formatd precision scale 8;
      format='F';
      do formatl=1 to 17 ;
        do formatd=0 to max(0,formatl-2);
          formatspec=cats(format,formatl,'.');
          if formatd then formatspec=cats(formatspec,formatd);
          scale=formatd ;
          precision = formatl;
          if formatd then precision=precision-1;
          if precision < 17 then output;
        end;
      end;
    run;
    

    You could do similar calculations for COMMA and DOLLAR formats but including extra subtractions from the PRECISION based on how many extra characters will be printed by those formats. So DOLLAR13.2 (example displayed value $1,234,567.89) would have PRECISION 9 and SCALE 2 since there are 4 punctuation characters counted in the overall width of 13 characters.

    If your values could be negative then when converting from PRECISION and SCALE to a SAS display format you probably need to add one to the display width to account for the negative sign.