python sas retain

In SAS, what is the use of RETAIN when setting 2 datasets, and the equivalent in Python?


I am trying to re-code some SAS code into Python. I have the below SAS code:

DATA DF_FINAL;
    RETAIN UEN UEN_NO FEE;  
    SET DF_ADJ1 DF_ADJ2;
    KEEP UEN UEN_NO FEE;
RUN;

I don't understand what RETAIN is needed for, and I need the equivalent in Python. I tried running the code without the RETAIN line but I get the same output. Please assist.

Thank you


Solution

  • The real purpose of a RETAIN statement is to tell SAS that a NEW variable being calculated in the data step should NOT have its value reset to missing when the data step starts processing the next observation.

    In this step RETAIN's formal purpose has no effect, because the data step does not calculate any new variables. The only source of values for the variables is the input datasets, and variables sourced from input datasets are already "retained".

    So the RETAIN statement's only purpose in that data step is to fix the variable order: UEN and UEN_NO become the first two variables in the output dataset, followed by FEE. When you print or look at the data, those two will appear in columns 1 and 2.

    It works because SAS builds the list of variables in the data step in the order it first sees them, and here the RETAIN statement comes before the SET statement.

    People use RETAIN rather than some other statement to get this side effect because, unlike references to variable names in other statements (such as an assignment statement), a RETAIN statement does not force a TYPE on the variable. The type and storage length are therefore still determined by how those variables are defined in the source dataset(s).
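
For the Python side of the question, here is a minimal sketch, assuming the two SAS datasets have already been loaded as pandas DataFrames. The names df_adj1 and df_adj2, the sample values, and the extra OTHER column are made up for illustration. Since RETAIN only fixes the column order here, stacking the rows and then selecting the columns in the wanted order should give the same result as the SET, KEEP and RETAIN statements together.

```python
import pandas as pd

# Hypothetical stand-ins for DF_ADJ1 and DF_ADJ2 (sample data, not from the question).
df_adj1 = pd.DataFrame({
    "FEE": [10.0, 20.0],
    "UEN": ["A1", "B2"],
    "UEN_NO": [1, 2],
    "OTHER": ["x", "y"],  # extra column that KEEP would drop
})
df_adj2 = pd.DataFrame({
    "UEN_NO": [3],
    "FEE": [30.0],
    "UEN": ["C3"],
})

# SET DF_ADJ1 DF_ADJ2  -> stack the rows of both frames, aligning columns by name.
# KEEP + RETAIN order  -> select only the wanted columns, in the wanted order.
df_final = pd.concat([df_adj1, df_adj2], ignore_index=True)[["UEN", "UEN_NO", "FEE"]]

print(df_final)
```

The column selection does both jobs at once: it drops everything not listed (KEEP) and it sets the column order (the side effect of RETAIN). Column dtypes come from the source frames, much as SAS takes the type and length from the source datasets.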