regex, sas, datastep

SAS Data step free format issue


I would like to insert a line feed before the keyword 'data-base-url', but only where it does not already have one.

Input File

    </html>
    <et1>
    <a data-linked-resource-type="userinfo" data-base-url="https://url.com/c">USERNAME 1</a>
    <td class="conTd">
    INFO 1
    </td>
    </et1>

    <et2>
    <a data-linked-resource-type="userinfo" 

    data-base-url="https://url.com/c1">USERNAME 2</a>
    <td class="conTd">
    INFO 2
    </td>
    </et2>

    <et3>
    <a data-linked-resource-type="userinfo" 
    data-base-url=
    "https://url.com/c2">USERNAME 3</a>
    <td class="conTd">
    INFO 3
    </td>
    </et3>
    </html>

    /* data program */
    data inp;
    infile "c:/tmp/output.txt";
    input @'data-base-url=' user_info $30000. 
    @'<td class="conTd">' details $30000.;
    run;
    /* data program ends */

The et3 tag shows the required pattern. If you run the above program against the input file, only the et3 tag is read correctly into the user_info and details columns. I would like to add the line feed in the first two tags so that they produce the desired output as well. Thanks in advance.

Regards, AKS


Solution

  • Here is my solution. It targets your output dataset inp rather than your question per se: with this approach there is no need to modify your input file.

    Basically, you read every line of your input file as a single SAS row and manipulate the data from there. Adjust the record length at your convenience.

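      data inp;
        /* Read each physical line of the file as one row. */
        infile "/sascr/user/me/output.txt" truncover lrecl=200;
        input string $200. ;
        lstr = lag(string);  /* previous line */
        /* The line right after the <td> marker holds the details. */
        if lstr='<td class="conTd">' then details = string;
        /* The line right before the <td> marker holds the URL and user name. */
        if string='<td class="conTd">' then _info = lstr;
        /* _info was set one row earlier, so lag() lines it up with details; */
        /* keep only the text after the last '='. */
        user_info = scan(lag(_info),-1,'=');
        if not missing(details) then output;
        keep details user_info;
      run;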
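
    If you would still rather rewrite the input file as the question asks, here is a minimal sketch, not a tested implementation. It copies the file line by line and splits a line in two just before 'data-base-url=' whenever non-blank text precedes the keyword on that line. The output name output_fixed.txt is a hypothetical choice, and lrecl=32767 is an assumption about the longest line. Only the first occurrence of the keyword on a line is handled, and the two split pieces are written with list-style put, so their original indentation may not be preserved.

      data _null_;
        length head tail $32767;
        infile "c:/tmp/output.txt" truncover lrecl=32767;
        file "c:/tmp/output_fixed.txt" lrecl=32767;  /* hypothetical output file */
        input;                       /* load the current line into _infile_ */
        pos = find(_infile_, 'data-base-url=');
        /* head stays blank unless text precedes the keyword on this line */
        if pos > 1 then head = substr(_infile_, 1, pos - 1);
        if not missing(head) then do;
          tail = substr(_infile_, pos);
          put head;                  /* text before the keyword */
          put tail;                  /* keyword now starts its own line */
        end;
        else put _infile_;           /* copy every other line unchanged */
      run;

    With the sample input, only the line in et1 would be split; et2 and et3 already have the keyword at the start of a line and are copied as they are.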
    

    Hope this helps.