Tags: regex, logging, sas, string-parsing

Find Dot Separated Words in a String


I need to parse a log file to pick out strings that match the following case-insensitive pattern:

libname.data   <--- Okay
libname.*      <--- Not okay

For those with SAS experience, I'm trying to get SAS dataset names out of a large log.

All strings are space-separated. Some examples of lines:

NOTE: The data set LIBNAME.DATA has 428 observations and 15 variables.
MPRINT(MYMACRO):   data libname.data;
MPRINT(MYMACRO):   create table libname.data(rename=(var1 = var2)) as select distinct var1, var2 as
MPRINT(MYMACRO):   format=date. from libname.data where ^missing(var1) and ^missing(var2) and

What I've tried

This Perl regular expression:

/^(?!.*[.*]{2})[a-z0-9*_:-]+(?:\.[a-z0-9;_:-]+)+$/mi

https://regex101.com/r/jYkXn5/1

In SAS code:

data test;
    line = 'words and stuff libname.data';
    test = prxmatch('/^(?!.*[.*]{2})[a-z0-9*_:-]+(?:\.[a-z0-9;_:-]+)+$/mi', line);
run;

Problem

This matches only when the line consists of nothing but the dataset name. It fails as soon as the line contains any other text, as in the example above.

Solution

Thanks, Blindy!

The regex that worked for me to parse SAS datasets from a log is:

/(?!.*[.*]{3})[a-z_]+[a-z0-9_]+(?:\.[a-z0-9_]+)/mi

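data test;
    line = 'NOTE: COMPRESSING DATA SET LIBNAME.DATA DECREASED SIZE BY 46.44 PERCENT';

    /* Compile the pattern; prxparse returns a pattern ID. */
    prxID = prxparse('/(?!.*[.*]{3})[a-z_]+[a-z0-9_]+(?:\.[a-z0-9_]+)/mi');
    /* prxsubstr sets position and length of the first match (position = 0 if none). */
    call prxsubstr(prxID, line, position, length);

    /* Extract the dataset name only when a match was found. */
    if position > 0 then dataset = substr(line, position, length);
run;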

This will still pick up some SQL select statements, but that is easy to fix in post-processing.
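
To check the pattern outside SAS, here is a minimal sketch in Python, whose `re` module accepts the same syntax for this expression. The helper name `find_datasets` is made up for illustration, and the input is just the sample lines from this post. Upper-casing and de-duplicating the matches is an assumption about what is wanted, not part of the original SAS code.

```python
import re

# Final pattern from the post. The /m flag is dropped because each line is
# searched on its own, so multiline mode would have no effect here.
DATASET_RE = re.compile(
    r"(?!.*[.*]{3})[a-z_]+[a-z0-9_]+(?:\.[a-z0-9_]+)", re.IGNORECASE
)

# Sample lines copied from the question and the SAS test step above.
log_lines = [
    "NOTE: The data set LIBNAME.DATA has 428 observations and 15 variables.",
    "MPRINT(MYMACRO):   data libname.data;",
    "MPRINT(MYMACRO):   create table libname.data(rename=(var1 = var2)) as select distinct var1, var2 as",
    "MPRINT(MYMACRO):   format=date. from libname.data where ^missing(var1) and ^missing(var2) and",
    "NOTE: COMPRESSING DATA SET LIBNAME.DATA DECREASED SIZE BY 46.44 PERCENT",
]


def find_datasets(lines):
    """Return the sorted, upper-cased, de-duplicated names matched in lines."""
    found = set()
    for line in lines:
        # finditer yields every match in the line, not just the first one.
        for match in DATASET_RE.finditer(line):
            found.add(match.group(0).upper())
    return sorted(found)


print(find_datasets(log_lines))  # ['LIBNAME.DATA']
```

Neither "date." (nothing follows the dot) nor "46.44" (does not start with a letter or underscore) is matched, which is the behaviour the pattern was written for.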


Solution

  • You anchored your expression at the beginning of the line. Simply remove the first ^ and you're set:

    /(?!.*[.*]{2})[a-z0-9*_:-]+(?:\.[a-z0-9;_:-]+)+$/mi
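
The effect of the two anchors can be checked with a small sketch in Python, whose `re` module accepts the same syntax for these patterns. The helper `first_match` and the variable names are made up for illustration. It suggests that dropping ^ is enough for the example line, where the name comes last, but that the remaining $ still ties the match to the end of the line. That may be why the final pattern in the question drops $ as well.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# Pattern from the question: anchored at both ends of the line.
anchored = re.compile(
    r"^(?!.*[.*]{2})[a-z0-9*_:-]+(?:\.[a-z0-9;_:-]+)+$", FLAGS
)
# Pattern from this answer: leading ^ removed, trailing $ kept.
end_anchored = re.compile(
    r"(?!.*[.*]{2})[a-z0-9*_:-]+(?:\.[a-z0-9;_:-]+)+$", FLAGS
)


def first_match(pattern, line):
    """Return the text of the first match in line, or None if there is none."""
    match = pattern.search(line)
    return match.group(0) if match else None


trailing = "words and stuff libname.data"
mid_line = "NOTE: The data set LIBNAME.DATA has 428 observations and 15 variables."

print(first_match(anchored, trailing))      # None
print(first_match(end_anchored, trailing))  # libname.data
print(first_match(end_anchored, mid_line))  # None
```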