regex string parsing gnuplot

gnuplot: use regular expressions to parse string


How can I do the following in a gnuplot script?

1) Parse a string and extract a number and a letter (or a short string) from it?

2) Use associative arrays, so that I do not need a chain of IF statements?

files = system(sprintf("dir /b \"%s*.csv\"", inputPath))

do for [name in files]{

    # MY TROUBLE STARTS HERE
    [value, typeID] = parse(name, "*[%d%s]*"); # pseudocode
    typesList = {"h": 3600, "m": 60, "s": 1};

    scale = value * typesList[typeID];
    # MY TROUBLE ENDS HERE

    myfunc(y) = y * scale

    outputName = substr(name, 0, strlen(name) - strlen(".csv"))

    inputFullPath = inputPath.name
    outputFullPath = outputPath.outputName.outputExt

    plot inputFullPath using 1:(myfunc($2)) with lines ls 1 notitle
}

In my case, I need to get the number of seconds from a file name of the form ...[d=17s]..., ...[d=2m]..., ...[d=15h]..., etc.

A more complicated case is ...[d = 2h7m31s]... This is the general case; it is unlikely to be useful to me, but I would be interested to know how to solve it.


Solution

  • gnuplot does not support regular expressions, but you can write a function that extracts the time in seconds from your filename. If the filename and the time string follow a strict format, e.g. "...[d=2h7m31s]...", you could use the following code. Otherwise you will have to adapt it accordingly.

    1. First extract the 2h7m31s part as a substring, using strstrt() to locate '[d=' and ']'
    2. parse it with strptime()
    3. and convert the result to an integer with int()

    Script:

    ### parse special time string
    
    NAME = "Filename[d=2h7m31s].csv"
    
    TimeExtract(s) = int(strptime("%Hh%Mm%Ss",s[strstrt(s,'[d=')+3:strstrt(s,']')-1]))
        
    print TimeExtract(NAME)
    ### end of code
    

    Result:

    7651
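
    To see what each of the three steps contributes, the one-liner can be unrolled as in the following sketch. The variable names start, stop, tstr and secs are only illustrative and are not part of the script above.

    ```gnuplot
    ### the three steps written out one by one (illustrative names)
    NAME  = "Filename[d=2h7m31s].csv"

    start = strstrt(NAME, '[d=') + 3    # index of the first character after "[d="
    stop  = strstrt(NAME, ']') - 1      # index of the last character before "]"
    tstr  = NAME[start:stop]            # step 1: substring "2h7m31s"
    tsec  = strptime("%Hh%Mm%Ss", tstr) # step 2: parse as a time, result in seconds
    secs  = int(tsec)                   # step 3: convert to an integer

    print sprintf("%s -> %d", tstr, secs)
    ### end of code
    ```

    Note that strstrt() returns 0 if the pattern is not found, so a filename without "[d=" or "]" would yield a meaningless substring. A check for that case is left out here.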
    

    Addition:

    The following code also covers other combinations, as long as the units appear in the order ...[d=..h..m..s]....

    Update (hopefully the final version):

    The time format %H would be expected to wrap at 24 hours (in my test it actually wraps at 100 h). So, in order to get the correct time in seconds, the specifiers should be %tH, %tM and %tS (check help time_specifiers). With these, you can also parse unusual values like [d=100h100m100s].
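
    A quick way to check this on your own gnuplot version is to parse the same string with both specifiers and compare the results. This is only a minimal sketch; the sample value "100h" is taken from the list below and the variable name t is arbitrary.

    ```gnuplot
    ### compare the plain and the relative-time hour specifier (sketch)
    t = "100h"

    print "with %H : ", int(strptime("%Hh",  t))   # may not give 100*3600, see above
    print "with %tH: ", int(strptime("%tHh", t))   # relative time, hours accumulate
    ### end of code
    ```

    If the two printed values differ, the %t variants are the ones to use for durations.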

    Script:

    ### parse special time string
    reset session
    
    $Data <<EOD
    abcd[d=31s]somethingelse.csv
    efghi[d=7m]somethingelse.csv
    jklmn[d=2h]somethingelse.csv
    op[d=7m31s]somethingelse.csv
    qr[d=2h31s]somethingelse.csv
    uvw[d=2h7m]somethingelse.csv
    xyz[d=2h7m31s]somethingelse.csv
    aaa[d=100h100m100s]strangetime.csv
    EOD
    
    getTimeString(s) = s[strstrt(s,'[d=')+3:strstrt(s,']')-1]
    
    getTimeFormat(s) = \
        (strstrt(getTimeString(s),'h') ? '%tHh' : '').\
        (strstrt(getTimeString(s),'m') ? '%tMm' : '').\
        (strstrt(getTimeString(s),'s') ? '%tSs' : '')
    
    extractTime(s) = int(strptime(getTimeFormat(s),getTimeString(s)))
    
    do for [i=1:|$Data|] {
        s = $Data[i]
        print sprintf("% 12s   %d",getTimeString(s),extractTime(s))
    }
    ### end of script
    

    Result:

             31s   31
              7m   420
              2h   7200
           7m31s   451
           2h31s   7231
            2h7m   7620
         2h7m31s   7651
    100h100m100s   366100
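
    Regarding question 2: as far as I know, gnuplot has no associative arrays (its arrays are indexed by integers). For single-unit names like [d=15h], a small lookup can be emulated with two parallel word lists. The following is only a sketch under that assumption; units, factors, unitFactor and the sample string ts are names made up for illustration.

    ```gnuplot
    ### emulate a small lookup table with two parallel word lists (sketch)
    units   = "h m s"
    factors = "3600 60 1"

    # sum over all entries; only the entry whose unit matches contributes its factor
    unitFactor(u) = int(sum [i=1:words(units)] (word(units,i) eq u ? int(word(factors,i)) : 0))

    ts    = "15h"                          # e.g. the part between "[d=" and "]"
    value = int(ts[1:strlen(ts)-1])        # leading number
    unit  = ts[strlen(ts):strlen(ts)]      # trailing unit letter

    print sprintf("%s -> %d", ts, value * unitFactor(unit))
    ### end of code
    ```

    An unknown unit letter gives a factor of 0 here. In the loop from the question, scale = extractTime(name) would replace the pseudocode lines directly; the lookup variant is only of interest if number and unit have to be handled separately.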