windows, bash, findstr

How do I obtain the regex matches of a piped command using a shell script?


I'm trying to extract a certain property from a KML file. So far, I have tried

ogrinfo C:/test.kml -so -al | findstr "Extent"

which was recommended to me and outputs

Extent: (-100.054053, 33.702234) - (-94.647180, 37.125712)

I need it in this form:

-100.054053,-94.647180,33.702234,37.125712

and I thought I could use a regex for that.

I tried the following just to see what it outputted:

ogrinfo C:/test.kml -so -al | findstr "Extent" | findstr /r /c:"-*[0-9]*\.[0-9]*"

but this still outputs

Extent: (-100.054053, 33.702234) - (-94.647180, 37.125712)

I read somewhere that Windows' FINDSTR only outputs the matching line, not the matched text itself. Is there another way to do this?
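In bash, GNU grep's -o option does what FINDSTR lacks: it prints only the matched parts of each line, one match per line. A minimal sketch, assuming a grep that supports -o and -E (GNU grep does) and using the question's sample line in place of the live ogrinfo output:

```bash
#!/bin/bash
# Sample line copied from the question's ogrinfo output
line='Extent: (-100.054053, 33.702234) - (-94.647180, 37.125712)'

# -o prints only the matched text, one match per line; -E enables extended regex syntax.
# "--" ends option parsing, because the pattern itself starts with "-".
matches=$(printf '%s\n' "$line" | grep -oE -- '-?[0-9]+(\.[0-9]+)?')
printf '%s\n' "$matches"
```

Each of the four numbers ends up on its own line, in the order they appear in the Extent line.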

Once that works, I want to save the matches in separate variables in a shell script. I'm no expert in shell scripting, but after looking around I was thinking of doing something like this:

#!/bin/bash

for /f "tokens=*" %%a in ('ogrinfo C:/test.kml -so -al ^| findstr "Extent" ^| findstr /r /c:"-*[0-9]*\.[0-9]*"') do (
  echo %%a
  #do something
)
done >output

but when I run it, the shell window closes immediately, so I can't even see the error.
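The script above mixes two languages: for /f, %%a and the ^| escapes are cmd.exe batch syntax, while the #!/bin/bash line, the # comments and done are bash, so neither interpreter can run it. A bash version of the intended loop could look like the sketch below. ogr_summary is a hypothetical stand-in function that returns sample text; replace it with the real ogrinfo C:/test.kml -so -al call.

```bash
#!/bin/bash
# Hypothetical stand-in for: ogrinfo C:/test.kml -so -al
ogr_summary() {
    printf '%s\n' 'Layer name: test' \
        'Extent: (-100.054053, 33.702234) - (-94.647180, 37.125712)'
}

# bash has no "for /f"; a while-read loop consumes the pipeline line by line.
# grep -oE emits one number per line, so each iteration receives one match.
ogr_summary | grep "Extent" | grep -oE -- '-?[0-9]+(\.[0-9]+)?' |
while IFS= read -r match; do
    printf '%s\n' "$match"
    # do something with "$match"
done > output
```

After it runs, the file output holds the four numbers, one per line.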


Solution

  • Assumptions

    • You have a KML file with raw data.

    • You can extract a single line starting with "Extent: " to get the values you want.

    • There is only one line with that format.

    • The format of that line is:

      Extent: (NUMBER1, NUMBER2) - (NUMBER3, NUMBER4)
      
    • A number can have the following characters: 0 1 2 3 4 5 6 7 8 9 . -

    • The output you want is:

      NUMBER1,NUMBER3,NUMBER2,NUMBER4
      

    Using Linux tools only, you can do this:

    #!/bin/bash
    #
    datafile="data.kml"
    
    # Ensure the data file exists
    if [[ ! -f "$datafile" ]]
    then
        echo "ERROR: the data file does not exist."
        exit 1
    fi
    
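    # Extract the "Extent:" line from ogrinfo's summary output
    # (the KML file itself is XML and has no "Extent:" line)
    dataline=$(ogrinfo -so -al "$datafile" | grep "Extent: ")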
    
    # Make sure the line is of a valid format, and assign the number variables
    if [[ $dataline =~ "Extent: ("([0-9.-]+)", "([0-9.-]+)") - ("([0-9.-]+)", "([0-9.-]+)")" ]] && number1="${BASH_REMATCH[1]}" && number2="${BASH_REMATCH[2]}" && number3="${BASH_REMATCH[3]}" && number4="${BASH_REMATCH[4]}"
    then
        echo "-----DEBUG-----"
        echo "line==$dataline"
        echo "1==$number1"
        echo "2==$number2"
        echo "3==$number3"
        echo "4==$number4"
        echo "-- END DEBUG --"
        echo ""
        echo "$number1,$number3,$number2,$number4"
    else
        echo "ERROR: no valid \"Extent: \" line found for $datafile" >&2
        exit 1
    fi
    

    Details:

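    • All the matching and variable assignment happens in the if line.
    • =~ matches the string on the left against the extended regular expression on the right.
    • In the regular expression, parentheses ( ) define capture groups: sections of the match you can reuse afterwards.
    • For example, in abcd(1)efgh(2)ijkl the captured groups are 1 and 2.
    • That is why each number in the if pattern is wrapped in unquoted parentheses, while the quoted parts ("Extent: (", ", ", ") - (" and ")") are matched literally.
    • When =~ succeeds, bash stores the results in the BASH_REMATCH array: BASH_REMATCH[0] is the whole match and BASH_REMATCH[n] is group n.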
    • The "DEBUG" echo statements can be removed or commented out.
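A small sketch of the two rules above (capture groups, and quoted text being literal), reusing the abcd(1)efgh(2)ijkl example; the "(12.5)" string is just illustrative input:

```bash
#!/bin/bash
# Capture groups: the parenthesized parts end up in BASH_REMATCH[1], [2], ...
text="abcd1efgh2ijkl"
if [[ $text =~ abcd(1)efgh(2)ijkl ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"
    echo "group 1: ${BASH_REMATCH[1]}"
    echo "group 2: ${BASH_REMATCH[2]}"
fi

# Quoted parts are matched literally: "(" and ")" below are real parentheses,
# and only the unquoted ([0-9.-]+) is a capture group.
if [[ "(12.5)" =~ "("([0-9.-]+)")" ]]; then
    echo "inside parentheses: ${BASH_REMATCH[1]}"
fi
```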

    If there is more than one "Extent: ..." line (for example, one per layer), you can loop over the lines and process each one in turn.
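A sketch of that loop, assuming the summary text is available line by line. Here a hypothetical two-layer sample stands in for the real ogrinfo output, and extents.csv is an illustrative output file name. The pattern is kept in a variable, re, with the literal parentheses backslash-escaped; this is an equivalent alternative to the quoting used in the script above, since an unquoted $re on the right of =~ is treated as a regex.

```bash
#!/bin/bash
# Hypothetical sample of summary output with two layers
summary='Layer name: roads
Extent: (-100.054053, 33.702234) - (-94.647180, 37.125712)
Layer name: rivers
Extent: (-99.5, 34.0) - (-95.25, 36.75)'

# Literal parentheses are escaped; the four ([0-9.-]+) groups capture the numbers
re='Extent: \(([0-9.-]+), ([0-9.-]+)\) - \(([0-9.-]+), ([0-9.-]+)\)'

# Process each "Extent: " line and print NUMBER1,NUMBER3,NUMBER2,NUMBER4
while IFS= read -r dataline; do
    if [[ $dataline =~ $re ]]; then
        printf '%s,%s,%s,%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[3]}" \
            "${BASH_REMATCH[2]}" "${BASH_REMATCH[4]}"
    fi
done < <(printf '%s\n' "$summary" | grep "Extent: ") > extents.csv
```

Each Extent line produces one comma-separated row in extents.csv.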