bash awk sed cut

Cutting a string using multiple delimiters using the awk or sed commands


I am using a SIPP server simulator to verify incoming calls. What I need to verify is the caller ID and the dialed digits. I've logged this information to a file, which now contains, for example, the following:

From: <sip:972526134661@server>;tag=60=.To: <sip:972526134662@server>}

in each line.

What I want is to convert it to a CSV file containing just the two phone numbers, as follows:

972526134661,972526134662

and so on.

I've tried using awk -F, but then I can only use a single delimiter at a time: either sip:, or @, or /.

Basically, what I want to do is take every string that begins with < and ends with >, and from each of those take the part that follows the sip: delimiter.

Using the cut command is also not an option, as I understand that it cannot use strings as delimiters.

I guess it should be really simple, but I haven't found quite the right thing to use. I would appreciate the help, thanks!


Solution

  • OK, for fun, picking some random data (from your original post) and using awk -F as you originally wanted.

    Note that because your file is machine-generated, we can assume a regular format for the data and not expect short patterns such as @ to match in the wrong place.

    [g]awk -F'sip:|@' -v OFS="," '{print $2,$4}' yourlogfile
    

    It uses both sip: and @ as field separators, by means of the alternation operator |. More characters or strings can be added to the alternation if further separators are required. This works because a field separator longer than one character (the built-in variable FS, set here with -F) is treated as a regular expression.

    For that first sample in your question, it yields this:

    972526134661,972526134662
    
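    To see why $2 and $4 are the right fields, it can help to print every field next to its index. This is only a sketch using the sample line from the question; the variable names line and fields are made up for illustration:

    ```shell
    # Sample line copied from the question.
    line='From: <sip:972526134661@server>;tag=60=.To: <sip:972526134662@server>}'

    # Split on "sip:" or "@" and list each field together with its number.
    fields=$(printf '%s\n' "$line" |
        awk -F'sip:|@' '{ for (i = 1; i <= NF; i++) print i ": " $i }')

    printf '%s\n' "$fields"
    ```

    With this sample the line splits into five fields, and the two phone numbers land in fields 2 and 4; the other fields hold the leftover text between the separators.
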

    For the latest (revision 8) version, and guessing what you want:

    [g]awk -F'sip:|@|to_number:' -v OFS="," '{print $2,$5}' yourlogfile
    

    Yields this:

    from_number,972526134662
    

    The [g]awk notation means "gawk or awk": I used gawk on my machine and got the same behaviour with awk. Type one or the other, without the brackets.

    A slight amendment in style, suggested by @fedorqui: the command-line option -v sets the Output Field Separator (OFS, an AWK built-in variable that can be set with -v like any other variable), and the print fields are separated with a comma. That way they are treated as separate fields in the output, rather than being built into a single string with a hard-coded ",".
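
The difference between the two styles can be sketched as follows, again using the sample line from the question. The variable names (line, with_ofs, hard_coded, semicolon) are made up for illustration, and the semicolon variant is only there to show that the separator can be changed without touching the awk program:

```shell
# Sample line copied from the question.
line='From: <sip:972526134661@server>;tag=60=.To: <sip:972526134662@server>}'

# Comma between the print arguments: awk joins them with OFS.
with_ofs=$(printf '%s\n' "$line" | awk -F'sip:|@' -v OFS=',' '{ print $2, $4 }')

# Hard-coded separator: the pieces are concatenated into one string.
hard_coded=$(printf '%s\n' "$line" | awk -F'sip:|@' '{ print $2 "," $4 }')

# Same program as the first variant; only the -v option changes.
semicolon=$(printf '%s\n' "$line" | awk -F'sip:|@' -v OFS=';' '{ print $2, $4 }')

printf '%s\n' "$with_ofs" "$hard_coded" "$semicolon"
```

The first two commands print the same CSV line; the third prints the same numbers separated by a semicolon.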