bash, awk, sed, non-greedy

Use sed (or similar) to remove anything between repeating patterns


I'm essentially trying to "tidy" a lot of data in a CSV. I don't need any of the information that's in "quotes".

I tried sed 's/".*"/""/', but when a line has more than one quoted section it removes everything from the first quote to the last one, including the commas in between.

I would like to get from this:

1,2,"a",4,"b","c",5

To this:

1,2,,4,,,5

Is there a sed wizard who can help? :)


Solution

  • You may use

    sed 's/"[^"]*"//g' file > newfile
    

    A quick demo:

    s='1,2,"a",4,"b","c",5'
    sed 's/"[^"]*"//g' <<< "$s"
    # => 1,2,,4,,,5
    

    Details

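    The "[^"]*" pattern matches a ", then zero or more characters other than ", and then the closing ". Because [^"]* cannot cross a quote, each match stops at the nearest closing quote, whereas the greedy .* in the original attempt runs on to the last quote on the line. The matches are removed because the replacement (RHS) is empty. The g flag makes sed replace every occurrence on each line rather than only the first.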
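To make the difference concrete, here is a small side-by-side sketch using the sample line from the question. The variable names (s, greedy, fixed) are only illustrative, and it assumes that quoted fields never contain escaped or doubled quotes and never span multiple lines; real-world CSV with those features would likely need a proper CSV parser instead.

```shell
# Sample line taken from the question.
s='1,2,"a",4,"b","c",5'

# Greedy attempt: .* stretches from the first quote to the LAST quote on the line,
# so everything in between (including the commas) is removed.
greedy=$(printf '%s\n' "$s" | sed 's/".*"//')

# Negated bracket expression: each match ends at the nearest closing quote,
# and the g flag repeats the substitution for every quoted section.
fixed=$(printf '%s\n' "$s" | sed 's/"[^"]*"//g')

printf '%s\n' "$greedy"   # 1,2,,5
printf '%s\n' "$fixed"    # 1,2,,4,,,5
```

sed has no non-greedy quantifier in its basic or extended regex syntax, so the negated bracket expression is the usual way to get a "shortest match" between two identical delimiters.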