bash unix awk sed

UNIX Bash - Removing double quotes from specific strings within a file


I'm trying to edit a file to remove the double quotation marks wrapped around several strings of varying lengths. Some of these strings contain capital letters and whitespace. Normally I would just use a global search and replace, but some of the strings must keep their double quotes because they're required.

An extract of the file in question is here:

"tplan"."external_plan_ref" "Plan ID",
            'CMP' CMP,
            "bd"."NAME" "Business Divison",
            "reg"."NAME" "Region",
            placeholder1 "Placeholder 1",
            "ct"."COUNTRY_NAME" "COUNTRY",
            city "City",
            placeholder2 "Placeholder 2",
            placeholder3 "Placeholder 3",
            placeholder4 "Placeholder 4",

The quoted strings that immediately follow a . are the ones that need their double quotes removed. For example:

."NAME"

I've tried using awk and sed with a regex to identify and replace the relevant strings, but I've had no luck and have struggled to wrap my head around it. Any advice or recommendations would be truly appreciated. Thank you!

Expected output:

 "tplan".external_plan_ref "Plan ID",
            'CMP' CMP,
            "bd".NAME "Business Divison",
            "reg".NAME "Region",
            placeholder1 "Placeholder 1",
            "ct".COUNTRY_NAME "COUNTRY",
            city "City",
            placeholder2 "Placeholder 2",
            placeholder3 "Placeholder 3",
            placeholder4 "Placeholder 4",

Solution

  • This might work for you (GNU sed):

    sed 's/\."\([^"]*\)"/.\1/g' file
    

    Match a period followed by a double-quoted string, and replace it with the period and the string minus its double quotes. The g flag applies the substitution to every such match on a line.

    N.B. The period needs to be escaped (\.), otherwise it matches any character.
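
To see the substitution in action without touching a real file, here is a minimal sketch that feeds a few lines from the question's extract through the same sed expression. The function name strip_dot_quotes is made up for this illustration; it is just a wrapper around the command above. The expression uses only basic regular expression syntax, so it should also work in non-GNU sed implementations, though that has not been checked here.

```shell
# Hypothetical helper: wraps the sed command from the answer.
# Reads the named files, or standard input when no file is given.
strip_dot_quotes() {
    sed 's/\."\([^"]*\)"/.\1/g' "$@"
}

# A few lines from the question's extract, passed in via a quoted
# here-document so the shell leaves the quotes untouched.
result=$(strip_dot_quotes <<'EOF'
"tplan"."external_plan_ref" "Plan ID",
            'CMP' CMP,
            "bd"."NAME" "Business Divison",
            city "City",
EOF
)

# Only the quoted strings directly after a period lose their quotes.
printf '%s\n' "$result"
```

Once the output looks right, the file can be edited in place with sed -i.bak 's/\."\([^"]*\)"/.\1/g' file, which keeps the original as file.bak. One caveat: the pattern assumes a period followed by a double quote only ever appears in front of a string that should be unquoted. A quoted alias that ends in a period, immediately followed by its closing quote, would also match.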