Tags: shell, awk, sed, grep, cygwin

Grep key-value pairs from a badly formatted plaintext file


I am writing a shell script that needs to retrieve key-value pairs from badly formatted plaintext (.txt) files. The files are MS Word documents that have been saved as plain text. As the sample below (Sample_Profile.txt) shows, each key is followed by its value, which is enclosed in parentheses.

User First Name

(Goofball)

User Last Name

(Goofberg) Email Address

([email protected])

Password (sogoofedrightnow)

1. Profile details

Profile name*  (Goofball's Profile) Profile Id**
(Guid2763944-a234)

The only problem seems to be ignoring whitespace and empty lines when matching a key to its value. In summary, I would like to specify a key (e.g. "User First Name" or "Profile name") and grep only the corresponding value, then pipe it to sed so I get the value I need.

Here is the script I have written which is meant to get the value for "User First Name".

# grep the "User First Name" key and pipe to sed to get the value between parentheses
FIRST_NAME=$(grep "User First Name" Sample_Profile.txt | sed 's|[^(]*(\([^)]*\)).*|\1|')
sed -i -e 's/USER_FIRST_NAME/'"$FIRST_NAME"'/g' UserName.txt
echo "$FIRST_NAME"
# outputs "User First Name" when it should output "Goofball" (the value is
# on a separate line, so grep never passes it to sed)

Solution

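  • Let awk split the file into records at each closing parenthesis (RS) and into fields at each opening parenthesis (FS). The key and its value then land in the same record, even when blank lines separate them, and the value is field 2:

    awk '/User First Name/ {print $2}' RS=')' FS='(' Sample_Profile.txt

    Output:

    Goofball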
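
To reuse the awk one-liner for any key, it could be wrapped in a small function. This is only a sketch: get_value is a hypothetical helper name, the sample file is a shortened copy of the profile from the question, and the literal match via index() is a swapped-in choice (instead of the regex match above) so that keys containing regex metacharacters are matched as plain text.

```shell
#!/bin/sh
# get_value KEY FILE: print the parenthesised value that follows KEY.
# (get_value is a hypothetical helper name, not part of the original post.)
get_value() {
    awk -v key="$1" '
        # Records end at ")", so each record looks like "key text (value".
        # index() is a literal substring test on the part before "(".
        index($1, key) { print $2; exit }
    ' RS=')' FS='(' "$2"
}

# Shortened sample in the same layout as Sample_Profile.txt.
sample=$(mktemp)
cat > "$sample" <<'EOF'
User First Name

(Goofball)

User Last Name

(Goofberg) Password (sogoofedrightnow)

1. Profile details

Profile name*  (Goofball's Profile) Profile Id**
(Guid2763944-a234)
EOF

FIRST_NAME=$(get_value "User First Name" "$sample")
PROFILE_ID=$(get_value "Profile Id" "$sample")
echo "$FIRST_NAME"
echo "$PROFILE_ID"
rm -f "$sample"
```

In the script from the question, the FIRST_NAME assignment would then read FIRST_NAME=$(get_value "User First Name" Sample_Profile.txt), and the later sed substitution stays unchanged. The exit stops at the first matching key, which assumes each key appears only once per file.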