I am writing a shell script that needs to retrieve key-value pairs from badly
formatted plaintext .txt files. The .txt files are MS Word documents that have
been saved as plaintext. As you can see from the sample below
(Sample_Profile.txt), each key is followed by a value delimited by
opening and closing parentheses.
User First Name (Goofball) User Last Name (Goofberg) Email Address ([email protected]) Password (sogoofedrightnow) 1. Profile details Profile name* (Goofball's Profile) Profile Id** (Guid2763944-a234)
The only problem seems to be ignoring whitespace and empty lines when matching
a key to its value. In summary, I would like to specify the key (e.g.
"User First Name" or "Profile Name"), grep only the corresponding value,
and finally pipe it to sed so I get the values I need.
Here is the script I have written, which is meant to get the value for "User First Name".
FIRST_NAME=$(grep "User First Name" Sample_Profile.txt | sed 's|[^(]*(\([^)]*\)).*|\1|')
# grep the "User First Name" key and pipe to sed to get the value between parentheses
sed -i -e 's/USER_FIRST_NAME/'"$FIRST_NAME"'/g' UserName.txt
echo $FIRST_NAME
# outputs "User First Name" when it should get "Goofball" (grep is not
# piping correctly due to white space)
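grep works line by line, so it cannot see a value that sits on a later line than its key. awk can instead treat ")" as the record separator and "(" as the field separator:
awk '/User First Name/ {print $2}' RS=')' FS='(' Sample_Profile.txt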
Output:
Goofball
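To look up any key rather than just "User First Name", the same awk idea could be wrapped in a small function. This is only a sketch: the name get_value and the layout of the demo file (key and value on separate lines, with blank lines in between) are assumptions for illustration, not taken from the original files. It uses index() instead of a regex match so that keys containing characters such as "*" are compared literally.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): print the parenthesised value
# that follows KEY in FILE, ignoring whitespace and blank lines between them.
# Usage: get_value KEY FILE
get_value() {
    # RS=')' ends each record at a closing parenthesis;
    # FS='(' splits it into "text before the value" ($1) and the value ($2).
    awk -v key="$1" 'index($1, key) { print $2; exit }' RS=')' FS='(' "$2"
}

# Demo on a temporary file whose layout is assumed, not copied from the real data.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
User First Name

   (Goofball)

User Last Name
(Goofberg)
EOF

FIRST_NAME=$(get_value "User First Name" "$demo_file")
echo "$FIRST_NAME"
rm -f "$demo_file"
```

The result can then be passed to the existing sed substitution in place of the grep pipeline. Note that this approach assumes values never contain parentheses themselves; a value such as "(foo (bar))" would be cut at the first ")".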