I have a file, suffix.txt, which contains some strings, one per line, for example:
ing
ness
es
ed
tion
I also have a text file, text.txt. It is given that text.txt consists only of lowercase letters, without any punctuation, for example:
the raining cloud answered the man all his interrogation and with all
questioned mind the princess responded
harness all goodness without getting irritated
I want to remove the suffixes from the original words in text.txt, only once for every suffix. Thus I expect the following output:
the rain cloud answer the man all his interroga and with all
question mind the princess respond
har all good without gett irritat
Note that tion was not removed from questioned, since the original word didn't contain tion as a suffix. It would be really helpful if someone could answer this with sed commands.
I was using a naive script that doesn't seem to do the job:
#!/bin/bash
while read p; do
sed -i "s/$p / /g" text.txt;
sed -i "s/$p$//g" text.txt;
done <suffix.txt
An awk solution:
$ awk '
NR==FNR {                          # first file: build a regex of suffixes
    s = s (s=="" ? "(" : "|") $0   # (ing|ness|es|ed|tion
    next
}
FNR==1 {                           # first line of the second file:
    s = s ")$"                     # close the group and anchor it: (ing|ness|es|ed|tion)$
}
{
    for (i=1; i<=NF; i++)          # iterate over all the words and
        sub(s, "", $i)             # strip a matching suffix from each of them
}
1' suffix.txt text.txt             # print each (modified) line
Output:
the rain cloud answer the man all his interroga and with all
question mind the princess respond
har all good without gett irritat
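Since the question asked for sed, the same idea (one alternation of all suffixes, applied in a single pass so that nothing is stripped twice) can be sketched with sed as well. This is only a sketch under a few assumptions: the function name strip_suffixes is made up for illustration, the suffixes are plain lowercase letters with no regex metacharacters, words are separated by single spaces, and your sed supports -E (GNU and BSD sed both do).

```shell
#!/bin/sh
# strip_suffixes SUFFIX_FILE TEXT_FILE   (hypothetical helper, not from the question)
# Assumes every suffix is made of plain letters, one suffix per line.
strip_suffixes() {
    # Join the non-empty suffix lines with "|", e.g. ing|ness|es|ed|tion
    pattern=$(grep -v '^$' "$1" | paste -sd'|' -)
    # Delete a suffix only when it is followed by a space or the end of the line;
    # \2 puts the space (or nothing) back. Each word is visited once, so a
    # suffix exposed by an earlier removal is left alone.
    sed -E "s/($pattern)( |\$)/\2/g" "$2"
}
```

Called as strip_suffixes suffix.txt text.txt, it writes the result to standard output and leaves text.txt untouched; add -i to the sed call only once you are happy with the output.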