I have a number of Word documents that I'd like to remove some elements from. What I would like to do is as follows:
Replace \[.*\] with "" AND replace \(.*\) with "".
Thoughts and direction appreciated. As it stands now, I don't know how to do any of these things programmatically; I'm doing this manually.
If it matters, I'm using Ubuntu 11.04
Since you're open to using plain text, some improvements to your algo:
Use antiword to automate the conversion from doc to txt.
Use sed to do in-place regex modification: sed -i -e 's/bad/good/' file.txt
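The two steps could be combined roughly as follows. This is only a sketch: `doc_to_txt` is a made-up helper name, the default expression is the placeholder from the sed example, and it assumes GNU sed (for `-i`) and that antiword is installed. As far as I know, antiword reads the older binary .doc format, not .docx.

```shell
#!/bin/sh
# Hypothetical helper: convert each .doc in a directory to .txt with
# antiword, then apply one sed expression to the resulting text file.
# The original .doc files are left untouched.
doc_to_txt() {
    dir=${1:-.}                  # directory to scan (default: current)
    script=${2:-s/bad/good/}     # sed expression (placeholder default)
    for doc in "$dir"/*.doc; do
        [ -e "$doc" ] || continue            # glob matched nothing
        txt="${doc%.doc}.txt"
        # antiword prints the extracted text on standard output
        if antiword "$doc" > "$txt"; then
            sed -i -e "$script" "$txt"       # in-place edit of the .txt copy
        else
            echo "conversion failed: $doc" >&2
            rm -f "$txt"                     # do not keep a partial file
        fi
    done
}
```

Called as `doc_to_txt ~/docs 's/bad/good/'`, it writes one edited .txt next to each .doc, so a bad regex can be fixed by simply running it again.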
Update (in response to comment):
The regexes are fine, but I didn't understand the objective completely:
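If you want to delete occurrences of [foo] and (foo), i.e. replace them with the empty string, use:
sed -i -e 's/\[.*\]//g' file.txt; sed -i -e 's/(.*)//g' file.txt
(In sed's default basic regex syntax a bare ( or ) is literal, while \( and \) delimit a capture group, so the parentheses must not be escaped here.)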
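One thing to watch: `.*` is greedy, so on a line with more than one bracketed span it swallows everything from the first opening bracket to the last closing one. A possible workaround, sketched below, is a negated bracket expression that stops at the first closing character. `strip_marked` is a hypothetical name, and the filter reads standard input so it can be tried on a sample line before touching any file.

```shell
#!/bin/sh
# Hypothetical filter: delete [..] and (..) spans, one span at a time.
# [^]]* matches any run of characters other than ']',
# [^)]* matches any run of characters other than ')'.
strip_marked() {
    sed -e 's/\[[^]]*\]//g' -e 's/([^)]*)//g'
}

# Greedy version: the text between the two spans is lost as well.
printf '%s\n' 'a [b] c [d] e' | sed -e 's/\[.*\]//g'    # prints: a  e

# Negated-class version: only the spans themselves are removed.
printf '%s\n' 'a [b] c [d] e' | strip_marked            # prints: a  c  e
```

Neither version handles nested brackets or spans that run across a line break; those would need a different tool.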
If you want to replace occurrences of [foo] and (foo) with "foo" each, use:
sed -i -e's/\[\(.*\)\]/"\1"/g' file.txt; sed -i -e's/(\(.*\))/"\1"/g' file.txt
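To see what the capture-group form does before running it in place, it can be tried on a sample line first. The sketch below keeps the literal double quotes from the command above (remove them from the replacement if only the bare text is wanted) and, as an assumption on my part, swaps `.*` for a negated bracket expression so that two spans on one line are handled separately. `unwrap_marked` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical filter: replace [foo] and (foo) with "foo".
# \( ... \) captures the inner text; \1 inserts it in the replacement.
unwrap_marked() {
    sed -e 's/\[\([^]]*\)\]/"\1"/g' -e 's/(\([^)]*\))/"\1"/g'
}

printf '%s\n' 'see [foo] and (bar)' | unwrap_marked
# prints: see "foo" and "bar"
```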