I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.
Example 1:
echo '"HAL," noted Frank, "said that everything was going extremely well."' | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"HAL,"
"said that everything was going extremely well."
Example 2:
cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner
stdout:
"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"
etc.
I like this:
perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'
It's a little verbose, but it handles escaped quotes and limits backtracking much better than the simplest implementation. Note that it prints the captured text without the surrounding quotes. Broken out with comments, the pattern says:
my $re = qr{
    "                 # begin with a literal quote
    (
        (?>           # atomic group: once one alternative has matched,
                      # do not backtrack into it. Each branch either
                      # matches going forward or the branch fails
            [^"\\]    # a character that is not a dquote or a backslash
          | \\+       # OR one or more backslashes followed by
            [^"]      # something that is not a quote
          | \\        # OR a backslash
            (?>\\\\)* # followed by any number of *pairs* of backslashes (as units)
            "         # and a quote
        )*            # repeated any number of times
    )                 # all captured as one group
    "                 # end with a literal quote
}x;
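Since the question also allows Python, here is a rough sketch of the same idea there. It does not port the atomic-group pattern above. It uses the more common "ordinary character, or backslash plus any character" idiom instead, so treat it as an approximation. The names `QUOTED`, `quotations` and `sample` are made up for this example.

```python
import re

# Escape-aware pattern: an opening quote, then any run of ordinary
# characters or backslash-escaped characters, then the closing quote.
# This is a swapped-in idiom, not a translation of the Perl regex above.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"', re.DOTALL)

def quotations(text):
    """Return the contents of every double-quoted span in text."""
    return QUOTED.findall(text)

# Hypothetical sample input containing escaped quotes inside a quotation.
sample = r'"HAL," noted Frank, "said \"everything\" was going well."'
for q in quotations(sample):
    print(q)
```

Like the Perl one-liner, this prints the captured text without the surrounding quotes. Drop the capture group if you want the quotes included.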
If you don't need that much power (say the input is likely to be plain dialog rather than structured text with escaped quotes), then
/"([^"]*)"/
probably works about as well as anything else.
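For comparison, a minimal Python sketch of that simple pattern, assuming straight double quotes and no backslash escapes in the input. The function name `simple_quotations` and the sample string are just for illustration. Here the quotes are kept in the match, as in the example output in the question.

```python
import re

def simple_quotations(text):
    """Return each "..." span, quotes included, ignoring backslash escapes."""
    return re.findall(r'"[^"]*"', text)

dialog = '"HAL," noted Frank, "said that everything was going extremely well."'
for quote in simple_quotations(dialog):
    print(quote)
# prints:
# "HAL,"
# "said that everything was going extremely well."
```

The trade-off is that an escaped quote such as \" ends the match early, which is fine for ordinary dialog but not for source code or JSON.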