Given a text file (.tex) which may contain strings of the form "\cite{alice}", "\cite{bob}", and so on, I would like to write a bash script that stores the content within brackets of each such string ("alice" and "bob") in a new text file (say, .txt). In the output file I would like to have one line for each such content, and I would also like to avoid repetitions.
Attempts:
What about:
grep -oP '(?<=\\cite{)[^}]+(?=})' sample.tex | sort -u > cites.txt
-P (with GNU grep) interprets the regexp as a Perl-compatible one, which is needed for the lookbehind and lookahead groups.
-o "prints only the matched (non-empty) parts of a matching line, with each such part on a separate output line" (see manual).
The regexp matches a run of characters other than a right curly brace that is preceded by \cite{ (positive lookbehind group (?<=\\cite{)) and followed by a right curly brace (positive lookahead group (?=})).
sort -u sorts the matches and removes duplicates.
For more details about lookahead and lookbehind groups, see the dedicated page on Regular-Expressions.info.
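If GNU grep's -P is not available (for example with BSD grep), a rough alternative is to match the whole \cite{...} string and strip the wrapper with sed. This is only a sketch, not the command from the answer above: it assumes your grep supports -o, and it treats \cite{alice,bob} as a single entry. The contents of sample.tex here are made up for illustration.

```shell
# Build a small sample file (made-up contents, for illustration only).
cat > sample.tex <<'EOF'
Some text \cite{bob} and more text \cite{alice}.
A repeated reference: \cite{bob}.
EOF

# 1. grep -o prints each \cite{...} occurrence on its own line
#    (basic regex: "\\" is a literal backslash, "{" and "}" are literal).
# 2. sed strips the leading "\cite{" and the trailing "}".
# 3. sort -u sorts the keys and removes duplicates.
grep -o '\\cite{[^}][^}]*}' sample.tex \
  | sed -e 's/^\\cite{//' -e 's/}$//' \
  | sort -u > cites.txt

cat cites.txt
```

With the sample file above, cites.txt ends up with two lines, alice and bob.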