Extract cited BibTeX keys from a TeX file using regex in Python

I'm trying to extract cited BibTeX keys from a LaTeX document using regex in Python.

I'd like to exclude a citation if it is commented out (an unescaped % in front of it) but still include it if only an escaped percent sign (\%) is in front of it.

Here is what I came up with so far:


An example to try it out:

Author et. al \cite{author92} bla bla. % should match
\citep{author93} % should match
\nocite{author94} % should match
100\%\nocite{author95} % should match
100\% \nocite{author95} % should match
%\nocite{author96} % should not match
\cite{author97, author98, author99} % should match
\nocite{*} % should not match

I appreciate any help.


  • Use the newer regex module (pip install regex) with the following expression:

        (?<!\\)%.+(*SKIP)(*FAIL)|\\(?:no)?citep?\{(?P<author>(?!\*)[^{}]+)\}

    More verbose:

    (?<!\\)%.+(*SKIP)(*FAIL)     # % (not preceded by \) 
                                 # and the whole line shall fail
    |                            # or
    \\(?:no)?citep?              # \nocite, \cite or \citep
    \{                           # { literally
        (?P<author>(?!\*)[^{}]+) # must not start with a star
    \}                           # } literally
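
A minimal sketch of how the verbose expression could be used from Python, assuming the third-party regex module is installed. The function name `extract_citations_skip_fail` is hypothetical and only serves as an illustration; the import sits inside the function so the snippet can be loaded even where regex is missing.

```python
def extract_citations_skip_fail(latex):
    """Return the brace contents of every citation that is not commented out."""
    import regex  # third-party package (pip install regex), not the stdlib re

    rx = regex.compile(r'''
        (?<!\\)%.+(*SKIP)(*FAIL)      # unescaped percent sign: discard rest of line
        |                             # or
        \\(?:no)?citep?               # one of the three citation commands
        \{
            (?P<author>(?!\*)[^{}]+)  # brace contents, must not start with a star
        \}
    ''', regex.VERBOSE)

    # Comment lines never show up as matches, so no filtering is needed here.
    return [m.group('author') for m in rx.finditer(latex)]
```

Because the comment branch fails via (*SKIP)(*FAIL), every match returned by finditer is a real citation, and the named group author can be read directly.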

    If installing another library is not an option, you need to change the expression to

        (?<!\\)%.+|(\\(?:no)?citep?\{((?!\*)[^{}]+)\})

    and check programmatically whether the second capture group has been set (that is, is not empty).
    In Python, this could look like:

    import re

    latex = r"""
    Author et. al \cite{author92} bla bla. % should match
    \citep{author93} % should match
    \nocite{author94} % should match
    100\%\nocite{author95} % should match
    100\% \nocite{author95} % should match
    %\nocite{author96} % should not match
    \cite{author97, author98, author99} % should match
    \nocite{*} % should not match
    """

    # Group 2 is only set when a citation matched, not when a comment was consumed
    rx = re.compile(r'''(?<!\\)%.+|(\\(?:no)?citep?\{((?!\*)[^{}]+)\})''')
    authors = [m.group(2) for m in rx.finditer(latex) if m.group(2)]
    print(authors)

    Which yields

    ['author92', 'author93', 'author94', 'author95', 'author95', 'author97, author98, author99']
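
The last entry above is still a single comma-separated string. If individual keys are wanted, the captured group can be split afterwards. The following is only a sketch under two assumptions that are not part of the question: duplicates should be dropped, and keys should keep their order of first appearance. The helper name `cited_keys` is hypothetical.

```python
import re

# Same expression as above: group 2 is set only when a citation matched.
rx = re.compile(r'(?<!\\)%.+|(\\(?:no)?citep?\{((?!\*)[^{}]+)\})')

def cited_keys(latex):
    """Return individual BibTeX keys, deduplicated, in order of first appearance."""
    keys = []
    for m in rx.finditer(latex):
        if not m.group(2):      # the comment branch matched, nothing was captured
            continue
        for key in m.group(2).split(','):
            key = key.strip()   # "author97, author98" contains spaces after commas
            if key and key not in keys:
                keys.append(key)
    return keys

sample = r"""
Author et. al \cite{author92} bla bla. % should match
100\%\nocite{author95} % should match
100\% \nocite{author95} % should match
%\nocite{author96} % should not match
\cite{author97, author98, author99} % should match
\nocite{*} % should not match
"""
print(cited_keys(sample))
# ['author92', 'author95', 'author97', 'author98', 'author99']
```

If repeated keys should be kept, removing the `key not in keys` check is enough.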