python, regex, bibtex

Parsing BibTeX citation format with Python


What is the best way in Python to parse entries like the one below? I have tried regular expressions but could not get them to work. I am looking for a dictionary with the field names (title, author, etc.) as keys.

@article{perry2000epidemiological,
  title={An epidemiological study to establish the prevalence of urinary symptoms and felt need in the community: the Leicestershire MRC Incontinence Study},
  author={Perry, Sarah and Shaw, Christine and Assassa, Philip and Dallosso, Helen and Williams, Kate and Brittain, Katherine R and Mensah, Fiona and Smith, Nigel and Clarke, Michael and Jagger, Carol and others},
  journal={Journal of public health},
  volume={22},
  number={3},
  pages={427--434},
  year={2000},
  publisher={Oxford University Press}
}

Solution

  • You might be looking for a BibTeX parser such as bibtexparser: https://bibtexparser.readthedocs.io/en/master/

    Source: https://bibtexparser.readthedocs.io/en/master/tutorial.html#step-0-vocabulary

    Create an input BibTeX file:

    bibtex = """@ARTICLE{Cesar2013,
      author = {Jean César},
      title = {An amazing title},
      year = {2013},
      month = jan,
      volume = {12},
      pages = {12--23},
      journal = {Nice Journal},
      abstract = {This is an abstract. This line should be long enough to test
         multilines...},
      comments = {A comment},
      keywords = {keyword1, keyword2}
    }
    """
    
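    # Write the sample entry to disk; an explicit UTF-8 encoding keeps the "é" intact on every platform.
    with open('bibtex.bib', 'w', encoding='utf-8') as bibfile:
        bibfile.write(bibtex)
    

    Parse it:

    import bibtexparser
    
    # Read the file back with the same encoding and parse it into a database object.
    with open('bibtex.bib', encoding='utf-8') as bibtex_file:
        bib_database = bibtexparser.load(bibtex_file)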
    
    print(bib_database.entries)
    

    Output:

    [{'journal': 'Nice Journal',
      'comments': 'A comment',
      'pages': '12--23',
      'month': 'jan',
      'abstract': 'This is an abstract. This line should be long enough to test\nmultilines...',
      'title': 'An amazing title',
      'year': '2013',
      'volume': '12',
      'ID': 'Cesar2013',
      'author': 'Jean César',
      'keyword': 'keyword1, keyword2',
      'ENTRYTYPE': 'article'}]
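
If installing a library is not an option, a small brace-aware scanner can cover simple entries like the one in the question. The sketch below is not part of bibtexparser: `parse_bibtex_entry` is a hypothetical helper, and it assumes a single entry whose values are either wrapped in braces or bare (numbers, month abbreviations). It does not handle quoted values, @string macros, comments, or files with several entries, so the library is still the safer choice for real .bib files.

```python
import re

# Matches the next "name =" inside an entry, skipping the comma that ends the previous field.
FIELD_START = re.compile(r"\s*,?\s*(\w+)\s*=\s*")
# Matches an unbraced value such as 2000 or jan.
BARE_VALUE = re.compile(r"[^,}\s]+")


def parse_bibtex_entry(text):
    """Parse one simple BibTeX entry into a flat dict (hypothetical helper)."""
    header = re.match(r"\s*@(\w+)\s*\{\s*([^,\s]+)\s*,", text)
    if header is None:
        raise ValueError("expected '@type{key,' at the start of the entry")
    entry = {"ENTRYTYPE": header.group(1).lower(), "ID": header.group(2)}

    pos = header.end()
    while True:
        field = FIELD_START.match(text, pos)
        if field is None:
            break  # no further "name =" found: we reached the closing brace
        key, pos = field.group(1).lower(), field.end()
        if text.startswith("{", pos):
            # Count braces so that nested {...} groups stay inside the value.
            depth = 0
            for end in range(pos, len(text)):
                if text[end] == "{":
                    depth += 1
                elif text[end] == "}":
                    depth -= 1
                    if depth == 0:
                        break
            else:
                raise ValueError(f"unbalanced braces in field {key!r}")
            value, pos = text[pos + 1:end], end + 1
        else:
            bare = BARE_VALUE.match(text, pos)
            if bare is None:
                raise ValueError(f"missing value for field {key!r}")
            value, pos = bare.group(0), bare.end()
        # Collapse line breaks and repeated spaces inside the value.
        entry[key] = " ".join(value.split())
    return entry


sample = """@article{perry2000epidemiological,
  title={An epidemiological study to establish the prevalence of urinary symptoms and felt need in the community: the Leicestershire MRC Incontinence Study},
  author={Perry, Sarah and Shaw, Christine and Assassa, Philip and Dallosso, Helen and Williams, Kate and Brittain, Katherine R and Mensah, Fiona and Smith, Nigel and Clarke, Michael and Jagger, Carol and others},
  journal={Journal of public health},
  volume={22},
  number={3},
  pages={427--434},
  year={2000},
  publisher={Oxford University Press}
}
"""

entry = parse_bibtex_entry(sample)
authors = entry["author"].split(" and ")  # BibTeX separates names with " and "
print(entry["title"])
print(authors[0], "|", entry["journal"], "|", entry["year"])
```

The resulting dictionary uses the same 'ID' and 'ENTRYTYPE' keys as the bibtexparser output above, so code written against one should be easy to move to the other.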