Parse BibTeX from a URL


I need to parse a BibTeX file in Python that is referenced by a URL, e.g. "https://www.aclweb.org/anthology/papers/J/J18/J18-1001.bib". From the BibTeX I need to extract the "pages" field. How can I do this in Python?


Solution

  • Download the file as a string, then use a regular expression to pull out the value that follows pages =:

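    import re

    import requests

    url = 'https://www.aclweb.org/anthology/papers/J/J18/J18-1001.bib'
    # Download the .bib file as text; fail loudly on an HTTP error.
    response = requests.get(url)
    response.raise_for_status()
    data = response.text

    # Capture whatever sits between the quotes after `pages = `.
    match = re.search(r'pages\s*=\s*"(.*?)"', data)
    print(match.group(1) if match else 'no pages field found')
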

    Output:

    1--15
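
If the same lookup is needed for other fields, or for entries that write values in braces rather than quotes, the regex approach can be wrapped in a small helper. The following is only a sketch: `extract_field` and the `sample` entry are made up for illustration (the sample is not the contents of the URL above), and it assumes simple, non-nested values. For anything more complex, a dedicated BibTeX parser is likely the safer choice.

```python
import re


def extract_field(bibtex_text, field):
    """Return the value of `field` from a BibTeX entry, or None if absent.

    Hypothetical helper: handles field = "value" and field = {value},
    but not nested braces or string concatenation.
    """
    pattern = re.compile(
        r'\b' + re.escape(field) + r'\s*=\s*(?:"([^"]*)"|\{([^{}]*)\})',
        re.IGNORECASE,
    )
    match = pattern.search(bibtex_text)
    if match is None:
        return None
    # Group 1 is the quoted form, group 2 the braced form.
    return match.group(1) if match.group(1) is not None else match.group(2)


# Made-up entry used only to exercise the helper offline.
sample = '''@article{example-key,
    title = "An Example Title",
    pages = "1--15",
    year = {2018},
}'''

print(extract_field(sample, "pages"))  # 1--15
print(extract_field(sample, "year"))   # 2018
print(extract_field(sample, "doi"))    # None
```

With the download step from the answer, the call would be `extract_field(data, "pages")`.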