
How to get a readable Unicode string from a single BibTeX entry field in a Python script


Suppose I have a .bib file containing BibTeX-formatted entries. I want to extract the "title" field from one entry and convert it to a readable Unicode string.

For example, if the entry was:

@article{mypaper,
    author = {myself},
    title = {A very nice {title} with annoying {symbols} like {\^{a}}}
}

what I want to extract is the string:

A very nice title with annoying symbols like â

I am currently trying to use the pybtex package, but I cannot figure out how to do it. The command-line utility pybtex-format does a good job of converting whole .bib files, but I need to do this inside a script, for the title field of a single entry.


Solution

  • Figured it out:

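    from pybtex.database.input.bibtex import Parser
    from pybtex.plugin import find_plugin


    def load_bib(filename):
        """Parse a .bib file into a pybtex BibliographyData object."""
        parser = Parser()
        return parser.parse_file(filename)


    def get_title(entry):
        """Render an entry's title field as a plain Unicode string."""
        # The "plain" formatting style knows how to build a title template.
        style = find_plugin('pybtex.style.formatting', 'plain')()
        # The plaintext backend renders rich text without any markup.
        backend = find_plugin('pybtex.backends', 'plaintext')()
        sentence = style.format_title(entry, 'title')
        # Evaluate the template against this entry to get rich text.
        data = {'entry': entry,
                'style': style,
                'bib_data': None}
        text = sentence.f(sentence.children, data)
        return text.render(backend)


    bib_data = load_bib("bibliography.bib")
    print(get_title(bib_data.entries["entry_label"]))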
    

    Here, entry_label must be the entry's citation key, i.e. the label you pass to \cite in LaTeX (mypaper in the example above).
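If adding pybtex is not an option and the titles only contain protective braces and simple accent commands, a much smaller stdlib-only sketch can do the same conversion. This is a swapped-in technique, not what pybtex does internally: the COMBINING table and the latex_to_unicode helper are hypothetical, cover only five accent commands, and do not handle things like \ss, math mode, or user-defined macros.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny mapping (an assumption, not pybtex's table)
# from a LaTeX accent command character to a Unicode combining mark.
COMBINING = {
    "^": "\u0302",  # circumflex: \^{a} -> â
    "'": "\u0301",  # acute:      \'{e} -> é
    "`": "\u0300",  # grave:      \`{e} -> è
    '"': "\u0308",  # diaeresis:  \"{o} -> ö
    "~": "\u0303",  # tilde:      \~{n} -> ñ
}

# Matches an accent command followed by either a braced letter (\^{a})
# or a bare letter (\^a).
ACCENT_RE = re.compile(r"""\\([\^'`"~])(?:\{(\w)\}|(\w))""")


def latex_to_unicode(text: str) -> str:
    """Convert a small subset of LaTeX markup to plain Unicode (sketch)."""
    def replace(match: re.Match) -> str:
        accent = match.group(1)
        letter = match.group(2) or match.group(3)
        # NFC composes base letter + combining mark into one code point
        # where Unicode has a precomposed character (a + U+0302 -> â).
        return unicodedata.normalize("NFC", letter + COMBINING[accent])

    text = ACCENT_RE.sub(replace, text)
    # Drop the protective braces BibTeX uses to preserve case, e.g. {title}.
    return text.replace("{", "").replace("}", "")


title = r"A very nice {title} with annoying {symbols} like {\^{a}}"
print(latex_to_unicode(title))  # A very nice title with annoying symbols like â
```

The regex-plus-normalization approach keeps the code short, but it only knows the accents listed in COMBINING; anything outside that set passes through unchanged (apart from brace removal), which is why a real LaTeX decoder such as the pybtex route above is the safer general choice.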