I use a plugin that produces a .bib file with information about scientific articles. Sometimes the same key appears in more than one record.
For example:
@inproceedings{Hosseini_2016,
doi = {10.1109/ism.2016.0028},
url = {https://doi.org/10.1109%2Fism.2016.0028},
year = 2016,
month = {dec},
publisher = {{IEEE}},
author = {Mohammad Hosseini and Viswanathan Swaminathan},
title = {Adaptive 360 {VR} Video Streaming: Divide and Conquer},
booktitle = {2016 {IEEE} International Symposium on Multimedia ({ISM})}
}
@inproceedings{Hosseini_2016,
doi = {10.1109/ism.2016.0093},
url = {https://doi.org/10.1109%2Fism.2016.0093},
year = 2016,
month = {dec},
publisher = {{IEEE}},
author = {Mohammad Hosseini and Viswanathan Swaminathan},
title = {Adaptive 360 {VR} Video Streaming Based on {MPEG}-{DASH} {SRD}},
booktitle = {2016 {IEEE} International Symposium on Multimedia ({ISM})}
}
I am using the pybtex library to parse the file. The library ignores duplicate entries that share a key, so before parsing I need to preprocess the file so that every key in it is unique. How can I do that?
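To see which keys collide before changing anything, a quick check along these lines may help. It is only a sketch: `find_duplicate_keys` and `ENTRY_KEY` are hypothetical names, and the regular expression assumes every entry starts on its own line in the form `@type{key,`.

```python
import re
from collections import Counter

# Assumption: every entry starts on its own line as "@type{key,".
ENTRY_KEY = re.compile(r"^[ \t]*@\w+\s*\{\s*([^,\s{}]+)\s*,", re.MULTILINE)


def find_duplicate_keys(bibtex):
    """Return a dict mapping each repeated key to the number of times it occurs."""
    counts = Counter(ENTRY_KEY.findall(bibtex))
    return {key: n for key, n in counts.items() if n > 1}


sample = """@inproceedings{Hosseini_2016,
title = {A}
}
@inproceedings{Hosseini_2016,
title = {B}
}
@article{Other_2017,
title = {C}
}
"""

print(find_duplicate_keys(sample))  # {'Hosseini_2016': 2}
```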
I decided to use regular expressions, although there is probably a more convenient solution. I simply replace every key with a fresh ID generated by nanoid.
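import re

from nanoid import generate

def process_bibtex(fn):
    with open(fn, encoding="utf-8") as r_file:
        bibtex = r_file.read()
    # Match "@type{key," at the start of a line; group 1 is the entry type.
    pattern = r"^[ \t]*@(\w+)\s*\{\s*[^,\s{}]+\s*,"

    def callback(matchobj):
        # Keep the entry type and replace the key with a fresh nanoid.
        return f"@{matchobj.group(1)}{{{generate()},"

    # Overwrite the original file with the rewritten entries.
    with open(fn, "w", encoding="utf-8") as w_file:
        w_file.write(re.sub(pattern, callback, bibtex, flags=re.MULTILINE))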
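Replacing every key with a random ID also changes keys that were already unique. If the original keys should be preserved, one possible variant is to rename only the repeats. The sketch below is not part of the solution above: `dedupe_keys` and `ENTRY` are hypothetical names, the `_2`, `_3`, ... suffix scheme is my own choice, and the pattern again assumes each entry starts on its own line as `@type{key,`.

```python
import re

# Assumption: every entry starts on its own line as "@type{key,".
# Groups: 1 = "@type{", 2 = the key, 3 = the trailing comma.
ENTRY = re.compile(r"^([ \t]*@\w+\s*\{\s*)([^,\s{}]+)(\s*,)", re.MULTILINE)


def dedupe_keys(bibtex):
    """Append _2, _3, ... to repeated keys; first occurrences stay unchanged."""
    taken = {m.group(2) for m in ENTRY.finditer(bibtex)}  # keys already in the file
    seen = set()

    def rename(match):
        key = match.group(2)
        if key not in seen:
            seen.add(key)
            return match.group(0)  # first occurrence: leave as-is
        n = 2
        while f"{key}_{n}" in taken:  # skip suffixes that would collide
            n += 1
        new_key = f"{key}_{n}"
        taken.add(new_key)
        return f"{match.group(1)}{new_key}{match.group(3)}"

    return ENTRY.sub(rename, bibtex)


sample = """@inproceedings{Hosseini_2016,
doi = {10.1109/ism.2016.0028}
}
@inproceedings{Hosseini_2016,
doi = {10.1109/ism.2016.0093}
}
"""

print(dedupe_keys(sample))
```

With this variant the second record in the example becomes `Hosseini_2016_2`, while the first keeps its key, so existing citations of the first record stay valid.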