regex bash sed bibtex

Help with regex - extracting text


Suppose I have some text files (f1.txt, f2.txt, ...) that look something like this:

@article {paper1,
author = {some author},
title = {some {T}itle} ,
journal = {journal},
volume = {16},
number = {4},
publisher = {John Wiley & Sons, Ltd.},
issn = {some number},
url = {some url},
doi = {some number},
pages = {1},
year = {1997},
}

I want to extract the content of the title field and store it in a bash variable (call it $title); in the example, that is "some {T}itle". Note that there may be nested curly braces inside the outer pair of braces. Also, there might be no whitespace around "=", and there may be extra whitespace before "title".

Thanks so much. I just need a working example of how to extract this; I can then extract the other fields myself.


Solution

  • Give this a try:

    title=$(sed -n '/^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*$//p;}' inputfile)
    

    Explanation:

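    • /^[[:blank:]]*title[[:blank:]]*=[[:blank:]]*{/ { - if a line matches this regex (optional leading blanks, "title", optional blanks around "=", then the opening brace), run the commands in the block
      • s/// - an empty pattern reuses the most recently used regex (the address above), so this deletes the matched portion
      • s/}[^}]*$//p - delete the last closing curly brace on the line together with everything after it, then print the result
    • } - end of the block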
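
Since the question mentions extracting the other fields as well, here is a minimal sketch that wraps the same sed command in a function. The function name bib_field, the temporary sample file, and its contents are assumptions made for illustration; they are not part of the original answer.

```shell
#!/usr/bin/env bash

# Hypothetical helper (name is an assumption): print the value of a
# single-line BibTeX field.
#   $1 = field name (e.g. title), $2 = file to read
bib_field() {
    # Double quotes let $1 expand; \$ keeps the end-of-line anchor literal.
    sed -n "/^[[:blank:]]*$1[[:blank:]]*=[[:blank:]]*{/ {s///; s/}[^}]*\$//p;}" "$2"
}

# Illustrative sample entry: extra blanks before "title", none around "=".
sample=$(mktemp)
cat > "$sample" <<'EOF'
@article {paper1,
author = {some author},
  title={some {T}itle} ,
year = {1997},
}
EOF

title=$(bib_field title "$sample")
year=$(bib_field year "$sample")

printf '%s\n' "$title"   # prints: some {T}itle
printf '%s\n' "$year"    # prints: 1997

rm -f "$sample"
```

Like the one-liner above, this sketch only handles fields whose value sits on a single line; a value that wraps across several lines would need a different approach.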