
Bash: Converting Pandoc style Markdown citations to LaTeX style citations


I write in Markdown, but I need to convert my files to PDF via LaTeX.

I'm struggling with different cases or if statements.

  1. I'd like to convert [@citekey] citations to \footcite{citekey}.

I've managed to do that with sed 's/\[@\([^]]*\)]/\\footcite{\1}/g' "$input_file" > "$input_file_temp.md"

However, it gets more complicated because there are a few possible cases.

  2. If there are page numbers, which the Markdown-style citation indicates like this [@citekey, 123], the page number needs to be prepended like this \footcite[123]{citekey}.

That works with sed 's/\[@\([^,]*\), \([0-9]\+\)\]/\\footcite[\2]{\1}/g' "$input_file" > "$input_file_temp.md"

  3. Finally, there are two related cases that I can't figure out. If there are two or more citekeys, a different command is needed, so e.g. [@citekey1; @citekey2] gets turned into \footnote{\cite[]{citekey1}, \cite[]{citekey2}}

  4. Notice the empty square brackets: I imagine they can then be used for the fourth case, where there are two or more citekeys and at least one has a page number, e.g. [@citekey1, 42; @citekey2, 108; @citekey3] should be turned into \footnote{\cite[42]{citekey1}, \cite[108]{citekey2}, \cite[]{citekey3}}
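
For testing, the four cases above can be collected into a small sample file together with the output expected for each. This is only a sketch: the file names sample.md and expected.tex and the surrounding sentences are made up for illustration, while the citations and their target forms are taken from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical test fixture: file names and sentence text are illustrative only.

# Input in Pandoc-style Markdown, one citation case per line.
cat > sample.md <<'EOF'
One key [@citekey].
One key with a page [@citekey, 123].
Two keys [@citekey1; @citekey2].
Mixed [@citekey1, 42; @citekey2, 108; @citekey3].
EOF

# Output expected after conversion to LaTeX-style citations.
cat > expected.tex <<'EOF'
One key \footcite{citekey}.
One key with a page \footcite[123]{citekey}.
Two keys \footnote{\cite[]{citekey1}, \cite[]{citekey2}}.
Mixed \footnote{\cite[42]{citekey1}, \cite[108]{citekey2}, \cite[]{citekey3}}.
EOF
```

Any candidate converter can then be compared against expected.tex with diff.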

I have an attempted script, but it doesn't work, even in the first two cases where the sed on its own works. I imagine the if statements are wrong.

#!/bin/bash

input_file=$1

# Check if there is a comma, i.e. a citekey and a page number

if [ $(grep -c '\[@[^,]*, [0-9][0-9]*\]' "$input_file") -ge 1 ]; then
    # Replace patterns with page numbers
    
    sed 's/\[@\([^,]*\), \([0-9]\+\)\]/\\footcite[\2]{\1}/g' "$input_file" > "$input_file_temp.md"

# Check if there is only ONE citekey
elif [ $(grep -c '\[@[^;]*;[^]]*\]' "$input_file") -eq 0 ]; then
    # Replace patterns and save to a new file
    sed 's/\[@\([^]]*\)]/\\footcite{\1}/g' "$input_file" > "$input_file_temp.md"

# Multiple citekeys, NO pages
elif [ $(grep -o '@' "$input_file" | wc -l)  -ge 2 ]; then


    sed 's/\[@\([^]]*\)\]/\\footcite{\1}/g' "$input_file" > "$input_file_temp.md"


fi
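
A note on two likely reasons the script above fails even for the single-citekey cases. First, in "$input_file_temp.md" the shell reads input_file_temp as the variable name, so (assuming no such variable is set) the output lands in a file called .md rather than next to the input. Second, each grep tests the whole file, so at most one branch of the if/elif chain runs, and a file that mixes cases is only partly converted. The sketch below drops the conditions and applies both single-citekey substitutions to every line. The function name convert_single_key is hypothetical, and [0-9][0-9]* replaces the GNU-only \+ so it should also work with non-GNU sed. Citations with several citekeys are deliberately left untouched.

```bash
#!/usr/bin/env bash
# Sketch: handle the two single-citekey cases in one sed call.
# convert_single_key is a hypothetical helper name, not part of the original script.
convert_single_key() {
    local input_file=$1
    # Braces stop the shell from reading "input_file_temp" as the variable name.
    local output_file="${input_file}_temp.md"

    # Both substitutions run on every line: the page-number form first,
    # then the plain form. Citations containing ';' match neither pattern.
    sed -e 's/\[@\([^],;]*\), \([0-9][0-9]*\)\]/\\footcite[\2]{\1}/g' \
        -e 's/\[@\([^],;]*\)\]/\\footcite{\1}/g' \
        "$input_file" > "$output_file"
}
```

Calling convert_single_key notes.md would write the result to notes.md_temp.md.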

I'm open to suggestions if there are easier or more elegant ways to do this.

Thanks!


Solution

  • This, using GNU awk for multi-char RS, RT, gensub() and \s, might be what you want:

    $ cat tst.sh
    #!/usr/bin/env bash
    
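    awk -v RS='[[]@[^]]+]' -v ORS= '
        # RS matches one whole [@...] citation; gawk stores the matched text in RT.
        RT {
            # Strip the enclosing brackets, then split on ";" into citekey/page pairs.
            ckpnsStr = gensub(/^\[|\s*]$/,"","g",RT)
            numCkpns = split(ckpnsStr,ckpnsArr,/\s*;\s*/)
            RT = "\\foot"
            for ( ckpnNr=1; ckpnNr <= numCkpns; ckpnNr++ ) {
                # Split each pair on "," into citekey (ck) and optional page number (pn).
                split(ckpnsArr[ckpnNr],ck_pn,/\s*,\s*/)
                ck = gensub(/@/,"",1,ck_pn[1])
                pn = ck_pn[2]
                if ( numCkpns == 1 ) {
                    # Single citekey: \footcite{ck} or \footcite[pn]{ck}.
                    pn = ( pn == "" ? "" : "[" pn "]" )
                    RT = RT "cite" pn "{" ck
                }
                else {
                    # Several citekeys: \footnote{\cite[pn]{ck}, ...}.
                    RT = RT ( ckpnNr == 1 ? "note{" : ", " )
                    RT = RT "\\cite[" pn "]{" ck "}"
                }
            }
            RT = RT "}"
        }
        # Print the text before the citation, followed by its replacement.
        { print $0 RT }
    ' "${@:--}"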
    

    You didn't provide any sample input/output to test a potential solution against, so here it is run against the text from your question, since that at least contains examples of the replacements you want to make. You can decide whether this is the expected output:

    $ ./tst.sh file
    I'm struggling with different cases or if statements.
    
        I'd like to convert \footcite{citekey} citations to \footcite{citekey}.
    
    I've managed to that with sed 's/\\footcite{\([^}]*\)]/\\footcite{\1}/g' "$input_file" > "$input_file_temp.md"
    
    However, it gets more complicated because there are a few possible cases.
    
        If there are page numbers, which the markdown style citation indicate like this \footcite[123]{citekey}, this needs to be prepended like this \footcite[123]{citekey}.
    
    That works with sed 's/\\footcite[]{\([^}*\), \([0-9]\+\)\]/\\footcite[\2]{\1}/g' "$input_file" > "$input_file_temp.md"
    
        Finally, two related cases, that I can't figure out, if there are two or more citekeys, it needs to create another command instead, so e.g. \footnote{\cite[]{citekey1}, \cite[]{citekey2}} gets turned into \footnote{\cite[]{citekey1}, \cite[]{citekey2}}
    
        Notice the empty square brackets because I imagine that this can be then used for the fourth case, where there are two or more citekeys, and at least one has page numbers, e.g. \footnote{\cite[42]{citekey1}, \cite[108]{citekey2}, \cite[]{citekey3}} should be turned into \footnote{\cite[42]{citekey1}, \cite[108]{citekey2}, \cite[]{citekey3}}
    
    I have an attempted script, but it doesn't work, even in the first two cases where the sed on its own works. I imagine the if statements are wrong.