python text nlp formatting text-processing

Convert text written in some syntax to another specified syntax


I want to convert a play written in the Markdown extension Fountain to LaTeX (more specifically, to my own LaTeX template for plays). For this I need to convert text given in the following format

Some stage directions.

CHARACTER A:
Text the character is saying.

CHARACTER B:
Text the other character is saying.

Some other stage direction.

CHARACTER B:
Some more text the other character is saying.

to

\textit{Some stage directions.}

\dialog{Character A}{Text the character is saying.}
\dialog{Character B}{Text the other character is saying.}

\textit{Some other stage direction.}

\dialog{Character B}{Some more text the other character is saying.}

I would like to avoid writing such a program from scratch. Is there a tool or package (e.g. for Python) that can do this rather basic reformatting? One possible complication is that the stage directions are not uniformly distributed in the text, i.e. after a character has spoken there may or may not be a stage direction.


Solution

  • Assuming the blocks are separated by a double newline, this is easily achievable using a regex:

    Input:

    t='''Some stage directions.
    
    CHARACTER A:
    Text the character is saying.
    
    CHARACTER B:
    Text the other character is saying.
    
    Some other stage direction.
    
    CHARACTER B:
    Some more text the other character is saying.'''
    

    Code:

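    import re

    # Split the text on blank lines. A block of the form "NAME:\n<text>" becomes
    # \dialog{NAME}{<text>}; any other block is treated as a stage direction.
    # The := operator requires Python 3.8 or newer.
    out = '\n\n'.join(fr'\dialog{{{m.group(1)}}}{{{m.group(2)}}}'
                      if (m:=re.match('([^\n]+):\n(.*)', s))
                      else fr'\textit{{{s}}}'
                      for s in re.split('\n\n', t))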
    
    print(out)
    

    Output:

    \textit{Some stage directions.}
    
    \dialog{CHARACTER A}{Text the character is saying.}
    
    \dialog{CHARACTER B}{Text the other character is saying.}
    
    \textit{Some other stage direction.}
    
    \dialog{CHARACTER B}{Some more text the other character is saying.}
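
Note that the one-liner above keeps the names in upper case, while the question asks for `Character A`, and its `(.*)` only captures the first line of a speech. A slightly longer sketch that handles both is below. The function name `fountain_to_latex` is made up for illustration, and it assumes that speaker names are written in all caps, end with a colon, and that blocks are separated by blank lines; it is not a full Fountain parser.

```python
import re

def fountain_to_latex(text):
    # Hypothetical helper: turns blank-line-separated blocks into
    # \dialog{Name}{speech} or \textit{stage direction} lines.
    out = []
    for block in re.split(r'\n\s*\n', text.strip()):
        # Assumed dialogue shape: an all-caps name line ending in ':',
        # followed by one or more lines of speech (hence re.DOTALL).
        m = re.match(r'([^\n:]+):\n(.+)', block, flags=re.DOTALL)
        if m and m.group(1).isupper():
            name = m.group(1).strip().title()      # "CHARACTER A" -> "Character A"
            speech = ' '.join(m.group(2).split())  # join wrapped lines with spaces
            out.append(rf'\dialog{{{name}}}{{{speech}}}')
        else:
            direction = ' '.join(block.split())
            out.append(rf'\textit{{{direction}}}')
    return '\n\n'.join(out)
```

Calling `print(fountain_to_latex(t))` with the `t` from the input above gives the same blocks as before, but with the names in title case. Like the one-liner, it puts a blank line between every block, including consecutive `\dialog` lines.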