
Notepad++ and regex: how to title-case a string between two particular strings?


I have hundreds of bib references in a file, and they have the following syntax:

@article{tabata1999precise,
  title={Precise synthesis of monosubstituted polyacetylenes using Rh complex catalysts. 
Control of solid structure and $\pi$-conjugation length},
  author={Tabata, Masayoshi and Sone, Takeyuchi and Sadahiro, Yoshikazu},
  journal={Macromolecular chemistry and physics},
  volume={200},
  number={2},
  pages={265--282},
  year={1999},
  publisher={Wiley Online Library}
}

I would like to title-case (aka Proper Case) the journal name in Notepad++ using a regular expression, for example changing Macromolecular chemistry and physics to Macromolecular Chemistry and Physics.

I am able to find all instances using:

(?<=journal\=\{).*?(?=\})

but I cannot change the case via Edit > Convert Case to. Apparently it doesn't work on the Find All results, so I would have to go through the matches one by one.
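The find pattern above uses only a fixed-width lookbehind and a lookahead, so it behaves the same way in Python's `re` module. A minimal sketch for checking what it captures outside Notepad++ (the entry is a trimmed copy of the sample above; `JOURNAL_RE` is just an illustrative name):

```python
import re

# The question's find pattern; "=" needs no escaping in Python's re.
JOURNAL_RE = re.compile(r"(?<=journal=\{).*?(?=\})")

# Trimmed copy of the sample entry from the question.
entry = """@article{tabata1999precise,
  journal={Macromolecular chemistry and physics},
  volume={200},
}"""

print(JOURNAL_RE.findall(entry))  # ['Macromolecular chemistry and physics']
```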

Next, I tried recording a macro and running it, but Notepad++ just hangs indefinitely when I run it with the option to run until the end of the file.

So my question is: does anyone know a regex replacement syntax I could use to change the case? Ideally, I would also like to use "|" alternations to exclude particular words such as " of ", " an ", " the ", etc. I tried to play with some of the examples provided here, but I was not able to integrate them into my lookaheads.
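If the file can also be processed outside Notepad++, an explicit exclusion list is easy to express in a small script. A minimal Python sketch, assuming every field is written exactly as `journal={...}` with no nested braces; `SMALL_WORDS`, `title_case` and `fix_journals` are hypothetical names, and the word list is only an example:

```python
import re

# Hypothetical exclusion list: kept lowercase unless the word starts the name.
SMALL_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to"}

# Captures "journal={", the name itself, and the closing "}".
JOURNAL_FIELD = re.compile(r"(journal=\{)([^}]*)(\})")

def title_case(name: str) -> str:
    out = []
    for i, word in enumerate(name.split(" ")):
        if i > 0 and word.lower() in SMALL_WORDS:
            out.append(word.lower())                   # excluded word stays lowercase
        else:
            out.append(word[:1].upper() + word[1:])    # uppercase first letter only
    return " ".join(out)

def fix_journals(bib: str) -> str:
    # Rewrite only the text between "journal={" and the next "}".
    return JOURNAL_FIELD.sub(
        lambda m: m.group(1) + title_case(m.group(2)) + m.group(3), bib
    )

print(fix_journals("  journal={Macromolecular chemistry and physics},"))
# prints:   journal={Macromolecular Chemistry and Physics},
```

Unlike a length-based rule, this keeps the first word capitalized even if it is on the list.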

Thank you in advance, I'd appreciate any help.


Solution

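  • This works for any number of words. Words of four or more characters get an uppercase first letter; shorter words (such as of, an, the, and) are left unchanged: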

    • Ctrl+H
    • Find what: (?:journal={|\G)\K(?:(\w{4,})|(\w+))(\h*)
    • Replace with: \u$1\E$2$3
    • CHECK Wrap around
    • CHECK Regular expression
    • Replace all

    Explanation:

    (?:             # non-capture group
        journal={     # literally
      |              # OR
        \G            # the position where the previous match ended
    )               # end group
    \K              # discard everything matched so far from the match result
    (?:             # non-capture group
        (\w{4,})      # group 1, a word of 4 or more characters
      |              # OR
        (\w+)         # group 2, any shorter word (1 to 3 characters)
    )               # end group
    (\h*)           # group 3, 0 or more horizontal spaces
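Python's `re` module supports neither `\G` nor `\K`, but the chaining they produce can be emulated by anchoring each attempt at the end of the previous match with `Pattern.match(string, pos)`. This sketch lists the groups each successive match would capture; `trace_matches` is a hypothetical name, and `[ \t]` stands in for `\h`, which Python's `re` does not accept:

```python
import re

START = re.compile(r"journal=\{")
# Same alternation as the answer; [ \t] replaces \h.
TOKEN = re.compile(r"(?:(\w{4,})|(\w+))([ \t]*)")

def trace_matches(text: str) -> list:
    matches = []
    for start in START.finditer(text):
        pos = start.end()               # like \K: the match starts after "journal={"
        while True:
            m = TOKEN.match(text, pos)  # anchored at pos, like \G
            if m is None:               # e.g. "}" ends the chain
                break
            matches.append(m.groups())
            pos = m.end()
    return matches

print(trace_matches("journal={Macromolecular chemistry and physics},"))
# [('Macromolecular', None, ' '), ('chemistry', None, ' '), (None, 'and', ' '), ('physics', None, '')]
```

The chain stops at `}` because it is neither a word character nor a space, so text after the field is never touched.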
    

    Replacement:

    \u          # uppercase the next character output (the first letter of group 1)
      $1        # content of group 1
    \E          # end case conversion, so \u cannot spill into group 2 when group 1 is empty
    $2          # content of group 2
    $3          # content of group 3
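For comparison, here is a sketch of the whole find/replace in Python, emulating `\G` as above and spelling out `\u$1\E$2$3` in a callback. It assumes fields are written as `journal={...}`; `title_case_journals` and `replace_token` are hypothetical names, and `[ \t]` again stands in for `\h`:

```python
import re

START = re.compile(r"journal=\{")
TOKEN = re.compile(r"(?:(\w{4,})|(\w+))([ \t]*)")

def replace_token(m: re.Match) -> str:
    long_word, short_word, spaces = m.group(1), m.group(2), m.group(3)
    if long_word is not None:
        # \u$1\E : uppercase only the first character of group 1
        return long_word[0].upper() + long_word[1:] + spaces
    # $2$3 : a short word is copied unchanged
    return short_word + spaces

def title_case_journals(text: str) -> str:
    pieces = []
    last = 0
    while (start := START.search(text, last)) is not None:
        pos = start.end()
        pieces.append(text[last:pos])  # copy up to and including "journal={"
        while (m := TOKEN.match(text, pos)) is not None:  # emulate \G chaining
            pieces.append(replace_token(m))
            pos = m.end()
        last = pos
    pieces.append(text[last:])         # rest of the text after the last field
    return "".join(pieces)

print(title_case_journals("journal={Macromolecular chemistry and physics},"))
# journal={Macromolecular Chemistry and Physics},
```

Because the rule is length-based, a short first word (for example "the" in a name starting with it) stays lowercase, just as with the Notepad++ replacement.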
    
