I am completely new to regex and just want to know whether this is possible. (I'm sorry if the explanation is confusing or too complicated.) Say I want to find and replace the headings (shown in bold) in this text:
"As discussed in chapter 1, the users of financial statements can be categorised as resource provider. (space)(space)Users and decision making(space)(space) An example for this. (space)(space)Nature and purpose of financial analysis(space)(space) We have identi fied that financial analysis mvolves expressing reported numbers in financial statements in relative terms. "
to this:
"As discussed in chapter 1, the users of financial statements can be categorised as resource provider.
(new line) Users and decision making (tab space) An example for this.
(new line) Nature and purpose of financial analysis (tab space) We have identi fied that financial analysis mvolves expressing reported numbers in financial statements in relative terms. "
As my knowledge of regex is currently limited, I tried to break it down into two parts:
1. to find ". (space)(space)Nature" :
[(.)]\s\s[(A-Z)]\w+
to \n$&
2. to find "analysis(space)(space) We" :
[(a-z)]\w+\s\s[(A-Z)]
to ??
So my question is: is it possible to define just one regex for
. (space)(space)Users and decision making(space)(space) An
. (space)(space)Nature and purpose of financial analysis(space)(space) We
and replace it with the example above?
Thank you!
PS: The reason for this odd editing is that I want to import the text into the Anki flashcard software as a .txt file without any further editing.
My current method becomes quite taxing when I have to edit the whole text of a thick textbook (which can mean more than 1000 edits per chapter, times 20 or so chapters, times 5 or more textbooks).
FYI, in Anki and several other flashcard programs, a tab is the field separator between the front (question) and the back (answer).
The double space [ ][ ] marks a specific heading so that find and replace can tell it apart from ordinary single spaces; I inserted these double spaces myself beforehand.
The new line (\n) starts a new, separate flashcard.
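To make that target layout concrete, here is a rough Python sketch of how such a file breaks down into cards: one card per line, with the first tab separating front from back. The helper name `parse_cards` and the sample text are made up for illustration; this is not Anki's actual importer, just the same convention spelled out.

```python
def parse_cards(text):
    """Split tab-separated text into (front, back) pairs, one per line."""
    cards = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # partition splits on the FIRST tab only
        front, _, back = line.partition("\t")
        cards.append((front.strip(), back.strip()))
    return cards


# Hypothetical sample in the layout described above
sample = (
    "Users and decision making\tAn example for this.\n"
    "Nature and purpose of financial analysis\tWe have identified this."
)

for front, back in parse_cards(sample):
    print(front, "->", back)
```

A line without a tab would end up with an empty back, which is a quick way to spot headings the find and replace missed.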
Anki (and several other flashcard programs) supports HTML, so I usually add multiple cards by copying the text from a PDF into Notepad++, using regex find and replace to turn each heading or first word of a sentence into the question (front) part of an Anki flashcard while the rest becomes the answer part, and then importing the result into Anki. If the whole finding step can be automated, I can save a helluva lot of time!
After googling and tinkering for a while, I think I have finally found the answer! :D
[ ]{2,}([A-Z])[\w ]{1,}[ ]{2,}
replace with
\n$&\t\t
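For anyone who would rather script this than run it in Notepad++, the same find and replace can be sketched in Python. This assumes the headings are delimited by exactly the double spaces described above; the function name `mark_headings` and the sample text are hypothetical. The one syntax difference is that Python's `re` module writes the whole match as `\g<0>` instead of `$&`.

```python
import re

# Same pattern as above: 2+ spaces, a capital letter, then word
# characters and spaces, then 2+ spaces again.
HEADING = re.compile(r"[ ]{2,}([A-Z])[\w ]{1,}[ ]{2,}")


def mark_headings(text):
    # \g<0> is the whole match, i.e. the heading with its surrounding
    # spaces; a newline goes before it and two tabs after it.
    return HEADING.sub(r"\n\g<0>\t\t", text)


# Hypothetical sample with two spaces on each side of every heading
SAMPLE = (
    "The users can be categorised as resource provider."
    "  Users and decision making  An example for this."
    "  Nature and purpose of financial analysis  We have identified this."
)

print(repr(mark_headings(SAMPLE)))
```

Note that the match keeps the surrounding spaces and that `[\w ]{1,}` is greedy, so this only behaves as intended when some punctuation (such as a full stop) sits between one heading's answer text and the next heading.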
Drawing inspiration from:
Regex for multiple words split by spaces
Python regex: Including whitespace inside character range
http://www.rexegg.com/regex-quickstart.html
and @Jan's answer