I work with Notepad++ and Excel. I have data that contains text in English and Chinese.
The data structure is as follows:
<p> chinese text</p>
<p> english text</p>
<p> chinese text</p>
<p> english text</p>
<p> chinese text</p>
<p> english text</p>
How can I delete all the English text, including symbols, between <p> and </p>?
I want to keep only the Chinese text between <p> and </p>.
The result should look like this:
<p> chinese text</p>
<p> chinese text</p>
<p> chinese text</p>
I tried to delete the English text by removing ASCII characters with a regex, but some English text was missed.
You should be able to do this using Notepad++:
<p>[a-zA-Z"].*$
to empty string (regex replace mode)

\n\n
to \n (extended replace mode)

<p>|</p>
to empty string (regex replace mode)
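
If the data can be exported to a text file, the same idea can be sketched outside Notepad++ in Python. This is only a rough sketch, not a tested replacement for the steps above: the helper name keep_chinese_paragraphs and the sample text are made up for illustration, and it assumes each <p>...</p> pair sits on its own line. It drops every line whose text starts with an ASCII letter or a double quote (allowing optional whitespace after <p>, which is an assumption about the data), removes blank lines, and leaves the tags in place as in the expected result.

```python
import re

# Assumption: a line is "English" if, after <p> and optional whitespace,
# it starts with an ASCII letter or a double quote.
ENGLISH_LINE = re.compile(r'^<p>\s*[A-Za-z"]')


def keep_chinese_paragraphs(text: str) -> str:
    """Hypothetical helper: drop English <p> lines and blank lines."""
    kept = []
    for line in text.splitlines():
        if ENGLISH_LINE.match(line):
            continue  # English paragraph: skip it
        if not line.strip():
            continue  # blank line: skip it
        kept.append(line)
    return "\n".join(kept)


# Made-up sample data in the same shape as the question.
sample = '<p> 你好世界</p>\n<p> Hello world</p>\n<p> 再见</p>\n<p> "Goodbye"</p>'
print(keep_chinese_paragraphs(sample))
```

A Chinese line that happens to begin with a Latin letter or a quote would also be removed by this check, so the pattern may need adjusting for real data.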