Tags: .net, regex, character-encoding, spoken-language

Regular Expression with foreign languages


I have a function that I call many times across various files, with a signature like:

Translate("English Message", "Spanish Message", "French Message")

I want to extract the English, Spanish and French messages and write them to a CSV file, so that people who actually know these languages can tell me what I SHOULD have put in there.

The problem I am running into is that some French and Spanish messages don't show up in the output, apparently because of the accented characters and single quotes they contain.

This is a VB.NET program.

Edit

There was no problem with the languages themselves. My issue was actually the regular expression, and my complete lack of understanding of regular expressions.


Solution

  • It depends on the regex library you are using. Sane regex implementations are Unicode-aware (they handle UTF-8 text, for example) and have no such problems with accented characters. More details would help, though: which regex library are you using, and what does your pattern look like?
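
Since the asker's edit says the pattern itself was at fault, here is a minimal sketch of one way to write it so that accents and apostrophes stop mattering. It is in Python for brevity, but the pattern uses only constructs that .NET's Regex class also supports. The names `TRANSLATE_CALL`, `extract_messages` and `to_csv` are made up for illustration, and the sketch assumes every argument is a plain double-quoted literal with no embedded double quotes.

```python
import csv
import io
import re

# Assumed pattern, not the asker's original: each argument is captured as
# "anything except a double quote", so accented letters and single quotes
# need no special handling.
TRANSLATE_CALL = re.compile(
    r'Translate\(\s*"([^"]*)"\s*,\s*"([^"]*)"\s*,\s*"([^"]*)"\s*\)'
)

def extract_messages(source):
    """Return an (english, spanish, french) tuple for each Translate(...) call."""
    return [match.groups() for match in TRANSLATE_CALL.finditer(source)]

def to_csv(rows):
    """Render the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["English", "Spanish", "French"])
    writer.writerows(rows)
    return buffer.getvalue()

# Made-up sample input resembling VB.NET source lines.
sample = '''
lbl.Text = Translate("Can't save", "No se puede guardar", "Impossible d'enregistrer")
MsgBox(Translate("Welcome", "¡Bienvenido, señor!", "Bienvenue à l'été"))
'''

rows = extract_messages(sample)
print(to_csv(rows))
```

A pattern that lists the allowed characters (something like `[A-Za-z ]*`) silently skips any message containing `é`, `ñ` or `'`, whereas excluding only the closing delimiter avoids that. This sketch does not handle VB.NET's doubled quotes (`""`) inside a literal, or arguments built by concatenation. When reading real files, the file encoding also has to be decoded correctly.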