I have a huge file containing 4.1 million records and need to find entries like Clock Accuracy – SM111.ppt, that is, file names that contain unreadable characters. Another example is 241395 - Ansprüche.doc
How can I match these using a regular expression? I am using an Oracle 12c database.
This looks a lot like a problem with the character encoding of your file. The file appears to be UTF-8-encoded but is being read with a single-byte encoding: ü stands for ü, which turns Ansprüche.doc into the sensible Ansprüche.doc. Likewise, – encodes the en dash (–), and so on.
So you need to open the file using UTF-8 as its encoding, then the correct characters should appear (unless the file is corrupted by using several encodings at once).
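To illustrate the diagnosis, here is a minimal Python sketch of how such garbling arises and how it can be undone. It assumes the UTF-8 bytes were misread as Windows-1252 (cp1252), which matches both examples in the question but is only a guess about your actual setup; the helper names `garble` and `repair` are made up for this illustration.

```python
def garble(text: str) -> str:
    """Simulate the problem: UTF-8 bytes misread as Windows-1252 (assumed)."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Reverse the misreading; return the text unchanged if it does not apply.

    Assumption: the damage came from exactly one wrong decode step
    (UTF-8 bytes read as cp1252). Mixed or repeated mis-encodings are
    not handled here.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside cp1252, or the resulting
        # bytes are not valid UTF-8, so this was probably not mojibake.
        return text


def is_garbled(text: str) -> bool:
    """Flag a record when the repair step actually changes it."""
    return repair(text) != text
```

Under that assumption, `repair` turns the two file names from the question back into their readable forms, while plain ASCII names and names that already contain a correct ü pass through unchanged, so `is_garbled` could serve as a filter over the 4.1 million records instead of a hand-written regular expression.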