Search code examples
perlinputfile-encodings

Perl and reading files with different encodings


I am using a perl script to read in a file, but I'm not sure what encoding the file is in. Basically, my file is a list of book titles, but each book has other info associated with it (author, publication date, etc). So each book title is within a discrete chunk of data for the book. So I iterate through the file line by line until I find the regular expression '/Book Title: (.*)/' and take what's in the paren. Then, I create a separate .txt file with the name of the text file being my book. However, in my unix server, when I look at the name of the file, it's actually not, for example, 'LordOfTheFlies.txt' but rather 'LordOfTheFlies^M.txt'

What is this '^M'? Is that a weird end of line encoding I'm not taking into account? I tried chomp but it doesn't seem to be working. What is the best file encoding for working with perl?


Solution

  • It's the additional carriage return character that Windows systems insert before line feed characters (M == 13th letter, hence ASCII 13 is visualised as ^M).

    It has nothing to do with file encoding, it's just the line ending policy biting you. Perl is usually good at handling line ending characters correctly, but if they occur somewhere else than the end of a line you have to do it yourself. You can use s/\r// instead of chomp() to get them out.