Tags: php, laravel, amazon-s3, ascii

Laravel: can't read ASCII file correctly (special chars)


I'm having trouble reading a .txt file (ASCII) from my S3 bucket. When I read it, the special characters are displayed as "?". For example, when the file contains the word "Äpfel", it is read as "?pfel". How can I fix that?

My code:

$contents = Storage::disk('s3')->get('test123.txt');

Solution

  • The root of the trouble is that the encoding of the text file differs from what the code displaying it expects. Converting from "ASCII" is rarely productive: ASCII defines only 128 characters (the basic English alphabet, digits, punctuation, and control codes), and any byte outside that range, such as the one encoding "Ä", is replaced or discarded, as you've seen.

    In order to properly convert the file you need to know the source encoding. It's important to note that while there's no shortage of functions that purport to "detect" encodings, the reality is that they are guessing. Aside from a very narrow range of cases it is simply impossible to definitively know what encoding a given string uses. The encoding is metadata that needs to be known alongside the data itself.
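
    To illustrate why detection is only guessing, here is a minimal sketch that checks which candidate encodings a byte string is valid in. The helper name plausibleEncodings and the sample byte strings are assumptions made up for this example; mb_check_encoding is the real mbstring function. A validity check can rule an encoding out, but it cannot confirm one, since the same bytes are often valid in several encodings.

```php
<?php
// Hypothetical helper: return the candidate encodings in which $bytes is valid.
function plausibleEncodings(string $bytes, array $candidates): array
{
    return array_values(array_filter(
        $candidates,
        fn (string $enc): bool => mb_check_encoding($bytes, $enc)
    ));
}

$latin1Bytes = "\xC4pfel";     // "Äpfel" stored as ISO-8859-1 (assumed sample)
$utf8Bytes   = "\xC3\x84pfel"; // "Äpfel" stored as UTF-8 (assumed sample)

// 0xC4 followed by "p" is not a valid UTF-8 sequence, so UTF-8 is ruled out.
echo implode(', ', plausibleEncodings($latin1Bytes, ['UTF-8', 'ISO-8859-1'])), PHP_EOL;
// prints: ISO-8859-1

// Valid UTF-8 is also valid ISO-8859-1 (every byte maps to some character),
// so the check alone cannot say which one the author meant.
echo implode(', ', plausibleEncodings($utf8Bytes, ['UTF-8', 'ISO-8859-1'])), PHP_EOL;
// prints: UTF-8, ISO-8859-1
```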

    Anyhow, an educated guess based on the human language of the string [German?] and the fact that the character was replaced with a single placeholder suggests that the source encoding is either ISO-8859-1 or cp1252 (Windows-1252). Both encode "Ä" as the same single byte, so this sample cannot tell them apart; they differ only in the byte range 0x80 to 0x9F, where cp1252 places printable characters such as €. Differentiating between the two is an additional level of educated guessing, and there's not enough information here to make that call. If you convert from 8859-1 and you're still missing symbols like €, then it's probably 1252.
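
    The 8859-1 versus 1252 guess can be sketched as a byte-range check. This is only a heuristic under the assumption that the file is one of those two encodings; the function name guessSingleByteEncoding and the sample strings are invented for illustration. Bytes 0x80 to 0x9F are invisible control codes in ISO-8859-1 but printable characters in Windows-1252, so their presence in ordinary text hints at the latter.

```php
<?php
// Hypothetical heuristic, assuming the input is either ISO-8859-1 or Windows-1252.
function guessSingleByteEncoding(string $bytes): string
{
    // No /u modifier: the pattern is matched against raw bytes.
    return preg_match('/[\x80-\x9F]/', $bytes) === 1
        ? 'Windows-1252'
        : 'ISO-8859-1';
}

$price = "5 \x80";   // "5 €" as a Windows-1252 editor would save it (assumed sample)
$word  = "\xC4pfel"; // "Äpfel"; identical bytes in both encodings

echo guessSingleByteEncoding($price), PHP_EOL; // prints: Windows-1252
echo guessSingleByteEncoding($word), PHP_EOL;  // prints: ISO-8859-1

// Converting with the guessed encoding recovers the euro sign.
echo mb_convert_encoding($price, 'UTF-8', guessSingleByteEncoding($price)), PHP_EOL;
// prints: 5 €
```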

    All that said, you'll likely want to run something like:

    $contents = mb_convert_encoding($contents, 'UTF-8', 'ISO-8859-1');
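
    One caveat with the line above: if the file ever turns out to be UTF-8 already, converting it again corrupts it. A minimal sketch of a guarded version follows. The function name ensureUtf8 and its default source encoding are assumptions for this example, and the hard-coded byte string stands in for the result of Storage::disk('s3')->get('test123.txt') so the snippet runs without Laravel.

```php
<?php
// Hypothetical wrapper: convert only when the bytes are not already valid UTF-8.
function ensureUtf8(string $bytes, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes; // already UTF-8 (or plain ASCII); leave untouched
    }

    return mb_convert_encoding($bytes, 'UTF-8', $assumedSource);
}

// Stand-in for: $contents = Storage::disk('s3')->get('test123.txt');
$contents = "\xC4pfel";

echo ensureUtf8($contents), PHP_EOL;             // prints: Äpfel
echo ensureUtf8(ensureUtf8($contents)), PHP_EOL; // prints: Äpfel (second call is a no-op)
```

    The guard is itself a guess: a short ISO-8859-1 file can happen to be valid UTF-8, so knowing the real source encoding is still the only reliable fix.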