lua, jpeg, luasocket

Strange bug involving string comparison in Lua


I am trying to create a program that scrapes images from the web in Lua. A minor problem is that images sometimes have no extension or incorrect extensions. See this animated "jpeg" for example: http://i.imgur.com/Imvmy6C.jpg

So I created a function to detect the filetype of an image. It's pretty simple: just compare the first few characters of the returned image. PNG files have "PNG" right after their first byte, GIFs begin with "GIF", and the second character of a JPEG shows up as the strange symbol "╪".

It's a bit hacky since images aren't supposed to be represented as strings, but it worked fine, except when I actually ran the code from a file.

When I enter the code into the command line it works fine. But when I run a file with the code in it, it doesn't work. Weirder, it only fails on jpegs. It still correctly recognizes PNGs and GIFs.

Here is the minimal code necessary to reproduce the bug:

http = require "socket.http"
function detectImageType(image)
    local imageType = "unknown"
    if string.sub(image, 2, 2) == "╪" then imageType = "jpg" end
    return imageType
end
image = http.request("http://i.imgur.com/T4xRtBh.jpg")
print(detectImageType(image))

Copying and pasting this into the command line returns "jpg" correctly. Running this as a file returns "unknown".

I am using Lua 5.1.4 from the Lua for Windows package, through PowerShell, on Windows 8.1.

EDIT:

Found the problem: string.byte("╪") returns 216 on the command line and 226 when run as a file. I have no idea why, maybe different encodings for Lua and PowerShell?

This line solves the problem:

if string.byte(string.sub(image, 2, 2)) == 216 then imageType = "jpg" end
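
The two numbers are consistent with an encoding mismatch. Here is a small sketch, assuming the script file was saved as UTF-8 and the console uses code page 437 (neither is confirmed above). The bytes are written as decimal escapes so the result does not depend on how the snippet itself is saved:

```lua
-- "╪" (U+256A) encoded as UTF-8 is three bytes: 226, 149, 170
-- (assumption: this is how the editor saved the script file).
local utf8_symbol = "\226\149\170"
-- In code page 437 the same glyph is the single byte 216
-- (assumption: this is the console's encoding).
local cp437_symbol = "\216"

print(string.byte(utf8_symbol, 1, -1))   -- every byte of the UTF-8 form
print(string.byte(cp437_symbol, 1, -1))  -- the single CP437 byte

-- string.byte without an index returns only the first byte of the string.
local first_from_file = string.byte(utf8_symbol)
local first_from_console = string.byte(cp437_symbol)
print(first_from_file, first_from_console)
```

Under those assumptions, string.sub(image, 2, 2) is always a single byte, so it can never equal a three-byte literal, and the comparison in the file version fails for every JPEG.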

Solution

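  • I think it's because the file is saved in a different encoding from the one the console uses, so the ╪ character in the source ends up as different bytes. The numbers fit that: 226 is the first byte of ╪ in UTF-8, while 216 is its code in the console code page 437. It's more robust to compare against the byte value directly: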

    http = require "socket.http"
    function detectImageType(image)
        local imageType = "unknown"
        -- 216 (0xD8) is the second byte of a JPEG file
        if string.byte(image, 2) == 216 then imageType = "jpg" end
        return imageType
    end
    image = http.request("http://i.imgur.com/T4xRtBh.jpg")
    print(detectImageType(image))
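
The same idea can be extended to the PNG and GIF checks mentioned in the question. This is only a sketch: the `signatures` table and its field names are made up for illustration, and the leading bytes are written as decimal escapes so that the encoding of the source file no longer matters:

```lua
-- Leading bytes of each format, written as decimal escapes so they do not
-- depend on the encoding of this source file.
local signatures = {
    { name = "jpg", magic = "\255\216\255" },  -- FF D8 FF
    { name = "png", magic = "\137PNG" },       -- 89 50 4E 47
    { name = "gif", magic = "GIF8" },          -- "GIF87a" or "GIF89a"
}

-- Returns "jpg", "png", "gif", or "unknown" for a string of raw image bytes.
local function detectImageType(image)
    -- Anything that is not a string (e.g. nil) cannot be inspected.
    if type(image) ~= "string" then return "unknown" end
    for _, sig in ipairs(signatures) do
        if string.sub(image, 1, #sig.magic) == sig.magic then
            return sig.name
        end
    end
    return "unknown"
end

print(detectImageType("\255\216\255\224rest"))  -- jpg
print(detectImageType("\137PNG\r\n\26\n"))      -- png
print(detectImageType("GIF89a"))                -- gif
print(detectImageType("hello"))                 -- unknown
```

To use it with the code above, pass in the body returned by http.request. If the request fails, http.request returns nil as its first value, which this version reports as "unknown" instead of raising an error.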