MIME type checking of uploaded files in Go


I am trying to get the MIME type of files being uploaded to my server.

The MIME type of .xlsx and .docx files comes up as application/zip. I tried unzipping the file and reading the "_rels/.rels" entry. My doubt is: when reading this entry, what maximum size should I allow for the read? And if the Target is "xl/workbook.xml", can I assume the file is an xlsx?

My code is as below:

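// Fragment from inside an HTTP handler: w is the http.ResponseWriter, r the *http.Request.
file, fileHeader, err := r.FormFile("file")
if err != nil {
    http.Error(w, err.Error(), http.StatusBadRequest)
    return
}
defer file.Close()

// http.DetectContentType looks at no more than the first 512 bytes.
buffer := make([]byte, 512)
n, err := file.Read(buffer)
if err != nil && err != io.EOF {
    http.Error(w, err.Error(), http.StatusInternalServerError)
    return
}

contentType := http.DetectContentType(buffer[:n])
if contentType == "application/zip" {
    // zip.NewReader reads through io.ReaderAt, so the Read above does not affect it.
    zr, err := zip.NewReader(file, fileHeader.Size)
    if err != nil {
        http.Error(w, err.Error(), http.StatusBadRequest)
        return
    }
    for _, zf := range zr.File {
        if zf.Name != "_rels/.rels" {
            continue
        }
        fmt.Println("rels")
        rc, err := zf.Open()
        if err != nil {
            fmt.Println("Rels errors:", err)
            break
        }
        const BufferSize = 1000
        buffer := make([]byte, BufferSize)
        // io.ReadFull keeps reading until the buffer is full or the entry ends;
        // a single Read may return fewer bytes than are available.
        bytesread, err := io.ReadFull(rc, buffer)
        rc.Close() // close now; a defer inside the loop would wait until the handler returns
        if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
            fmt.Println(err)
        }

        fmt.Println("bytes read: ", bytesread)
        fmt.Println("bytestream to string: ", string(buffer[:bytesread]))
        break
    }
}

var arr []byte
w.Header().Set("Content-Type", "application/json")
w.Write(arr)

}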

The output I get is:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships     xmlns="http://schemas.openxmlformats.org/package/2006/relationships"><Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/><Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/><Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook.xml"/></Relationships>

Any tips on how to read a .doc or .xls?


Solution

  • Unfortunately, DetectContentType from the http package can detect only a limited set of MIME types.

    As for detecting binary formats, you don't need to read the whole file if all you need is to tell whether it is a .doc. You can just check the file signature (the magic bytes at the start of the file). Published file signature tables are a good resource for this.
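
As a minimal sketch of such a signature check, the snippet below compares the first bytes of a file against two well-known signatures: the OLE2 compound file header used by legacy .doc and .xls files, and the ZIP local file header that .docx and .xlsx start with. The names sniffContainer, ole2Magic and zipMagic are made up for this illustration. Note that .doc and .xls share the same container signature, so this only identifies the container, not which Office application wrote it.

```go
package main

import (
	"bytes"
	"fmt"
)

// ole2Magic is the 8-byte signature of the OLE2 compound file container
// used by legacy .doc and .xls files.
var ole2Magic = []byte{0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1}

// zipMagic is the local file header signature that starts a typical ZIP
// archive, which includes .docx and .xlsx files.
var zipMagic = []byte{'P', 'K', 0x03, 0x04}

// sniffContainer is a hypothetical helper: it reports which container
// format the leading bytes of a file look like.
func sniffContainer(header []byte) string {
	switch {
	case bytes.HasPrefix(header, ole2Magic):
		return "ole2"
	case bytes.HasPrefix(header, zipMagic):
		return "zip"
	default:
		return "unknown"
	}
}

func main() {
	// In a real handler these bytes would be the start of the uploaded file.
	header := []byte{0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1, 0x00, 0x00}
	fmt.Println(sniffContainer(header))
}
```

Only the first 8 bytes are needed here, so the 512-byte buffer already read for DetectContentType could be reused.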

    If you would rather use an existing package, here is a summary of what's on GitHub.

    Disclaimer: I'm the author of mimetype.

    • filetype

      • pure go, no c bindings
      • can be extended to detect new mime types
      • has issues with files that match more than one mime type (e.g. xlsx and docx also matching zip), because it stores its matching functions in a map and map iteration order is not guaranteed
    • magicmime

      • needs libmagic-dev installed
      • can be extended, though with more effort (see man magic)
    • mimetype

      • pure go, no c bindings
      • higher number of detected mime types than filetype
      • is thread safe
      • can be extended
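
For the specific xlsx/docx case in the question, a standard-library-only approach is also possible. The sketch below parses "_rels/.rels" with encoding/xml, finds the relationship whose Type ends in "/officeDocument", and guesses the kind from its Target. The names classifyOOXML, buildZip and maxRelsSize are hypothetical, and the 1 MiB read cap is an assumed value rather than a limit taken from any specification. The folder names "xl/", "word/" and "ppt/" are conventions that Office follows, and macro-enabled or template variants use the same layout, so the result should be treated as a heuristic.

```go
package main

import (
	"archive/zip"
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// relationships mirrors the parts of _rels/.rels that are needed here.
type relationships struct {
	Items []struct {
		Type   string `xml:"Type,attr"`
		Target string `xml:"Target,attr"`
	} `xml:"Relationship"`
}

// maxRelsSize caps how much of _rels/.rels is read (assumed value).
const maxRelsSize = 1 << 20

// classifyOOXML is a hypothetical helper that guesses the document kind
// from the target of the main officeDocument relationship.
func classifyOOXML(zr *zip.Reader) (string, error) {
	for _, zf := range zr.File {
		if zf.Name != "_rels/.rels" {
			continue
		}
		rc, err := zf.Open()
		if err != nil {
			return "", err
		}
		// LimitReader bounds memory use even if the entry is very large.
		data, err := io.ReadAll(io.LimitReader(rc, maxRelsSize))
		rc.Close()
		if err != nil {
			return "", err
		}
		var rels relationships
		if err := xml.Unmarshal(data, &rels); err != nil {
			return "", err
		}
		for _, rel := range rels.Items {
			if !strings.HasSuffix(rel.Type, "/officeDocument") {
				continue
			}
			target := strings.TrimPrefix(rel.Target, "/")
			switch {
			case strings.HasPrefix(target, "xl/"):
				return "xlsx", nil
			case strings.HasPrefix(target, "word/"):
				return "docx", nil
			case strings.HasPrefix(target, "ppt/"):
				return "pptx", nil
			}
		}
	}
	return "zip", nil // no recognizable main document: treat as a plain archive
}

// buildZip creates an in-memory archive holding only _rels/.rels,
// so that the example can run without a real upload.
func buildZip(rels string) (*zip.Reader, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	f, err := zw.Create("_rels/.rels")
	if err != nil {
		return nil, err
	}
	if _, err := f.Write([]byte(rels)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

func main() {
	rels := `<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">` +
		`<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook.xml"/>` +
		`</Relationships>`
	zr, err := buildZip(rels)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(classifyOOXML(zr))
}
```

In the upload handler from the question, the *zip.Reader returned by zip.NewReader(file, fileHeader.Size) could be passed to classifyOOXML directly.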