Editing ZIP archive in place in Golang


I'm writing an application that lets a user upload anonymized data to an S3 bucket so that they can try out our product without providing us with authentication data.

This is the struct that handles ZIP archives, which has already proven to be correct:

type ZipWriter struct {
    buffer *bytes.Buffer
    writer *zip.Writer
}

func FromFile(file io.Reader) (*ZipWriter, error) {

    // First, read all the data from the file; if this fails then return an error
    data, err := ioutil.ReadAll(file)
    if err != nil {
        return nil, fmt.Errorf("Failed to read data from the ZIP archive, error: %v", err)
    }

    // Next, put all the data into a buffer and then create a ZIP writer
    // from the buffer and return that writer
    buffer := bytes.NewBuffer(data)
    return &ZipWriter{
        buffer: buffer,
        writer: zip.NewWriter(buffer),
    }, nil
}

// WriteToStream writes the contents of the ZIP archive to the provided stream
func (writer *ZipWriter) WriteToStream(file io.Writer) error {

    // First, attempt to close the ZIP archive writer so that we can avoid
    // double writes to the underlying buffer; if an error occurs then return it
    if err := writer.writer.Close(); err != nil {
        return fmt.Errorf("Failed to close ZIP archive, error: %v", err)
    }

    // Next, write the underlying buffer to the provided stream; if this fails
    // then return an error
    if _, err := writer.buffer.WriteTo(file); err != nil {
        return fmt.Errorf("Failed to write the ZIP data to the stream, error: %v", err)
    }

    return nil
}

Using the ZipWriter, I load a ZIP file using the FromFile function and then write it to a byte array, using the WriteToStream function. After that, I call the following function to upload the ZIP archive data to a presigned URL in S3:

// DoRequest does an HTTP request against an endpoint with a given URL, method and access token
func DoRequest(client *http.Client, method string, url string, code string, reader io.Reader) ([]byte, error) {

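    // First, create the request with the method, URL and body; if this fails
    // (for example, on a malformed URL) then return an error
    request, err := http.NewRequest(method, url, reader)
    if err != nil {
        return nil, fmt.Errorf("Failed to create the %s request against %s, error: %v", method, url, err)
    }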
    if !util.IsEmpty(code) {
        request.Header.Set(headers.Accept, echo.MIMEApplicationJSON)
        request.Header.Set(headers.Authorization, fmt.Sprintf("Bearer %s", code))
    } else {
        request.Header.Set(headers.ContentType, "application/zip")
    }

    // Next, do the request; if this fails then return an error
    resp, err := client.Do(request)
    if err != nil {
        return nil, fmt.Errorf("Failed to run the %s request against %s, error: %v", method, url, err)
    }

    // Close the response body on every return path, including non-200 responses
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("Failed to run the %s request against %s, status: %s", method, url, resp.Status)
    }

    // Now, read the body from the response; if this fails then return an error
    body, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        return nil, fmt.Errorf("Failed to read the body associated with the response, error: %v", err)
    }

    // Finally, return the body from the response
    return body, nil
}

So, the whole operation works roughly like this:

file, err := os.Open(location)
if err != nil {
    log.Fatalf("Unable to open ZIP archive located in %s, error: %v", location, err)
}

writer, err := lutils.FromFile(file)
if err != nil {
    log.Fatalf("File located in %s could not be read as a ZIP archive, error: %v", location, err)
}

buffer := new(bytes.Buffer)
if err := writer.WriteToStream(buffer); err != nil {
    log.Fatalf("Failed to write data to the ZIP archive, error: %v", err)
}

if body, err := DoRequest(new(http.Client), http.MethodPut, url, "", buffer); err != nil {
    log.Fatalf("Failed to upload the data to S3, response: %s, error: %v", string(body), err)
}

The problem I'm having is that, although the upload to S3 succeeds, no files are found when the ZIP archive is downloaded and extracted. While investigating this issue, I've come up with a number of possible failure points:

  1. FromFile does not create the ZIP archive from the file correctly, resulting in a corrupt archive.
  2. WriteToStream corrupts the data when it writes the archive. This seems less likely as I've already tested this functionality with a bytes.Buffer as the reader. Unless an os.File produces a corrupt ZIP archive when the bytes.Buffer does not, I think this function probably works as expected.
  3. The data is corrupted when DoRequest writes it to S3. This seems unlikely as I've used this code for other data without issue. So, unless ZIP archives have a structure that needs to be treated differently from other file types, I don't see a problem here either.

After examining these possibilities in more depth, I think the issue might be in how I'm creating the ZIP writer from the archive file, but I'm not sure what the problem is.
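One way to narrow down which of these stages loses the entries would be to list the archive's contents after each step. The following is only a minimal sketch and not part of my application: listEntries and buildSample are hypothetical helpers, and the sample archive is built in memory purely so the example runs on its own.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"log"
)

// listEntries is a hypothetical diagnostic helper: it parses data as a ZIP
// archive and returns the names of the entries a reader would see.
func listEntries(data []byte) ([]string, error) {
	reader, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, fmt.Errorf("data is not a readable ZIP archive: %w", err)
	}
	names := make([]string, 0, len(reader.File))
	for _, file := range reader.File {
		names = append(names, file.Name)
	}
	return names, nil
}

// buildSample creates a small in-memory archive to exercise listEntries.
func buildSample() ([]byte, error) {
	buffer := new(bytes.Buffer)
	writer := zip.NewWriter(buffer)
	entry, err := writer.Create("hello.txt")
	if err != nil {
		return nil, err
	}
	if _, err := entry.Write([]byte("hello")); err != nil {
		return nil, err
	}
	// Close writes the central directory; without it the archive is incomplete
	if err := writer.Close(); err != nil {
		return nil, err
	}
	return buffer.Bytes(), nil
}

func main() {
	data, err := buildSample()
	if err != nil {
		log.Fatal(err)
	}
	names, err := listEntries(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(names)
}
```

Calling a helper like this on the bytes read from disk, on the bytes produced by WriteToStream, and on the object downloaded from S3 should show the first stage at which the entry list comes back empty.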


Solution

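  • The issue here was a bit of a red herring. As @CeriseLimón pointed out, calling NewWriter and Close on an existing ZIP archive will necessarily result in an empty archive being appended to the end of the file. zip.NewWriter only writes new data and never parses what the buffer already contains, so Close emits an end-of-central-directory record that lists zero entries. ZIP readers locate the directory by scanning from the end of the file, so they find that empty record and report no files. In my use case, the solution was to open the file and write it directly to the stream, rather than attempting to read it as a ZIP archive.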

    file, err := os.Open(location)
    if err != nil {
        log.Fatalf("Unable to open ZIP archive located in %s, error: %v", location, err)
    }
    defer file.Close()

    if body, err := DoRequest(new(http.Client), http.MethodPut, url, "", file); err != nil {
        log.Fatalf("Failed to upload the data to S3, response: %s, error: %v", string(body), err)
    }