Search code examples
dockergoiowaitgroup

loading docker image fails


I am using golang, docker client to load a docker image in .tar format.

func loadImageFromTar(cli *client.Client, tarFilePath string) (string, error) {
    // Read tar file
    tarFile, err := os.Open(tarFilePath)
    if err != nil {
        return "", fmt.Errorf("failed to open tar file: %w", err)
    }
    defer tarFile.Close()

    // Create a pipe to stream data between tar reader and Docker client
    pr, pw := io.Pipe()

    // Set up a WaitGroup for synchronization
    var wg sync.WaitGroup
    wg.Add(2)

    // Load the Docker image in a separate goroutine
    var imageLoadResponse types.ImageLoadResponse
    go func() {
        defer wg.Done()
        imageLoadResponse, err = cli.ImageLoad(context.Background(), pr, false)
        if err != nil {
            err = fmt.Errorf("failed to load Docker image: %w", err)
        }
    }()

    // Read tar file metadata and copy the tar file to the pipe writer in a separate goroutine
    var repoTag string
    go func() {
        defer wg.Done()
        defer pw.Close()

        tarReader := tar.NewReader(tarFile)

        for {
            header, err := tarReader.Next()
            if err == io.EOF {
                break
            }
            if err != nil {
                err = fmt.Errorf("failed to read tar header: %w", err)
                fmt.Printf("Error: %v", err)
                return
            }

            // Extract the repository and tag from the manifest file
            if header.Name == "manifest.json" {
                data, err := io.ReadAll(tarReader)
                if err != nil {
                    err = fmt.Errorf("failed to read manifest file: %w", err)
                    fmt.Printf("Error: %v", err)
                    return
                }

                var manifest []map[string]interface{}
                err = json.Unmarshal(data, &manifest)
                if err != nil {
                    err = fmt.Errorf("failed to unmarshal manifest: %w", err)
                    fmt.Printf("Error: %v", err)
                    return
                }

                repoTag = manifest[0]["RepoTags"].([]interface{})[0].(string)
            }

            // Copy the tar file data to the pipe writer
            _, err = io.Copy(pw, tarReader)
            if err != nil {
                err = fmt.Errorf("failed to copy tar data: %w", err)
                fmt.Printf("Error: %v", err)
                return
            }
        }
    }()

    // Wait for both goroutines to finish
    wg.Wait()

    // Check if any error occurred in the goroutines
    if err != nil {
        return "", err
    }

    // Close the image load response body
    defer imageLoadResponse.Body.Close()

    // Get the image ID
    imageID, err := getImageIDByRepoTag(cli, repoTag)
    if err != nil {
        return "", fmt.Errorf("failed to get image ID: %w", err)
    }

    return imageID, nil
}

// func: getImageIDByRepoTag

func getImageIDByRepoTag(cli *client.Client, repoTag string) (string, error) {
    images, err := cli.ImageList(context.Background(), types.ImageListOptions{})
    if err != nil {
        return "", fmt.Errorf("failed to list images: %w", err)
    }

    for _, image := range images {
        for _, tag := range image.RepoTags {
            if tag == repoTag {
                return image.ID, nil
            }
        }
    }

    return "", fmt.Errorf("image ID not found for repo tag: %s", repoTag)
}

The getImageIDByRepoTag always returns fmt.Errorf("image ID not found for repo tag: %s", repoTag). Also when I run docker images I do not see the image being loaded. Looks like the image load is not completing .

In my other code, The docker images load usually takes time although the docker client cli.ImageLoad returns immediately. I usually add some 30 seconds wait time before checking for getImageIDByRepoTag. Adding a wait time also did not help in this case.

Thanks


Solution

  • There's a couple of issues:

    • the two goroutines share err so some error handling may get lost
      • you should use unique error variables here for each goroutine & inspect both errors after the wg.Wait()
    • the main issue: you are reading from the tar reader to find the manifest file and extract the tag info - which is fine - but upon finding this, you copy the rest of the byte stream to your pipe. You are thus losing a chunk of the byte stream which never reaches the docker client

    To avoid reading the tar byte stream twice, you could use io.TeeReader. This allows you to read the tar archive - to scan for the manifest file - but also have this stream written in full elsewhere (i.e. to the docker client).

    Create the TeeReader:

    tr := io.TeeReader(tarFile, pw)  // reading `tr` will read the tarFile - but simultaneously write to `pw`
    

    the image load will now read from this (instead of the pipe):

    //imageLoadResponse, err = cli.ImageLoad(context.Background(), pr, false)
    imageLoadResponse, err = cli.ImageLoad(context.Background(), tr, false)
    

    then change your archive/tar reader to read from the pipe:

    //tarReader := tar.NewReader(tarFile) // direct from file
    tarReader := tar.NewReader(pr) // read from pipe (indirectly from the file)
    

    you can then drop the your io.Copy block:

    // no longer needed:
    //
    // _, err = io.Copy(pw, tarReader)
    //
    

    since the tar-inspection code will read the entire stream to EOF.

    P.S. you may want to reset the io.EOF to nil to avoid thinking a EOF is a more critical error, when you inspect any potential errors from either of the goroutines:

    header, err = tarReader.Next()
    if err == io.EOF {
        err = nil  //  EOF is a non-fatal error here
        break
    }