Golang Virtual File


I have a closed-source application that takes a file as input, calculates its hash, and does some other things I have no control over. Modifying the source or reverse engineering it is not feasible.

The program is designed to work with regular files, however I need to supply a very large file from HDFS. Copying the file will take too much time and space on disk. So I was thinking of using FUSE but I did not find a good solution. I tried using a named pipe as follows:

func readFile(namenode, path string, pipe *os.File) {
    client, err := hdfs.New(namenode)
    if err != nil {
        log.Fatal(err)
    }

    hdfsFile, err := client.Open(path)
    if err != nil {
        log.Fatal(err)
    }
    defer hdfsFile.Close()

    // written, err := io.Copy(pipe, hdfsFile)
    buf := make([]byte, 4096)
    for {
        read, err := hdfsFile.Read(buf)
        log.Println(read, err)
        if read > 0 {
            // Write only the bytes actually read, not the whole buffer.
            written, werr := pipe.Write(buf[:read])
            log.Println(written, werr)
            if werr != nil {
                break
            }
        }
        if err != nil {
            // io.EOF or any other read error ends the copy.
            break
        }
    }
    err = pipe.Close()
    log.Println(err)
}

I know the above code is not complete. The test file is 10 MB; however, after eight reads of 4096 bytes the named pipe's buffer fills up, and the other program takes it all.

After a while, though, the program reading the pipe closes it and I get a broken pipe error. Is there any way to create a virtual file other than FUSE or a pipe?


Solution

  • I think you actually had the right idea with FUSE. Without the source for your application, it's hard to say what file semantics it's trying to use (though some time with strace might help illuminate what's going on).

    In any case, I'd have a look at the Go-FUSE project, specifically the hello.go example, which shows exactly how to handle the single file case very well.