go io bigdata

What is the fastest way to remove a specific line from a big file?


What is the best way to remove a line (which contains a specific substring) from a file?

I have tried loading the whole file into a slice, modifying that slice, and then writing the slice back to a file. That worked well, but it does not work for big files (e.g. 50GB+) because I don't have that much memory.

I think this should be possible with streams, but I couldn't figure out how to read and write at the same time (because I have to find the line via a substring and then remove it). Is this even possible, or do I have to read the whole file and save the index? If so, what is the best way of doing so?


Solution

  • This reads from standard input and writes to standard output, so only one line is held in memory at a time. Note that I adapted it from the code in the second answer to "Reading a file line by line in Go" (not tested).

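    // Requires imports: bufio, fmt, log, os, strings (body of func main).
    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        line := scanner.Text()
        // Drop every line that contains the substring; print the rest.
        if !strings.Contains(line, "unwanted") {
            fmt.Println(line)
        }
    }
    if err := scanner.Err(); err != nil {
        log.Fatal(err)
    }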