
Prepend header record (or a string / a file) to large file in Scala / Java


What is the most efficient (or recommended) way to prepend a string or a file to another large file in Scala, preferably without using external libraries? The large file can be binary.

For example, if the string to prepend is: header_information|123.45|xyz\n

and large file is:

abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
...

I would expect to get:

header_information|123.45|xyz
abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
abcdefghijklmnopqrstuvwxyz0123456789
...

Solution

  • I came up with the following solution:

    1. Turn the string or file to prepend into an InputStream
    2. Turn the large file into an InputStream
    3. Chain the two InputStreams together using java.io.SequenceInputStream
    4. Use java.nio.file.Files.copy to write the combined stream to the target file

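      import java.io.{ByteArrayInputStream, FileInputStream, SequenceInputStream}
      import java.nio.charset.StandardCharsets
      import java.nio.file.{Files, Paths, StandardCopyOption}

      object FileAppender {
        def main(args: Array[String]): Unit = {
          // Encode the header explicitly so the bytes do not depend on the platform default charset
          val stringToPrepend = new ByteArrayInputStream(
            "header_information|123.45|xyz\n".getBytes(StandardCharsets.UTF_8))
          val largeFile = new FileInputStream("big_file.dat")
          // SequenceInputStream yields the header bytes first, then the large file
          val combined = new SequenceInputStream(stringToPrepend, largeFile)
          try Files.copy(combined, Paths.get("output_file.dat"), StandardCopyOption.REPLACE_EXISTING)
          finally combined.close() // closes any underlying stream that is still open
        }
      }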
      

    Tested on a ~30 GB file; it took ~40 seconds on a MacBook Pro (3.3 GHz/16 GB).
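    When the header lives in a file rather than a string, the same four steps should apply with a second file stream in place of the ByteArrayInputStream. The following is only a sketch: the object name FilePrepender and the method prepend are hypothetical, and it assumes target is a different path from both inputs.

    ```scala
    import java.io.SequenceInputStream
    import java.nio.file.{Files, Path, StandardCopyOption}

    object FilePrepender {
      // Hypothetical helper: writes the bytes of `header` followed by the bytes
      // of `body` into `target` and returns the number of bytes written.
      // Assumption: `target` is neither `header` nor `body`.
      def prepend(header: Path, body: Path, target: Path): Long = {
        val combined = new SequenceInputStream(
          Files.newInputStream(header),
          Files.newInputStream(body))
        try Files.copy(combined, target, StandardCopyOption.REPLACE_EXISTING)
        finally combined.close() // closes any input stream that is still open
      }
    }
    ```

    Because the streams are copied byte for byte, this works for binary files as well as text.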

    The same approach can also be used, if necessary, to combine multiple partitioned files, such as those created by Spark.
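    For combining partitioned files, SequenceInputStream also has a constructor that takes a java.util.Enumeration of streams. A minimal sketch follows; the names PartFileCombiner and combine are hypothetical, and the caller is assumed to pass the part files already in the desired order.

    ```scala
    import java.io.{InputStream, SequenceInputStream}
    import java.nio.file.{Files, Path, StandardCopyOption}
    import java.util.{Enumeration => JEnumeration}

    object PartFileCombiner {
      // Hypothetical helper: concatenates `parts`, in the order given, into
      // `target` and returns the number of bytes written.
      def combine(parts: Seq[Path], target: Path): Long = {
        val remaining = parts.iterator
        // Each part is opened only when SequenceInputStream asks for the next stream
        val streams = new JEnumeration[InputStream] {
          def hasMoreElements(): Boolean = remaining.hasNext
          def nextElement(): InputStream = Files.newInputStream(remaining.next())
        }
        val combined = new SequenceInputStream(streams)
        try Files.copy(combined, target, StandardCopyOption.REPLACE_EXISTING)
        finally combined.close()
      }
    }
    ```

    A header stream could be placed first in the sequence to prepend a header record to the combined output in the same pass.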