c# c++ .net pdf ghostscript

Do Ghostscript conversion in memory using .NET or any other language


Can I use the Ghostscript API to convert a PDF to some other format without reading the input from disk or writing the results to disk? The disk I/O adds a lot of overhead!

I need something like this:

public static byte[][] ConvertPDF(byte[] pdfData)
{
    // Should return one byte array of output data per page.
    throw new System.NotImplementedException();
}

Solution

  • Since there still isn't a correct answer here all these years later, I'll provide one.

    Ghostscript performs its operations on disk. It doesn't take input and output paths merely to load the file into memory, process it, and write it back. It actually reads and writes parts of the file on disk as it goes (using multiple threads). While this IS slower, it also uses much less memory (bearing in mind that these files could potentially be quite large).

    Because the operations are performed on disk, there was not (at the time of this question) any way to pass in or retrieve a byte array or memory stream. Offering one would have been "dishonest": it might imply a "shortcut" that avoids disk IO when in fact it would not. Later, support was added to accept and return memory streams, but it's important to note that this support merely accepts the memory stream, writes it to a temporary file, performs the operations, and then reads the result back into a new memory stream.

    If that still meets your needs (for example, if you want the inevitable IO to be handled by the library rather than by your business logic), here are a few links demonstrating how to go about it (the mechanics vary with your exact needs).

    Image to PDF (memory stream to memory stream via rasterizer)

    Image to PDF (file to memory stream via processor)

    PDF to image (memory stream to memory stream via rasterizer)

    Hopefully these will, collectively, provide enough information to solve this issue for others who, like me and the OP, mostly found people saying it was impossible and that we shouldn't even be trying.
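
For readers who just want the shape of it, the temp-file round trip described above (bytes in, temporary file, Ghostscript run, bytes out) can be sketched in a few lines. This is only a rough sketch and not how any particular wrapper library does it. It assumes the Ghostscript command-line executable is installed, and the names `convert_pdf`, `gs_command`, `device` and `dpi` are made up for illustration. It is written in Python since the question allows any language. The same steps map directly onto C# with a temp directory and a child process.

```python
import subprocess
import tempfile
from pathlib import Path


def convert_pdf(pdf_data, gs_command=("gs",), device="png16m", dpi=150):
    """Return a list with one bytes object per rendered page.

    Hypothetical helper: the data still goes through disk, but the
    temporary files are hidden from the caller.
    """
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)

        # 1. Spool the in-memory PDF to a temporary input file.
        input_path = tmp_dir / "input.pdf"
        input_path.write_bytes(pdf_data)

        # 2. Ghostscript expands %03d to the page number,
        #    so it writes one output file per page.
        output_pattern = tmp_dir / "page-%03d.out"

        subprocess.run(
            [
                *gs_command,
                "-dBATCH", "-dNOPAUSE", "-dSAFER", "-dQUIET",
                f"-sDEVICE={device}",
                f"-r{dpi}",
                f"-sOutputFile={output_pattern}",
                str(input_path),
            ],
            check=True,  # raise CalledProcessError if the conversion fails
        )

        # 3. Read every page back into memory. The zero-padded
        #    names sort in page order.
        return [p.read_bytes() for p in sorted(tmp_dir.glob("page-*.out"))]
```

Calling `convert_pdf(open("in.pdf", "rb").read())` would then give the list of page images the question asks for. On Windows the console executable is usually named `gswin64c` rather than `gs`, so pass `gs_command=("gswin64c",)` there. The temporary directory is deleted when the function returns, so the caller never sees the files, but the disk IO still happens.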