
Microsoft R .xdf file


I have some questions about the .xdf file:

  1. What exactly is it?
  2. How does this type of file work?
  3. How does Microsoft R work with this type of file?
  4. What are the advantages over data.frames?

I'm really looking forward to your answers.

Greetings R123456789


Solution

    1. An XDF file is a compressed binary file format with user-selectable levels of compression. See https://support.microsoft.com/en-us/help/3104260/qa-what-is-the-.xdf-file-format for some quick facts. XDF files come in two forms, Standalone and Composite. A Standalone XDF is a single file stored on disk with the .xdf extension. A Composite XDF is represented by a directory that contains metadata and data subdirectories. In the Composite form, the metadata and data files in their directories are split and individually compressed as XDF part files.

    2. It is a proprietary implementation inside Microsoft R Server. I can expand on this answer, but I would need you to refine the question "How does this type of file work?" first.

    3. An XDF file is stored on disk and does not sit in memory. In Microsoft R Server, RxXdfData() creates a data source object that points at the XDF file without loading it. A function such as rxImport() or rxDataStep(), called without an output file, reads the XDF file, decompresses it, and returns the contents in memory as a data frame. Many Microsoft R "rx" functions can take a path to an XDF file directly as a data source or sink, and will manage reading segments into memory as required.

    4. The advantage of using XDF as a data source or sink is that Microsoft R Server does not need to buffer the entire file into memory to work with it. XDF allows partial reads and writes, and it saves disk space through compression. It is also faster than reading from or writing to flat files, because metadata is used to index the XDF. The main disadvantage is performance: data held in memory (data.frames) will generally be faster to operate on than data on disk.

    Note: As with all files, the underlying operating system controls when a file is written from memory to disk. For the purpose of your question, you can assume that the XDF file resides on disk as a standard file.
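
To make points 3 and 4 concrete, here is a minimal sketch of a typical round trip. It assumes the RevoScaleR package is available (it ships with Microsoft R Server and Microsoft R Client, not CRAN). The file names, the chunk size, and the compression level are made up for illustration and are not taken from the answer above.

```r
library(RevoScaleR)  # assumption: running inside Microsoft R Server / Client

# Hypothetical file names in a temp directory, for illustration only.
csvFile <- file.path(tempdir(), "mtcars.csv")
xdfFile <- file.path(tempdir(), "mtcars.xdf")
write.csv(mtcars, csvFile, row.names = FALSE)

# Import the flat file into a compressed XDF file on disk.
# Because outFile is given, rxImport() returns a data source object that
# points at the file instead of returning the data itself.
xdf <- rxImport(inData = csvFile, outFile = xdfFile,
                rowsPerRead = 10,         # illustrative: small chunks, several blocks
                xdfCompressionLevel = 5,  # illustrative: higher means smaller but slower
                overwrite = TRUE)

# The same kind of data source object can be created for an existing file.
xdfRef <- RxXdfData(xdfFile)

# Inspect the metadata (variables, row count) stored with the file.
rxGetInfo(xdfRef, getVarInfo = TRUE)

# "rx" functions accept the data source directly and read it in chunks.
rxSummary(~ mpg + hp, data = xdfRef)

# Loading into a data frame is an explicit step; with no outFile,
# rxDataStep() returns the selected columns as an in-memory data frame.
df <- rxDataStep(inData = xdfRef, varsToKeep = c("mpg", "hp"))
```

The point of the sketch is the split between the on-disk data source (`xdf`, `xdfRef`) and the in-memory copy (`df`): analysis functions can work on the former chunk by chunk, while the latter is only created when you ask for it.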