
Maximum size/offset of streamed File using Ada.Streams.Stream_IO.Read


I am trying to read specific blocks of data (around 4096 bytes) from a (possibly) huge file.

Using Ada.Streams.Stream_IO.Read() with the GNAT compiler, what would be the maximum offset that I could use? That is, if I wanted to read the last 4 kilobytes of the file, using

Block : Ada.Streams.Stream_Element_Array (1 .. 4096);  -- 4096 stream elements
Last  : Ada.Streams.Stream_Element_Offset;
...
Ada.Streams.Stream_IO.Read (File, Block, Last, From => Offset);

how big could the Offset be (and therefore the file)?
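Reading the last block comes down to asking for the file's Size and passing the right position as the From parameter of Read. The following is a minimal sketch under stated assumptions: the helper Read_Tail, the file name tail_demo.bin and the 10_000-byte demo file it writes first are hypothetical, made up only so the example runs on its own. Whether Start can go beyond 2 GB still depends on the implementation-defined range of Count, which is exactly what this question is about.

```ada
with Ada.Text_IO;
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO; use Ada.Streams.Stream_IO;

procedure Read_Tail_Demo is

   Block_Length : constant := 4_096;

   --  Hypothetical helper: reads the last Block'Length stream elements of
   --  the file Name (or the whole file, if it is shorter) into Block.
   --  Last is set to the index of the last element actually read.
   procedure Read_Tail
     (Name  : String;
      Block : out Stream_Element_Array;
      Last  : out Stream_Element_Offset)
   is
      File  : File_Type;
      Start : Positive_Count := 1;
   begin
      Open (File, In_File, Name);
      --  Size is the file length in stream elements. Start must fit in
      --  Positive_Count, whose upper bound is implementation-defined.
      if Size (File) > Count (Block'Length) then
         Start := Size (File) - Count (Block'Length) + 1;
      end if;
      Read (File, Block, Last, From => Start);
      Close (File);
   end Read_Tail;

   Name  : constant String := "tail_demo.bin";
   Demo  : File_Type;
   Data  : Stream_Element_Array (0 .. 9_999);
   Block : Stream_Element_Array (1 .. Block_Length);
   Last  : Stream_Element_Offset;

begin
   --  Write a 10_000-byte demo file containing 0, 1, ..., 255, 0, 1, ...
   for I in Data'Range loop
      Data (I) := Stream_Element (I mod 256);
   end loop;
   Create (Demo, Out_File, Name);
   Write (Demo, Data);
   Close (Demo);

   Read_Tail (Name, Block, Last);
   Ada.Text_IO.Put_Line ("Last =" & Stream_Element_Offset'Image (Last));
   Ada.Text_IO.Put_Line
     ("Final byte =" & Stream_Element'Image (Block (Last)));
   --  Expected output:
   --  Last = 4096
   --  Final byte = 15
end Read_Tail_Demo;
```

Here Start is 10_000 - 4_096 + 1 = 5_905, so a full block is read and its final element is the last byte written, 9_999 mod 256 = 15.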

Doing a bit of research, I found that Offset in GNAT seems to be defined as mod 2 ** Standard'Address_Size [1], which would be 2^32 on a 32-bit machine. It is not absolutely clear to me whether this refers to bits, bytes, kilobytes or even some obscure multiple.

Supposing that it means bytes, wouldn't that mean the biggest file I could handle would be 4 gigabytes (2^32 / 1024^3) large? If so, is there a way to make this larger?


Since it was suggested that I hadn't checked the (language) reference manual, here is the research that led me to the question in the first place:

In [2] the Read procedure is defined as:

procedure Read (File : in  File_Type;
                Item : out Stream_Element_Array;
                Last : out Stream_Element_Offset;
                From : in  Positive_Count);

A little further up:

type    Count          is range 0 .. *implementation-defined*;
subtype Positive_Count is Count range 1 .. Count'Last;

As one can see, the actual range of Count is implementation-defined. Since I am using the GNAT compiler (see above), I checked [1]. This states that

The Standard I/O packages described in Annex A for [...] Ada.Stream_IO [...] are implemented using the C library streams facility; where [...] All input/output operations use fread/fwrite.

Further down in the same documentation:

function fread
     (buffer : voids;
      size : size_t;
      count : size_t;
      stream : FILEs)
   return size_t;

where

type size_t is mod 2 ** Standard'Address_Size;

Again, Standard'Address_Size would be 32 on a 32-bit machine (I had also checked, before asking, that this is the case on my computer). Even after reading both the language reference manual AND the GNAT implementation documentation, I am still not sure whether Stream_Element_Offset counts bytes or something else.

But again, supposing that it means bytes, wouldn't that mean the biggest file I could handle would be 4 gigabytes (2^32 / 1024^3) large? If so, is there a way to make this larger?
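Both parts of this can be checked on a given toolchain by asking the compiler directly. The following is a minimal sketch (the procedure name Show_Stream_Limits is made up for illustration) that prints the limits in question. It assumes GNAT, because Standard'Address_Size is a GNAT-specific attribute. Stream_Element'Size is the number of bits in one stream element, i.e. in one unit of Offset; on GNAT it is 8, so offsets count bytes.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Streams;           use Ada.Streams;
with Ada.Streams.Stream_IO;

--  Prints the limits that decide how large a Stream_IO offset can be.
--  Standard'Address_Size is a GNAT-specific attribute.
procedure Show_Stream_Limits is
begin
   Put_Line ("Standard'Address_Size      ="
             & Integer'Image (Standard'Address_Size));
   --  Number of bits in one stream element, i.e. in one unit of offset.
   Put_Line ("Stream_Element'Size        ="
             & Integer'Image (Stream_Element'Size));
   Put_Line ("Stream_Element_Offset'Last ="
             & Stream_Element_Offset'Image (Stream_Element_Offset'Last));
   --  Upper bound of Count, and hence of the From parameter of Read.
   Put_Line ("Stream_IO.Count'Last       ="
             & Ada.Streams.Stream_IO.Count'Image
                 (Ada.Streams.Stream_IO.Count'Last));
end Show_Stream_Limits;
```

With a 32-bit signed Stream_Element_Offset, the third line would presumably show 2147483647; with a 64-bit definition it would show 9223372036854775807.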

[1]: The Implementation of Standard I/O - GNAT Reference Manual

[2]: Ada Reference Manual - A.12.1 The Package Streams.Stream_IO


Solution

  • On Mac OS X, with FSF GCC 5.1.0, there is

    procedure Read
      (File : File_Type;
       Item : out Stream_Element_Array;
       Last : out Stream_Element_Offset;
       From : Positive_Count);
    

    where

    type Count is new Stream_Element_Offset
      range 0 .. Stream_Element_Offset'Last;
    
    subtype Positive_Count is Count range 1 .. Count'Last;
    --  Index into file, in stream elements
    

    and (in Ada.Streams)

    type Stream_Element_Offset is new Long_Long_Integer;
    

    which is 64 bits; that should be enough.

    However, as Alex points out, GNAT GPL 2014 has

    type Stream_Element_Offset is range
      -(2 ** (Standard'Address_Size - 1)) ..
      +(2 ** (Standard'Address_Size - 1)) - 1;
    

    which means that, on a 32-bit machine, you’re limited to 2 gigabyte files.

    The latest FSF GCC sources (as for 5.1.0 above) have been changed; we’ll have to wait until GNAT GPL 2015 to see which is definitive.

    As a further cause for concern, the GNAT GPL 2014 code for Ada.Streams.Stream_IO.Set_Position (an internal subprogram) is

    procedure Set_Position (File : File_Type) is
       use type System.CRTL.long;
       use type System.CRTL.ssize_t;
       R : int;
    begin
       if Standard'Address_Size = 64 then
          R := fseek64 (File.Stream,
                        System.CRTL.ssize_t (File.Index) - 1, SEEK_SET);
       else
          R := fseek (File.Stream,
                      System.CRTL.long (File.Index) - 1, SEEK_SET);
       end if;
    
       if R /= 0 then
          raise Use_Error;
       end if;
    end Set_Position;
    

    whereas the GCC 5.1.0 version (which has no alternative implementations) is

    procedure Set_Position (File : File_Type) is
       use type System.CRTL.int64;
       R : int;
    begin
       R := fseek64 (File.Stream, System.CRTL.int64 (File.Index) - 1, SEEK_SET);
    
       if R /= 0 then
          raise Use_Error;
       end if;
    end Set_Position;
    

    If your system has fseek64() and friends (or possibly fseeko(), which takes an off_t rather than a long for the offset parameter), and judging by the code above I think it must, then it shouldn't be too hard to write your own version of Ada.Streams.Stream_IO that always uses the 64-bit functions. It's probably easiest to call it My_Stream_IO and grit your teeth over the compiler warnings about using internal GNAT units, rather than trying to slot it into the Ada hierarchy.
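For a sense of scale, assuming 8-bit stream elements as on GNAT, the two definitions of Stream_Element_Offset quoted above bound the largest usable Positive_Count, and hence the largest reachable file position, roughly as follows:

$$
\begin{aligned}
2^{31} - 1 &= 2\,147\,483\,647 \text{ bytes} \approx 2\ \text{GiB} && \text{(32-bit signed, GNAT GPL 2014 on a 32-bit machine)}\\
2^{63} - 1 &\approx 9.22 \times 10^{18} \text{ bytes} \approx 8\ \text{EiB} && \text{(64-bit, FSF GCC 5.1.0)}
\end{aligned}
$$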