Search code examples
distributedbigintchapelparallelism-amdahl

improving bigint write to disk performance


I am working with really large bigint numbers and I need to write them to disk and read them back later because they won't all fit in memory at a time.

The current Chapel implementation first converts the bigint to a string and then writes that string to disk[1]. This is taking a really long time for large integers.

var outputFile = open("outputPath", iomode.cwr);
var writer = outputFile.writer();
writer.write(reallyLargeBigint);
writer.close();
outputFile.close();

Is there any way to use GMP's mpz_out_raw()/mpz_inp_raw()[2] or mpz_export()/mpz_import()[3] or another similar way to dump the bigint's bytes to disk directly without converting to string beforehand and then read the bytes back into a bigint object?

Would that also work for a bigint array?

How could such functionality be added to Chapel's standard library in case it's not possible in the current state?

[1] https://github.com/chapel-lang/chapel/blob/master/modules/standard/BigInteger.chpl#L346

[2] https://gmplib.org/manual/I_002fO-of-Integers.html

[3] https://gmplib.org/manual/Integer-Import-and-Export.html


Solution

  • The functions you mentioned aren't directly available in any Chapel modules, but you can write extern procs and extern types to access the GMP functions directly.

    First we need to be able to work with C files, so declare some procedures and types for them:

    extern type FILE;
    extern type FILEptr = c_ptr(FILE);
    extern proc fopen(filename: c_string, mode: c_string): FILEptr;
    extern proc fclose(fp: FILEptr);
    

    Then we can declare the GMP functions we need:

    extern proc mpz_out_raw(stream: FILEptr, const op: mpz_t): size_t;
    extern proc mpz_inp_raw(ref rop: mpz_t, stream: FILEptr): size_t;
    

    Now we can use them to write a bigint value:

    use BigInteger;
    var res: bigint;
    res.fac(100); // Compute 100!
    
    writeln("Writing the number: ", res);
    
    var f = fopen("gmp_outfile", "w");
    mpz_out_raw(f, res.mpz);
    fclose(f);
    

    And read it back in from the file:

    var readIt: bigint;
    
    f = fopen("gmp_outfile", "r");
    mpz_inp_raw(readIt.mpz, f);
    fclose(f);
    
    writeln("Read the number:", readIt);
    

    For arrays of bigint values just loop over them to write or read them:

    // initialize the array
    var A: [1..10] bigint;
    for i in 1..10 do
      A[i].fac(i);
    
    // write the array to a file
    f = fopen("gmp_outfile", "w");
    for i in 1..10 do
      mpz_out_raw(f, A[i].mpz);
    fclose(f);
    
    // read the array back in from the file
    var B: [1..10] bigint;
    f = fopen("gmp_outfile", "r");
    for i in 1..10 do
      mpz_inp_raw(B[i].mpz, f);
    fclose(f);