Tags: io, erlang, memory-efficient

Efficient Reading of Input File


For a task, I am currently working with input files that contain matrix test cases (matrix multiplication). An example input file:

N M
1 3 5 ... 6 (M columns)
....
5 4 2 ... 1 (N rows)

Until now I have been using a simple read() to access them, but this is not efficient for large files (size > 10^2). Is there some way to use processes to do this in parallel?
I was also thinking of using multiple IO readers, split by line, so that each process could read a different segment of the file, but I couldn't find any helpful resources.

Thank you.

PS: My current code uses this:

io:fread(IoDev, "", "~d")

Solution

  • Have you considered using the re module? I have not run a performance test, but it may be efficient. In the following example I do not use the first "N M" line, so I left it out of the matrix.txt file.

    matrix file:

    1 2 3 4 5 6 7 8 9
    11 12 13 14 15 16 17 18 19
    21 22 23 24 25 26 27 28 29
    31 32 33 34 35 36 37 38 39
    

    I did the conversion in the shell:

    1> {ok,B} = file:read_file("matrix.txt"). % read the complete file and store it in a binary
    {ok,<<"1 2 3 4 5 6 7 8 9\r\n11 12 13 14 15 16 17 18 19\r\n21 22 23 24 25 26 27 28 29\r\n31 32 33 34 35 36 37 38 39">>}
    2> {ok,ML} = re:compile("[\r\n]+"). % to split the complete binary into a list of binaries, one for each line
    {ok,{re_pattern,0,0,0,
                    <<69,82,67,80,105,0,0,0,0,0,0,0,1,8,0,0,255,255,255,255,
                      255,255,...>>}}
    3> {ok,MN} = re:compile("[ ]+"). % to split a line into binaries, one for each integer
    {ok,{re_pattern,0,0,0,
                    <<69,82,67,80,73,0,0,0,0,0,0,0,17,0,0,0,255,255,255,255,
                      255,255,...>>}}
    4> % a function to split a line and convert each chunk into an integer
    4> F = fun(Line) -> Nums = re:split(Line,MN), [binary_to_integer(N) || N <- Nums] end.
    #Fun<erl_eval.7.126501267>
    5> Lines = re:split(B,ML). % split the file into lines
    [<<"1 2 3 4 5 6 7 8 9">>,<<"11 12 13 14 15 16 17 18 19">>,
     <<"21 22 23 24 25 26 27 28 29">>,
     <<"31 32 33 34 35 36 37 38 39">>]
    6> lists:map(F,Lines). % map the function over each line
    [[1,2,3,4,5,6,7,8,9],
     [11,12,13,14,15,16,17,18,19],
     [21,22,23,24,25,26,27,28,29],
     [31,32,33,34,35,36,37,38,39]]
    7> 
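
    The shell steps above can also be collected into a module. This is a minimal sketch, not a benchmarked implementation; the module name matrix_reader and the functions read/1 and parse/1 are made up for illustration. It also drops empty chunks, on the assumption that the file may end with a newline or contain leading or trailing spaces.

    ```erlang
    -module(matrix_reader).
    -export([read/1, parse/1]).

    %% Read the complete file into a binary and parse it.
    read(Path) ->
        {ok, Bin} = file:read_file(Path),
        parse(Bin).

    %% Turn a binary such as <<"1 2\r\n3 4">> into a list of integer rows.
    parse(Bin) ->
        {ok, LineRe} = re:compile("[\r\n]+"),   % line separator
        {ok, NumRe} = re:compile("[ ]+"),       % separator between integers
        [parse_line(Line, NumRe) || Line <- re:split(Bin, LineRe), Line =/= <<>>].

    %% Split one line on spaces and convert each non-empty chunk to an integer.
    parse_line(Line, NumRe) ->
        [binary_to_integer(Chunk) || Chunk <- re:split(Line, NumRe), Chunk =/= <<>>].
    ```

    After compiling the module, matrix_reader:read("matrix.txt") should give the same list of rows as shell command 6.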
    

    If you want to check the matrix size, keep the "N M" line at the top of the file and replace shell command 6 (lists:map(F,Lines).) with:

    % the first parsed line holds the dimensions, the remaining lines are the matrix rows
    [[NbRows,NbCols]|Matrix] = lists:map(F,Lines),
    case (length(Matrix) == NbRows) andalso
          lists:all(fun(X) -> length(X) == NbCols end,Matrix) of
        true -> {ok,Matrix};
        _ -> {error_size,Matrix}
    end.
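
    On the question of using processes: one possible sketch is to read the file once and then parse each line in its own process. Reading stays sequential here; only the conversion to integers is spread over processes. The module name matrix_par and the function parse_parallel/1 are hypothetical, and I have not measured whether this is faster than the sequential version.

    ```erlang
    -module(matrix_par).
    -export([parse_parallel/1]).

    %% Parse every line of the binary in its own process and return the rows
    %% in file order.
    parse_parallel(Bin) ->
        Parent = self(),
        Lines = [L || L <- re:split(Bin, "[\r\n]+"), L =/= <<>>],
        %% Spawn one worker per line; each reply is tagged with a unique reference.
        Refs = [spawn_worker(Parent, Line) || Line <- Lines],
        %% Selective receive on each reference, in spawn order, keeps the row order.
        [receive {Ref, Row} -> Row end || Ref <- Refs].

    %% Start a linked worker that parses one line and sends the row to Parent.
    spawn_worker(Parent, Line) ->
        Ref = make_ref(),
        spawn_link(fun() -> Parent ! {Ref, parse_line(Line)} end),
        Ref.

    %% Split one line on spaces and convert each non-empty chunk to an integer.
    parse_line(Line) ->
        [binary_to_integer(C) || C <- re:split(Line, "[ ]+"), C =/= <<>>].
    ```

    For a matrix with many short rows, one process per group of lines may be a better fit than one process per line, since each spawn and message has its own cost.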