How to (idiomatically) read indexed arrays from a delimited text file?


I have text files from an external source that are formatted like so:

       0       0    -0.105961      0.00000
       1       0     -1.06965      0.00000
       1       1   -0.0187213    -0.240237
       2       0    -0.124695      0.00000
       2       1    -0.178982    0.0633255
       2       2     0.760988    -0.213796
       3       0     -1.96695      0.00000
       3       1    0.0721285    0.0491248
       3       2    -0.560517     0.267733
       3       3    -0.188732    -0.112053
       4       0   -0.0205364      0.00000
       ⋮        ⋮        ⋮              ⋮
      40      30     0.226833    -0.733674
      40      31    0.0444837    -0.249677
      40      32    -0.171559    -0.970601
      40      33    -0.141848    -0.137257
      40      34    -0.247042    -0.902128
      40      35    -0.495114     0.322912
      40      36     0.132215    0.0543294
      40      37     0.125682     0.817945
      40      38     0.181098     0.223309
      40      39     0.702915     0.103991
      40      40      1.11882    -0.488252

where the first two columns are the indices of a 2d array (say i and j), and the 3rd and 4th columns are the values for two 2d arrays (say p[:,:] and q[:,:]). What would be the idiomatic way in Python/Julia to read this into two 2d arrays?

There are some assumptions we can make: the arrays are lower triangular (i.e., the values only exist (or are nonzero) for j <= i), and the indices are increasing, that is, the last line can be expected to have the largest i (and probably j).

The current implementation assumes that the maximum i and j are both 40 and proceeds as follows (Julia minimal working example):

using OffsetArrays
using DelimitedFiles: readdlm

n = 40

p = zeros(0:n, 0:n)
q = zeros(0:n, 0:n)

open(filename) do infile
    for i = 0:n
        for j = 0:i
            line = readline(infile)
            arr = readdlm(IOBuffer(line))
            p[i,j] = arr[3]
            q[i,j] = arr[4]
        end
    end
end

Note that this example also assumes that the index j changes the fastest, which is usually true, but in doing that it effectively ignores the first two columns of the data.

However, I'm looking for a solution that makes fewer assumptions about the data file (maybe only the lower-triangular one and the index increasing one). What would be a "natural" way of doing this in Julia or Python?


Solution

  • The code in the question is almost there. Here is a Julian way to do this:

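    using DelimitedFiles, OffsetArrays
    
    # Parse the whole file into a numeric matrix, one row per (i, j, p, q) record.
    rawdata = readdlm("in.txt");
    # Build index ranges min:max from the observed i and j columns.
    irange = (:)(Int.(extrema(rawdata[:,1]))...)
    jrange = (:)(Int.(extrema(rawdata[:,2]))...)
    
    # Zero-filled arrays indexed by the ranges found above (e.g. 0:40).
    p = OffsetArray(zeros(length.((irange,jrange))),irange,jrange)
    q = copy(p)
    
    # Place each value at the position given by its own index columns.
    foreach(eachrow(rawdata)) do (i,j,pd,qd)
        p[Int(i),Int(j)] = pd
        q[Int(i),Int(j)] = qd
    end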
    

    This isn't the simplest/prettiest code, but data parsing rarely is.

    After the foreach, the matrices p and q should hold the correct values (and can be converted to a triangular matrix type, for example LinearAlgebra's LowerTriangular, if desired).

    The initial bit with irange and jrange can be shortened even further (splat is exported from Base in recent Julia versions; on older ones use Base.splat):

    rngs = Tuple(splat(:).(extrema(Int.(rawdata[:,1:2]);dims=1)))
    p = OffsetArray(zeros(length.(rngs)),rngs...)
    q = copy(p)
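
Since the question also asks about Python, here is a rough counterpart of the same idea: size the arrays from the index columns actually present, then place each value by its own (i, j). This is only a sketch using the standard library. The function name `read_indexed_arrays` is made up for illustration, and it assumes the indices are non-negative integers starting at 0 and that every data line has exactly four whitespace-separated fields.

```python
def read_indexed_arrays(lines):
    """Parse rows of 'i j p q' into two dense 2D lists, indexed p[i][j] and q[i][j].

    Hypothetical helper for illustration. `lines` is any iterable of strings,
    such as an open file. Positions that never appear in the input stay 0.0.
    """
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        i, j = int(fields[0]), int(fields[1])
        entries.append((i, j, float(fields[2]), float(fields[3])))

    if not entries:
        return [], []

    # Size the arrays from the indices actually present, not a hard-coded n.
    # Assumption: indices are non-negative and start at 0.
    n_rows = max(e[0] for e in entries) + 1
    n_cols = max(e[1] for e in entries) + 1
    p = [[0.0] * n_cols for _ in range(n_rows)]
    q = [[0.0] * n_cols for _ in range(n_rows)]

    # Place values by their index columns, so the row order in the file
    # (for example, whether j changes fastest) does not matter.
    for i, j, p_val, q_val in entries:
        p[i][j] = p_val
        q[i][j] = q_val
    return p, q


# First three rows of the sample data from the question.
sample = """\
       0       0    -0.105961      0.00000
       1       0     -1.06965      0.00000
       1       1   -0.0187213    -0.240237
""".splitlines()

p, q = read_indexed_arrays(sample)
```

For a real file, something like `with open("in.txt") as f: p, q = read_indexed_arrays(f)` should work, since a file object iterates over its lines. If NumPy is available, the same approach is usually written with `numpy.loadtxt` and integer-array indexing instead of nested lists.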