Tags: julia, text-parsing

Parse an array of strings


I have a 1D array of strings (Array{String,1}) that describes a matrix of floats (see below). I need to parse it into a matrix. Any slick suggestions?

  • Julia 1.5
  • MacOS

Yes, I did read this in from a file. I don't want to read the whole file with CSV, because I want to keep the option of reading the entire file using memory I/O, which I don't think CSV supports. Also, the file contains some complex lines that mix strings and numbers, or hold several strings, and I need to parse those too, which more or less rules out DelimitedFiles. The columns are separated by two spaces.

julia> lines[24+member_total:idx-1]
49-element Array{String,1}:
 "0.0000000E+00  0.0000000E+00  0.0000000E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  1.9987500E-01  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  1.1998650E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  2.1998550E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  3.1998450E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  4.1998350E+00  1.3308000E+01"
 ⋮
 "0.0000000E+00  0.0000000E+00  5.9699895E+01  1.4000000E-01"
 "0.0000000E+00  0.0000000E+00  6.0199890E+01  1.0100000E-01"
 "0.0000000E+00  0.0000000E+00  6.0699885E+01  6.2000000E-02"
 "0.0000000E+00  0.0000000E+00  6.1199880E+01  2.3000000E-02"
 "0.0000000E+00  0.0000000E+00  6.1500000E+01  0.0000000E+00"
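For reference, the most direct Base-only route is to split each line on whitespace and parse every field. This is a minimal sketch, assuming every selected line holds only numeric fields, as in the excerpt above; the three sample lines are copied from it, and the names `rows` and `M` are illustrative.

```julia
# Sample lines copied from the excerpt above (all fields numeric)
lines = [
    "0.0000000E+00  0.0000000E+00  0.0000000E+00  1.3308000E+01",
    "0.0000000E+00  0.0000000E+00  1.9987500E-01  1.3308000E+01",
    "0.0000000E+00  0.0000000E+00  6.1500000E+01  0.0000000E+00",
]

# split(l) with no delimiter splits on runs of whitespace and drops empty
# fields, so the two-space separator is handled; each field becomes a Float64
rows = [parse.(Float64, split(l)) for l in lines]

# hcat gives one column per line; permutedims turns that into one row per line
M = permutedims(reduce(hcat, rows))
```

Here `M` is a 3×4 `Matrix{Float64}`. The approach stays simple only while every line has the same number of purely numeric fields; mixed text lines would need separate handling.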

Solution

  • I strongly advise against reinventing the wheel with custom-made parsers, because such solutions tend to be less robust in production.

    If your file is in a single String, use:

    using DelimitedFiles
    readdlm(IOBuffer(strs)) # strs::String; columns split on runs of whitespace
    

    If your file is stored as a Vector of Strings, use:

    # parse each line into a 1×n row, then stack the rows vertically
    reduce(vcat, readdlm.(IOBuffer.(strsa)))
    

    Finally, there is no conflict between memory maps and library-based parsing (shown here with readdlm):

    using Mmap, DelimitedFiles
    
    s = open("d.txt") # d.txt contains your lines
                      # to read & write an existing file, open it with "r+" ("w+" truncates it)
     
    m = Mmap.mmap(s, Vector{UInt8}) # memory-map the file as raw bytes
    
    readdlm(IOBuffer(m)) # parse the mapped bytes
    
    

    At the same time, you can always rewind the stream and read the data directly, independently of the memory map:

    seek(s, 0)  # rewind to the start of the file
    readdlm(s)  # parse straight from the stream
    seek(s, 0)  # reset the stream for later reads
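The DelimitedFiles and memory-map routes can be checked end to end with a small self-contained script. It is a sketch under stated assumptions: the sample lines are copied from the question, a temporary file from `mktemp` stands in for d.txt, and joining the lines into one String before a single `readdlm` call is a swapped-in alternative to broadcasting `readdlm` over every line.

```julia
using DelimitedFiles, Mmap

# Sample lines copied from the question (assumed all-numeric)
lines = [
    "0.0000000E+00  0.0000000E+00  0.0000000E+00  1.3308000E+01",
    "0.0000000E+00  0.0000000E+00  1.9987500E-01  1.3308000E+01",
    "0.0000000E+00  0.0000000E+00  1.1998650E+00  1.3308000E+01",
]

# Vector of strings: join into one newline-separated buffer and parse once,
# instead of creating one IOBuffer and one tiny matrix per line
A = readdlm(IOBuffer(join(lines, '\n')))

# Memory-mapped route: write the same data to a temporary file
# (a stand-in for d.txt), then map its bytes and parse them
path, io = mktemp()
write(io, join(lines, '\n'), '\n')
close(io)

B = open(path) do s
    m = Mmap.mmap(s, Vector{UInt8})  # map the whole file as raw bytes
    readdlm(IOBuffer(m))             # parse the mapped bytes into a matrix
end
```

Both `A` and `B` should come out as the same 3×4 `Matrix{Float64}`, since `readdlm` infers a numeric element type when every field parses as a number.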