
Julia Vector{UInt8} to SparseArrays.SparseMatrixCSC{Any64, Int64}


I'm attempting to make a Julia function that can be provided a URL from Matrix Market or from SuiteSparse Matrix Collection - for simplicity assume I only want to consider URLs ending in ".mtx.gz".

Below is the code I currently have:

using HTTP
using CodecZlib
using SparseArrays

url = "https://math.nist.gov/pub/MatrixMarket2/Harwell-Boeing/airtfc/zenios.mtx.gz" # String

data = HTTP.get(url).body # Vector{UInt8} (alias for Array{UInt8, 1})

buffer = IOBuffer(data) # IOBuffer (alias for Base.GenericIOBuffer{Array{UInt8, 1}})

stream = GzipDecompressorStream(buffer) # TranscodingStreams.TranscodingStream{GzipDecompressor, IOBuffer}

forUse = read(stream) # Vector{UInt8} (alias for Array{UInt8, 1})
# Alternative for String
# forUse = read(stream, String) # String

The issue I'm coming across is I only know how to force the variable forUse to be either a Vector{UInt8} data type or a String.

If I attempt sparse(forUse) when forUse is a Vector{UInt8}, I get a sparse array of nonsense. If I attempt sparse(forUse) when forUse is a String, I get the error: "ERROR: MethodError: no method matching sparse(::String)"
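
To make the "nonsense" concrete: as far as I can tell, sparse applied to a plain vector only wraps the elements it is given, so a Vector{UInt8} becomes a sparse vector of raw byte values and the text is never parsed. A small sketch, where the two-line input is a made-up stand-in and not the real file:

```julia
using SparseArrays

# Made-up stand-in for the decompressed bytes (not the real zenios file)
bytes = Vector{UInt8}("2 2 1\n1 1 3.5\n")

s = sparse(bytes)                     # a sparse *vector* of byte values, not a matrix
println(typeof(s))
println(length(s) == length(bytes))   # one stored element per input byte
println(Char(s[1]))                   # the first "entry" is only the first character
```

So the bytes have to be parsed as Matrix Market text before sparse can build the matrix.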

If possible, I wish to avoid reading the String data type of forUse line by line as I feel this would be wildly inefficient.

Question: Is there any efficient way to extract the sparse array from HTTP.get(url).body without saving the file to storage? (see MMGet for the version that downloads the file onto storage)


Solution

  • Not sure this is the exact thing asked, but:

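    # forUse is the decompressed Vector{UInt8} defined in the question.
    # Note: String(forUse) takes over the bytes and leaves forUse empty.
    fuse = String(forUse)
    let I = Int[], J = Int[], V = Float64[], dims = nothing
        for line in eachsplit(fuse, '\n')
            isempty(line) && continue       # skip blank lines
            line[1] == '%' && continue      # skip the banner and comment lines
            fields = split(line)
            if dims === nothing
                # first data line is the size line: rows, columns, stored entries
                dims = (parse(Int, fields[1]), parse(Int, fields[2]))
                continue
            end
            push!(I, parse(Int, fields[1]))
            push!(J, parse(Int, fields[2]))
            push!(V, parse(Float64, fields[3]))
        end
        sparse(I, J, V, dims...)
    end

    This returns a sparse matrix holding the entries listed in the data file, assuming the file uses the Matrix Market coordinate format with a real value on every entry line. The first non-comment line gives the matrix dimensions rather than an entry, so it is used for the size instead of being stored. For a file marked symmetric, only the stored triangle is filled in.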

    It seems unlikely this needs to be super-optimized beyond this, and this code enjoys a certain clarity.
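
If the intermediate String is a concern, the same parsing can be done line by line from any IO object. The following is a minimal sketch, not a full Matrix Market reader: read_coordinate is a hypothetical helper name, it assumes the coordinate format with real or integer values, it treats a missing value column as a pattern entry with value 1.0, and it mirrors entries only when the banner ends in "symmetric". The small in-memory demo input is made up for illustration.

```julia
using SparseArrays

# Hypothetical helper, not part of any package: builds a sparse matrix from
# Matrix Market coordinate text read line by line from any IO object.
function read_coordinate(io::IO)
    I, J, V = Int[], Int[], Float64[]
    nrows, ncols = 0, 0
    have_size = false
    symmetric = false
    for line in eachline(io)
        fields = split(line)              # splits on any whitespace
        isempty(fields) && continue       # blank line
        if first(fields[1]) == '%'
            # Banner looks like: %%MatrixMarket matrix coordinate real symmetric
            if fields[1] == "%%MatrixMarket"
                symmetric = lowercase(fields[end]) == "symmetric"
            end
            continue
        end
        if !have_size
            # First data line: rows, columns, number of stored entries
            nrows, ncols = parse(Int, fields[1]), parse(Int, fields[2])
            have_size = true
            continue
        end
        i, j = parse(Int, fields[1]), parse(Int, fields[2])
        # Assumption: "pattern" files omit the value, so store 1.0 for them
        v = length(fields) >= 3 ? parse(Float64, fields[3]) : 1.0
        push!(I, i); push!(J, j); push!(V, v)
        if symmetric && i != j
            push!(I, j); push!(J, i); push!(V, v)   # mirror the stored triangle
        end
    end
    return sparse(I, J, V, nrows, ncols)
end

# Made-up in-memory input standing in for a decompressed file
demo = IOBuffer("%%MatrixMarket matrix coordinate real general\n% comment\n3 3 2\n1 1 2.5\n3 2 -1.0\n")
A = read_coordinate(demo)
println(size(A))
```

Since the GzipDecompressorStream in the question is itself an IO, calling read_coordinate(stream) in place of read(stream) should parse the download as it is decompressed, without writing anything to storage. Dense "array" files, complex values, and skew-symmetric or Hermitian matrices are not handled by this sketch.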