I'm trying to write a Julia function that takes a URL from Matrix Market or the SuiteSparse Matrix Collection and returns the sparse matrix it points to. For simplicity, assume I only need to handle URLs ending in `.mtx.gz`.
Below is the code I currently have:
```julia
using HTTP
using CodecZlib
using SparseArrays
url = "https://math.nist.gov/pub/MatrixMarket2/Harwell-Boeing/airtfc/zenios.mtx.gz" # String
data = HTTP.get(url).body # Vector{UInt8} (alias for Array{UInt8, 1})
buffer = IOBuffer(data) # IOBuffer (alias for Base.GenericIOBuffer{Array{UInt8, 1}})
stream = GzipDecompressorStream(buffer) # TranscodingStreams.TranscodingStream{GzipDecompressor, IOBuffer}
forUse = read(stream) # Vector{UInt8} (alias for Array{UInt8, 1})
# Alternative for String
# forUse = read(stream, String) # String
```
The problem I'm running into is that I only know how to get `forUse` as either a `Vector{UInt8}` or a `String`.
If I call `sparse(forUse)` when `forUse` is a `Vector{UInt8}`, I get a sparse array of nonsense.
If I call `sparse(forUse)` when `forUse` is a `String`, I get the error `ERROR: MethodError: no method matching sparse(::String)`.
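Both symptoms make sense once you notice that `sparse` knows nothing about the Matrix Market text format. Given a `Vector{UInt8}`, it simply converts that byte vector into a `SparseVector` of character codes, and there is no `sparse` method that accepts a `String` at all. A minimal illustration, using a made-up one-line entry rather than the real file:

```julia
using SparseArrays

bytes = collect(codeunits("1 1 2.5\n"))  # the raw bytes of one text line
sv = sparse(bytes)                       # a SparseVector of byte values, not a matrix
# sv[1] == 0x31, the ASCII code for the character '1'
```

The text has to be parsed into row indices, column indices, and values before `sparse(I, J, V)` can build a matrix.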
If possible, I'd like to avoid reading the `String` version of `forUse` line by line, since I expect that would be wildly inefficient.
Question: Is there an efficient way to extract the sparse array from `HTTP.get(url).body` without saving the file to storage? (See `MMGet` for the version that downloads the file to storage.)
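Reading line by line is not inherently slow, and it can be done straight from the decompression stream: `eachline` works on any `IO`, including a `GzipDecompressorStream`, so the text never has to be collected into a `String` first. A minimal sketch, assuming a `coordinate` file with `real` or `integer` values and `general` symmetry; `read_mtx_coordinate` is a hypothetical name, not part of any package:

```julia
using SparseArrays

# Hypothetical helper (not from any package): parse a Matrix Market file in
# "coordinate" format with real or integer values and "general" symmetry,
# reading one line at a time from any IO without building a String first.
function read_mtx_coordinate(io::IO)
    banner = readline(io)
    startswith(banner, "%%MatrixMarket") || error("missing %%MatrixMarket banner")
    occursin("coordinate", banner) || error("only coordinate format is handled")

    # Skip comment and blank lines; the first other line is the size line.
    sizeline = readline(io)
    while startswith(sizeline, "%") || isempty(strip(sizeline))
        eof(io) && error("no size line found")
        sizeline = readline(io)
    end
    m, n, nentries = parse.(Int, split(sizeline))

    # Preallocate from the declared entry count instead of growing with push!.
    I = Vector{Int}(undef, nentries)
    J = Vector{Int}(undef, nentries)
    V = Vector{Float64}(undef, nentries)
    k = 0
    for line in eachline(io)
        isempty(strip(line)) && continue
        k += 1
        k <= nentries || error("more entries than the size line declares")
        fields = split(line)
        I[k] = parse(Int, fields[1])
        J[k] = parse(Int, fields[2])
        V[k] = parse(Float64, fields[3])
    end
    k == nentries || error("expected $nentries entries, found $k")
    return sparse(I, J, V, m, n)
end
```

With the question's variables, this would be called as `read_mtx_coordinate(GzipDecompressorStream(IOBuffer(HTTP.get(url).body)))`. For a file whose banner declares `symmetric`, the result holds only the stored triangle, and `pattern` files (no value column) are not handled.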
Not sure this is exactly what you asked, but:
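```julia
using SparseArrays

# forUse defined in the question
fuse = String(forUse)
let I = Int[], J = Int[], V = Float64[], m = 0, n = 0, sizeseen = false
    for line in eachsplit(fuse, '\n')      # eachsplit needs Julia 1.8 or later
        isempty(strip(line)) && continue   # skip blank lines
        startswith(line, "%") && continue  # skip the banner and comment lines
        if !sizeseen
            # the first non-comment line gives rows, columns, and entry count;
            # it is not a matrix entry, so it must not be pushed into I, J, V
            m, n, _ = parse.(Int, split(line))
            sizeseen = true
            continue
        end
        (si, sj, sv) = split(line)
        push!(I, parse(Int, si))
        push!(J, parse(Int, sj))
        push!(V, parse(Float64, sv))
    end
    sparse(I, J, V, m, n)
end
```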
The `let` block returns a `SparseMatrixCSC` holding the entries listed in the input data file.
It seems unlikely this needs much more optimization: it makes a single pass over the text, and the code enjoys a certain clarity.
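One thing the loop above does not handle is the symmetry field of the `%%MatrixMarket` banner, which it skips as a comment. The Matrix Market format lets `symmetric` and `skew-symmetric` files store only the lower triangle, so building the matrix directly from the listed triplets gives just that triangle. A minimal sketch of the expansion step, assuming the triplets and dimensions have already been parsed and the values are real; `mtx_symmetry` and `full_from_triangle` are hypothetical names, and `hermitian` files are rejected rather than handled:

```julia
using SparseArrays

# Hypothetical helper: read the symmetry keyword (5th field) from a banner such as
# "%%MatrixMarket matrix coordinate real symmetric".
mtx_symmetry(banner::AbstractString) = lowercase(split(banner)[5])

# Hypothetical helper: build the full matrix from the stored triplets, mirroring
# strictly off-diagonal entries when the file stores only one triangle.
function full_from_triangle(I, J, V, m, n, symmetry::AbstractString)
    symmetry == "general" && return sparse(I, J, V, m, n)
    factor = symmetry == "symmetric" ? 1.0 :
             symmetry == "skew-symmetric" ? -1.0 :
             error("unsupported symmetry: $symmetry")
    I2, J2, V2 = copy(I), copy(J), copy(V)
    for k in eachindex(I)
        if I[k] != J[k]   # diagonal entries are stored once and must not be doubled
            push!(I2, J[k]); push!(J2, I[k]); push!(V2, factor * V[k])
        end
    end
    return sparse(I2, J2, V2, m, n)
end
```

For example, `full_from_triangle([1, 2], [1, 1], [4.0, 7.0], 2, 2, "symmetric")` adds the mirrored entry at `(1, 2)`, giving the full matrix `[4.0 7.0; 7.0 0.0]`.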