Writing an array of strings to .bin format is done as follows:
out = open("string_array.bin","w")
a = ["first string","second string","third string"]
write(out,a)
close(out)
But when it comes to reading back array a, things start to get tricky.
out = open("string_array.bin","r")
a = read(out)
close(out)
typeof(a) # returns Array{UInt8,1}
How does one convert the Array{UInt8,1} back to the original array a of type Array{String,1}?
It also needs to work when the array of strings has 300+ million elements, i.e. the solution has to perform well.
So Bogumil is right, it is a bit hacky, but if you are keen to write and read to binary files, then here is an implementation for reading and writing Vector{String} that works by converting each String to Vector{UInt8}, then writing each Vector{UInt8} to file, using an initial Int64 for each Vector{UInt8} to store its length. The file also starts with an extra Int64 that stores the length of the Vector{String}. The read routines then use this information to pull it all back in and convert it back to Vector{String}:
# A single byte vector is stored as an Int64 length followed by the raw bytes
my_write(fid1::IOStream, x::Vector{UInt8}) = begin ; write(fid1, Int64(length(x))) ; write(fid1, x) ; end
# A vector of byte vectors is stored as an Int64 count followed by each byte vector (foreach avoids building a throwaway array)
my_write(fid1::IOStream, x::Vector{Vector{UInt8}}) = begin ; write(fid1, Int64(length(x))) ; foreach(y -> my_write(fid1, y), x) ; end
# read(fid1, i) reads up to i bytes in one call instead of one byte at a time
my_read(fid1::IOStream, ::Type{Vector{UInt8}})::Vector{UInt8} = begin i = read(fid1, Int64) ; read(fid1, i) ; end
my_read(fid1::IOStream, ::Type{Vector{Vector{UInt8}}})::Vector{Vector{UInt8}} = begin i = read(fid1, Int64) ; [ my_read(fid1, Vector{UInt8}) for _ = 1:i ] ; end
# File-level entry points: convert String <-> Vector{UInt8} at the boundary
my_write(myfilepath::String, x::Vector{String}) = open(fid1 -> my_write(fid1, [ Vector{UInt8}(codeunits(y)) for y in x ]), myfilepath, "w")
function my_read(myfilepath::String, ::Type{Vector{String}})::Vector{String}
    x = open(fid1 -> my_read(fid1, Vector{Vector{UInt8}}), myfilepath, "r")
    return [ String(y) for y in x ]
end
I've probably included a little more type information than is necessary, but it might make things a bit more obvious to you. Also, sorry, I have a bad habit of doing this sort of thing with one-liners, but you can easily unpack it if necessary. Here's some test code (just adjust the filepath):
myfilepath = "/home/colin/Temp/test_file.bin"
x = ["abc", "de", "f", "", "ghij"]
my_write(myfilepath, x)
my_read(myfilepath, Vector{String})
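Since the question mentions 300+ million elements, it may be worth avoiding the intermediate Vector{Vector{UInt8}} and streaming each string straight to and from the file. The following is only a sketch under that assumption: write_strings and read_strings are hypothetical names, the on-disk layout is the same as described above (an Int64 count, then an Int64 byte length plus raw bytes per string), and I have not benchmarked it.

```julia
# Hypothetical helpers (not part of the answer's code); same file layout as above.
function write_strings(path::AbstractString, xs::Vector{String})
    open(path, "w") do io
        write(io, Int64(length(xs)))          # number of strings
        for s in xs
            write(io, Int64(ncodeunits(s)))   # byte length, not character count
            write(io, s)                      # raw bytes of the string
        end
    end
    return nothing
end

function read_strings(path::AbstractString)
    open(path, "r") do io
        n = read(io, Int64)
        xs = Vector{String}(undef, n)
        for k in 1:n
            len = read(io, Int64)
            xs[k] = String(read(io, len))     # reads up to len bytes in one call
        end
        return xs
    end
end

# Round trip through a temporary file
path = tempname()
write_strings(path, ["abc", "de", "f", "", "ghij"])
ys = read_strings(path)
```

Using ncodeunits rather than length matters for non-ASCII strings, where the number of bytes differs from the number of characters.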
Note, with a little bit of effort, this code can be made more general so that it will work for pretty much any Vector{Vector{T}} as long as T is writable. In fact, if you're really clever, it should be able to be generalized to any Vector{Vector{Vector{...{T}}}}, as long as you can get the recursion right.
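One way the recursion could look is sketched below. The names write_nested and read_nested are hypothetical, and the sketch assumes the innermost element type T is a plain-bits type (Int64, Float64, UInt8 and so on), so that a whole Vector{T} can be written and read in one call. Dispatch picks the recursive method whenever the elements are themselves vectors.

```julia
# Hypothetical generic helpers; assumes the innermost T satisfies isbitstype(T).

# Base case: Int64 length followed by the raw data of the vector.
function write_nested(io::IO, x::Vector{T}) where {T}
    isbitstype(T) || error("element type $T is not a plain-bits type")
    write(io, Int64(length(x)))
    write(io, x)
    return nothing
end

# Recursive case: Int64 count followed by each inner vector.
function write_nested(io::IO, x::Vector{<:Vector})
    write(io, Int64(length(x)))
    foreach(y -> write_nested(io, y), x)
    return nothing
end

# Base case: read the length, then fill a preallocated vector in one call.
function read_nested(io::IO, ::Type{Vector{T}}) where {T}
    n = read(io, Int64)
    x = Vector{T}(undef, n)
    read!(io, x)
    return x
end

# Recursive case: read the count, then read that many inner vectors.
function read_nested(io::IO, ::Type{Vector{T}}) where {T<:Vector}
    n = read(io, Int64)
    return T[read_nested(io, T) for _ in 1:n]
end

# Round trip through an in-memory buffer, three levels deep
io = IOBuffer()
data = [[[1.0, 2.0], Float64[]], [[3.0]]]
write_nested(io, data)
seekstart(io)
back = read_nested(io, typeof(data))
```

Strings do not fit this scheme directly, since String is not a plain-bits type, so they would still need the conversion to Vector{UInt8} at the boundary.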