Expanding Vector List Each by List of Instances - Julia


I would like to expand a vector of values, repeating each value the number of times given in a second vector. I have come up with the following code, which does the job, but this seems like a common operation, so I am probably missing something simpler.

valuelist = ["a","b","d","z"]
numberofinstance = [3,5,1,11]

valuevector = String[]
for i in 1:length(numberofinstance) 
  append!(valuevector , repeat([valuelist[i]], numberofinstance[i])) 
end

Solution

  • If you are fine with using a package, the function you are looking for is inverse_rle (the inverse of run-length encoding) in StatsBase.jl, which is not a standard library but is very widely used:

    julia> using StatsBase
    
    julia> inverse_rle(valuelist, numberofinstance)
    20-element Array{String,1}:
     "a"
     "a"
     "a"
     "b"
     "b"
     "b"
     "b"
     "b"
     "d"
     "z"
     "z"
     "z"
     "z"
     "z"
     "z"
     "z"
     "z"
     "z"
     "z"
     "z"
    
    julia> using BenchmarkTools
    
    julia> @btime inverse_rle($valuelist, $numberofinstance);
      76.799 ns (1 allocation: 240 bytes)
    
    julia> @btime yoursolution($valuelist, $numberofinstance);
      693.329 ns (13 allocations: 1.55 KiB)
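
    The timings above call yoursolution, which is never defined in this answer. Presumably it is the loop from the question wrapped in a function; a minimal sketch under that assumption (the body is a reconstruction, not necessarily the exact code that was benchmarked):

    ```julia
    # Assumed definition: the question's loop, wrapped in a function so that
    # it can be passed to @btime with interpolated arguments.
    function yoursolution(valuelist, numberofinstance)
        valuevector = String[]
        for i in 1:length(numberofinstance)
            # repeat([x], n) builds an n-element vector containing x
            append!(valuevector, repeat([valuelist[i]], numberofinstance[i]))
        end
        return valuevector
    end
    ```

    Each iteration allocates a temporary vector before appending it, which is consistent with the higher allocation count reported for this version.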
    

    If you want to avoid packages, you could, in principle, broadcast repeat or ^ (for strings, both repeat the string, e.g. "a"^3 == "aaa") and split the results back into elements, like so:

    vcat(collect.(.^(valuelist, numberofinstance))...)

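    but I'd argue that this is relatively hard to parse. It also relies on every value being a single character (collect splits each repeated string into Chars, so the result is a Vector{Char}, not a Vector{String}), and it is slower than inverse_rle: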

    julia> @btime yoursolution($valuelist, $numberofinstance);
      693.329 ns (13 allocations: 1.55 KiB)
    
    julia> @btime vcat(collect.(.^($valuelist, $numberofinstance))...)
      472.615 ns (9 allocations: 800 bytes)
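
    For completeness, here are two other package-free one-liners that keep each value as a whole element instead of going through string repetition. They are a sketch of alternatives rather than something from the original answer, and they have not been benchmarked here; the names viafill and viacomprehension are illustrative only.

    ```julia
    valuelist = ["a", "b", "d", "z"]
    numberofinstance = [3, 5, 1, 11]

    # fill(v, n) builds an n-element vector of v; broadcasting pairs each
    # value with its count, and reduce(vcat, ...) concatenates the pieces.
    viafill = reduce(vcat, fill.(valuelist, numberofinstance))

    # Flattened comprehension: the inner `for` emits v once per repetition.
    viacomprehension = [v for (v, n) in zip(valuelist, numberofinstance) for _ in 1:n]
    ```

    Both produce a Vector{String} and also work when the values are longer than one character.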
    

    However, since Julia lets you write fast loops, you can easily define your own simple function. The following is roughly nine times faster than your solution and about as fast as the implementation in StatsBase:

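    function multiply(vs, ns)
        # Preallocate the result: one slot per requested repetition.
        r = Vector{String}(undef, sum(ns))
        c = 1  # next write position in r
        # @inbounds skips bounds checks, so vs and ns must have the same length.
        @inbounds for i in axes(ns, 1)
            for k in 1:ns[i]
                r[c] = vs[i]  # write vs[i] exactly ns[i] times
                c += 1
            end
        end
        r
    end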
    

    Benchmark:

    julia> @btime yoursolution($valuelist, $numberofinstance);
      693.329 ns (13 allocations: 1.55 KiB)
    
    julia> @btime multiply($valuelist, $numberofinstance);
      76.469 ns (1 allocation: 240 bytes)
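
    The multiply function above is hard-coded to String output and, because of @inbounds, silently assumes that vs and ns have the same length. A slightly more defensive sketch follows; the name expandcounts and the input checks are illustrative additions, not part of the original answer, and this version has not been benchmarked here.

    ```julia
    # Hypothetical generalization of multiply: works for any element type
    # and validates the inputs before skipping bounds checks.
    function expandcounts(vs::AbstractVector, ns::AbstractVector{<:Integer})
        length(vs) == length(ns) ||
            throw(DimensionMismatch("vs and ns must have the same length"))
        all(>=(0), ns) || throw(ArgumentError("counts must be non-negative"))
        r = Vector{eltype(vs)}(undef, sum(ns))
        c = 1  # next write position in r
        # Safe to skip bounds checks: exactly sum(ns) elements are written.
        @inbounds for (v, n) in zip(vs, ns)
            for _ in 1:n
                r[c] = v
                c += 1
            end
        end
        return r
    end
    ```

    With these checks, expandcounts([1.5, 2.5], [2, 0]) returns [1.5, 1.5], and mismatched lengths raise a DimensionMismatch instead of reading out of bounds.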