Difference between elements of two arrays by common values of one column in Julia


This is related to a previous question of mine, but I prefer to ask it as a new question. Suppose this time we have only the following two arrays in Julia:

[5.0  3.5
 6.0  3.6
 7.0  3.0]

and

[5.0  4.5
 6.0  4.7
 8.0  3.0]

I want an array holding the difference between the second-column elements (first array minus second array, in that order), but only for values of the first column that appear in both arrays. The resulting array must therefore be the following:

[5.0  -1
 6.0  -1.1]

How can this last array be computed in Julia?


Solution

  • Assume:

    x = [5.0  3.5
         6.0  3.6
         7.0  3.0]
    y = [5.0  4.5
         6.0  4.7
         8.0  3.0]
    

    Again, there are many ways to do it. Using DataFrames.jl, you can write:

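    using DataFrames
    # Inner join keeps only the ids present in both matrices.
    df = innerjoin(DataFrame(x, [:id, :x]), DataFrame(y, [:id, :y]), on=:id)
    # Assemble the result as a matrix: id, then x minus y.
    res = [df.id df.x-df.y]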
    ## 2×2 Matrix{Float64}:
    ##  5.0  -1.0
    ##  6.0  -1.1
    

    You could also convert the original arrays to dictionaries and work with those:

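    # Map each first-column value to its second-column value.
    dx = Dict(x[i,1] => x[i,2] for i in 1:size(x, 1))
    dy = Dict(y[i,1] => y[i,2] for i in 1:size(y, 1))
    # Keys present in both dictionaries, sorted so the rows come out in ascending order.
    ks = sort!(collect(intersect(keys(dx), keys(dy))))
    [ks [dx[k]-dy[k] for k in ks]]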
    ## 2×2 Matrix{Float64}:
    ##  5.0  -1.0
    ##  6.0  -1.1
    

    The DataFrames and dictionary approaches differ in how they handle duplicate values in the first column of either x or y. The join produces all combinations of the matching rows, while the dictionary keeps only the last value stored for each key.
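
To make the duplicate behaviour concrete, here is a minimal sketch in plain Julia (no packages). The inputs are made up for illustration: the key 5.0 is deliberately repeated in x. The nested comprehension only imitates the row pairing an inner join performs; it is an assumption-level stand-in, not how DataFrames implements the join.

```julia
# Hypothetical inputs: the key 5.0 appears twice in x.
x = [5.0 3.5
     5.0 9.0
     6.0 3.6]
y = [5.0 4.5
     6.0 4.7
     8.0 3.0]

# Dictionary approach: a later row overwrites an earlier row with the same key,
# so only x's second 5.0 row (value 9.0) survives.
dx = Dict(x[i,1] => x[i,2] for i in 1:size(x, 1))
dy = Dict(y[i,1] => y[i,2] for i in 1:size(y, 1))
ks = sort!(collect(intersect(keys(dx), keys(dy))))
last_wins = [ks [dx[k]-dy[k] for k in ks]]

# Join-like pairing: one output row for every pair of rows whose keys match,
# so both 5.0 rows of x are paired with the 5.0 row of y.
matches = [(x[i,1], x[i,2]-y[j,2])
           for i in 1:size(x, 1), j in 1:size(y, 1) if x[i,1] == y[j,1]]
all_combos = [first.(matches) last.(matches)]

println(size(last_wins))   # (2, 2)
println(size(all_combos))  # (3, 2)
```

With these inputs the dictionary version returns two rows (5.0 paired with 9.0 - 4.5 = 4.5), while the join-like version returns three rows, one per matching pair. If duplicates are possible in your data, decide which of the two behaviours you want before choosing a method.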