Deleting rows of matrix based on duplicated values of a column in Julia


Here is another challenge in Julia. Imagine we have the following matrix:

 5.0  3.54924
 6.0  3.54702
 6.0  3.54655
 7.0  3.54168
 7.0  3.0

I want to delete rows so that each value in the first column appears only once. That would produce, for example, the following matrix:

 5.0  3.54924
 6.0  3.54702
 7.0  3.0

Which of the duplicated rows gets deleted is arbitrary, as long as no value is repeated in the first column. How can I achieve that?


Solution

  • You can also do this with DataFrames.jl (performance is worse because the data is converted twice, from matrix to data frame and back, but the code is simpler):

    julia> A = [5.0  3.54924
                6.0  3.54702
                6.0  3.54655
                7.0  3.54168
                7.0  3.0]
    5×2 Array{Float64,2}:
     5.0  3.54924
     6.0  3.54702
     6.0  3.54655
     7.0  3.54168
     7.0  3.0
    
    julia> using DataFrames

    julia> Matrix(unique(DataFrame(A, :auto), 1))
    3×2 Array{Float64,2}:
     5.0  3.54924
     6.0  3.54702
     7.0  3.54168
    

    Alternatively, you could write

    julia> A[.!nonunique(DataFrame(A[:,1:1], :auto)),:]
    3×2 Array{Float64,2}:
     5.0  3.54924
     6.0  3.54702
     7.0  3.54168
    

    which is a bit faster and uses less memory, but is messier.
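
For comparison, the same result can probably be had without DataFrames.jl at all. The following is only a sketch using `unique(f, itr)` from Base on the row indices; it is a different technique from the one in the answer above, and the helper name `dedup_rows` and its `col` argument are made up for illustration. It keeps the first row seen for each distinct value in the chosen column:

```julia
# Hypothetical helper (not part of the answer above): keep the first row
# for each distinct value found in column `col` of the matrix `A`.
function dedup_rows(A::AbstractMatrix, col::Integer = 1)
    # unique(f, itr) returns one element of itr per distinct value of f,
    # in order of first appearance; here the elements are row indices.
    keep = unique(i -> A[i, col], axes(A, 1))
    return A[keep, :]
end

A = [5.0  3.54924
     6.0  3.54702
     6.0  3.54655
     7.0  3.54168
     7.0  3.0]

B = dedup_rows(A)
```

Since the question says the choice of which duplicate to drop is arbitrary, keeping the first occurrence should be acceptable; it also matches the rows that the DataFrames.jl versions return.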