I want to get the index of each row together with its number of duplicates, as a tuple:
data=[1 2;2 3; 1 3;1 3]
unique_data=[findall(==(r),eachrow(data)) for r in unique(eachrow(data))]
unique_number=collect(zip(first.(unique_data), length.(unique_data).-1))
This gives the right answer:
3-element Vector{Tuple{Int64, Int64}}:
(1, 0)
(2, 0)
(3, 1)
I want to modify the code so that, even if data changes to the following,
data=[1 2;2 3; 1 3;3 1]
it still gives the same result:
3-element Vector{Tuple{Int64, Int64}}:
(1, 0)
(2, 0)
(3, 1)
What you refer to as the "index of each row" is a fragile thing. I would recommend using the contents of the row as the identifier instead. The easiest way to do that is to sort each row before matching, so that [1, 3] and [3, 1] compare equal:
julia> using StatsBase
julia> countmap(sort.(eachrow(data)))
Dict{Vector{Int64}, Int64} with 3 entries:
[2, 3] => 1
[1, 3] => 2
[1, 2] => 1
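If the tuple format from the question is still needed, the same sorting idea can be applied to the question's own code. This is only a minimal sketch using Base Julia; the names sorted_rows, groups and result are chosen here for illustration and are not part of the original code:

```julia
data = [1 2; 2 3; 1 3; 3 1]

# Sort each row so that [1, 3] and [3, 1] compare equal
sorted_rows = sort.(eachrow(data))

# For every distinct sorted row, collect the indices of all rows that match it
groups = [findall(==(r), sorted_rows) for r in unique(sorted_rows)]

# Pair the index of the first occurrence with the number of additional occurrences
result = collect(zip(first.(groups), length.(groups) .- 1))
```

For the data above, result should hold (1, 0), (2, 0) and (3, 1). Note that findall rescans all rows for every distinct row, which is fine for small matrices but grows quadratically with the number of rows.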
A fancier way would be:
julia> using DataFrames
julia> df = DataFrame(original=collect(eachrow(data)))
4×1 DataFrame
 Row │ original
     │ SubArray…
─────┼───────────
   1 │ [1, 2]
   2 │ [2, 3]
   3 │ [1, 3]
   4 │ [3, 1]
julia> df.sorted = sort.(df.original)
4-element Vector{Vector{Int64}}:
[1, 2]
[2, 3]
[1, 3]
[1, 3]
julia> gdf = groupby(df, :sorted)
GroupedDataFrame with 3 groups based on key: sorted
First Group (1 row): sorted = [1, 2]
 Row │ original   sorted
     │ SubArray…  Array…
─────┼───────────────────
   1 │ [1, 2]     [1, 2]
⋮
Last Group (2 rows): sorted = [1, 3]
 Row │ original   sorted
     │ SubArray…  Array…
─────┼───────────────────
   1 │ [1, 3]     [1, 3]
   2 │ [3, 1]     [1, 3]
julia> [(rowid=first(sdf.original), rowlocs=parentindices(sdf)[1], entries=length(parentindices(sdf)[1])) for sdf in gdf]
3-element Vector{NamedTuple{(:rowid, :rowlocs, :entries), Tuple{SubArray{Int64, 1, Matrix{Int64}, Tuple{Int64, Base.Slice{Base.OneTo{Int64}}}, true}, Vector{Int64}, Int64}}}:
(rowid = [1, 2], rowlocs = [1], entries = 1)
(rowid = [2, 3], rowlocs = [2], entries = 1)
(rowid = [1, 3], rowlocs = [3, 4], entries = 2)
where you get the reference row data, all row numbers where a given row is found, and the number of times it occurs (subtract 1 to get the number of duplicates).