I have two tables connected with habtm relation (through a table).
Table1
id : integer
name: string
Table2
id : integer
name: string
Table3
id : integer
table1_id: integer
table2_id: integer
I need to group Table1 records by simmilar records from Table2. Example:
userx = Table1.create()
user1.table2_ids = 3, 14, 15
user2.table2_ids = 3, 14, 15, 16
user3.table2_ids = 3, 14, 16
user4.table2_ids = 2, 5, 7
user5.table2_ids = 3, 5
Result of grouping that I want is something like
=> [ [ [1,2], [3, 14, 15] ], [ [2,3], [3,14, 16] ], [ [ 1, 2, 3, 5], [3] ] ]
Where first array is an user ids second is table2_ids. I there any possible SQL solution or I need to create some kind of algorithm ?
Updated: Ok, I have a code that is working like I've said. Maybe someone who can help me will find it useful to understand my idea.
def self.compare
hash = {}
Table1.find_each do |table_record|
Table1.find_each do |another_table_record|
if table_record != another_table_record
results = table_record.table2_ids & another_table_record.table2_ids
hash["#{table_record.id}_#{another_table_record.id}"] = results if !results.empty?
end
end
end
#hash = hash.delete_if{|k,v| v.empty?}
hash.sort_by{|k,v| v.count}.to_h
end
But I can bet that you can imagine how long does it takes to show me an output. For my 500 Table1 records it's something near 1-2 minutes. If I will have more, time will be increased in progression, so I need some elegant solution or SQL query.
Table1.find_each do |table_record|
Table1.find_each do |another_table_record|
...
Above codes have performance issue that you have to query database N*N times, which could be optimized down to one single query.
# Query table3, constructing the data useful to us
# { table1_id: [table2_ids], ... }
records = Table3.all.group_by { |t| t.table1_id }.map { |t1_id, t3_records|
[t1_id, t3_records.map(&:table2_id)]
}.to_h
Then you could do exactly the same thing to records
to get the final result hash.
UPDATE:
@AKovtunov You miss understood me. My code is the first step. With records
, which have {t1_id: t2_ids}
hash, you could do sth like this:
hash = {}
records.each do |t1_id, t2_ids|
records.each do |tt1_id, tt2_ids|
if t1_id != tt1_id
inter = t2_ids & tt2_ids
hash["#{t1_id}_#{tt1_id}"] = inter if !inter.empty?
end
end
end