Search code examples
orientdb

Finding the most common vertex connected to neighbors by a certain edge and using this, and edge information, to perform calculations


I want to figure out where a person might be from based on who they follow and get the country together with an approximate latitude and longitude. I've got two types of nodes: Users (containing a name and possibly lat and lng) and Countries (containing a name). I've also got two types of edges: Follow and LivesIn (containing lat and lng).

At the moment both the Account as well as the LivesIn edge contain the lat and lng because I'm not completely sure where it'd be better, but at the moment I'm leaning towards putting it in the edge.

Below is an example network with five users. Three of whom I know where they're from. Now I want to make an educated guess where Alice is from:

  1. Alice follows four users
  2. Two of these four users are from Germany, one from Belgium and one unknown
  3. We can assume that Alice is from Germany
  4. The average lat and lng for the german users are (51.165691+51.115691)/2 and (10.451526+10.481526)/2
  5. We can assume that Alice is somewhere around (51.140691; 10.466526)

Example network.

CREATE CLASS Account EXTENDS V
CREATE PROPERTY Account.name string
CREATE PROPERTY Account.lat double
CREATE PROPERTY Account.lng double

CREATE CLASS Country EXTENDS V
CREATE PROPERTY Country.countryname string

CREATE CLASS LivesIn EXTENDS E
CREATE PROPERTY LivesIn.lat double
CREATE PROPERTY LivesIn.lng double

CREATE CLASS Follows EXTENDS E

CREATE VERTEX Account SET name='Alice'
CREATE VERTEX Account SET name='Bob', lat=50.503887, lng=4.469936  /* Belgium */
CREATE VERTEX Account SET name='Carol', lat=51.165691, lng=10.451526  /* Germany */
CREATE VERTEX Account SET name='Eve', lat=51.115691, lng=10.481526  /* Germany */
CREATE VERTEX Account SET name='Dave'

CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Alice') TO (SELECT FROM Account WHERE name='Bob')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Alice') TO (SELECT FROM Account WHERE name='Carol')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Alice') TO (SELECT FROM Account WHERE name='Eve')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Alice') TO (SELECT FROM Account WHERE name='Dave')

CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Bob') TO (SELECT FROM Account WHERE name='Alice')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Carol') TO (SELECT FROM Account WHERE name='Alice')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Eve') TO (SELECT FROM Account WHERE name='Alice')
CREATE EDGE Follows FROM (SELECT FROM Account WHERE name='Dave') TO (SELECT FROM Account WHERE name='Alice')

CREATE VERTEX Country SET countryname='Belgium'
CREATE VERTEX Country SET countryname='Germany'

CREATE EDGE LivesIn FROM (SELECT FROM Account WHERE name='Bob') TO (SELECT FROM Country WHERE countryname='Belgium') SET lat=50.503887, lng=4.469936
CREATE EDGE LivesIn FROM (SELECT FROM Account WHERE name='Carol') TO (SELECT FROM Country WHERE countryname='Germany') SET lat=51.165691, lng=10.451526
CREATE EDGE LivesIn FROM (SELECT FROM Account WHERE name='Eve') TO (SELECT FROM Country WHERE countryname='Germany') SET lat=51.115691, lng=10.481526

My question is if there's an effective way to achieve this using specific sql commands in OrientDB or if it needs a new function.

I got some small things figured out like getting all outgoing Follows connections:

SELECT out("Follows") FROM Account WHERE name='Alice'

But I can't really manage to get all the LivesIn edges from there.

Alternatively I can create a new function in OrientDB as they also did here. Something like:

var gdb = orient.getGraphNoTx();
var v = gdb.command("sql", "select from Account where name='" + name + "'");
neighbours = v[0].getRecord().field("out_Follows").iterator();
var result = []
print('\n');
country_dict = {}
while(neighbours.hasNext()) {
    var neighbour = neighbours.next();
    var temp = neighbour.field("in").field("out_LivesIn");
    if(temp) {
        it = temp.iterator();
        print(it.next());
        // Count each country and keep track of sum of lat and lng so it can be divided
        // once all neighbours have been visited
    }
}

But that doesn't really use any (possibly efficient?) built in methods of sql. Considering a single person can possibly follow tens of thousands of other accounts.

Would anyone have a suggestion how I can solve this?


Solution

  • Try this query

    select countryname,eval('sum / _count') as average_lat,eval('sum2 / _count') as average_lng from 
    (select countryname,sum(_lat),sum(_lng),count(*) as _count from 
    (select outE("livesIn").lat as _lat,outE("livesIn").lng as _lng,out("livesIn").countryname as countryname from 
    (select expand(out("Follows")) from Account where name="Alice") unwind _lat,_lng,countryname) 
    group by countryname order by _count desc limit 1)